[[!comment format=mdwn username="joey" subject="""comment 1""" date="2015-03-04T14:31:21Z" content=""" What I'm seeing is the unicode arrow is replaced with 0092 and the elipsis with &. It's losing the other byte. The problem seems to be in the base64 encoding that's done, when the metadata value contains spaces or a few other problem characters. These same unicode characters roundtrip through without a problem when not embedded in a string with spaces.
*Utility.Base64> let s = "…"
*Utility.Base64> (s, fromB64 $ toB64 s)
("\8230","&")
git-annex also uses base64 for encoding some creds (and for tagged pushes over XMPP, but only the JID is encoded). The real culprit is the use of `w82s`, which doesn't handle multi-byte characters. I can easily fix this by using `encodeW8` instead. Audited git-annex for other problem w82s uses and don't see any, so will only need to fix this once. Added a quickcheck test for fromB64 . toB64 roundtripping. Unfortunately, the entered unicode characters didn't get saved right, so git-annex can do nothing to fix data that was already entered. """]]