author: Joey Hess <joeyh@joeyh.name> 2015-03-04 11:16:03 -0400
committer: Joey Hess <joeyh@joeyh.name> 2015-03-04 12:54:30 -0400
commit: 05697fe62116181511084a2eba28c5220e8a0363
tree: 6965f56f5648d6dfa6c5e7d6e31e32eb3975b073
parent: 0c3570844cf60428808d01a73c808e4f7232f082
metadata: Fix encoding problem that led to mojibake when storing metadata strings that contained both unicode characters and a space (or '!') character.
The fix is to stop using w82s, which does not properly reconstitute unicode strings. Instead, use utf8 bytestring to get the [Word8] to base64. This passes unicode through perfectly, including any invalid filesystem encoded characters.

Note that toB64 / fromB64 are also used for creds and cipher embedding. It would be unfortunate if this change broke those uses.

For cipher embedding, note that ciphers can contain arbitrary bytes (should really be using ByteString.Char8 there). Testing indicated it's not safe to use the new fromB64 there; I think that characters were incorrectly combined.

For credpair embedding, the username or password could contain unicode. Before, that unicode would fail to round-trip through the b64. So, I guess this is not going to break any embedded creds that worked before. This bug may have affected some creds before, and if so, this change will not fix old ones, but should fix new ones at least.
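The round-trip failure described above can be illustrated with a small sketch. This is a Python analogy, not the git-annex Haskell code: the function names are hypothetical, and `from_b64_bytewise` assumes the old decoder turned each decoded byte into one character, which is one plausible reading of "does not properly reconstitute unicode strings".

```python
import base64


def to_b64(s: str) -> str:
    """Encode a string as base64 over its UTF-8 bytes."""
    return base64.b64encode(s.encode("utf-8")).decode("ascii")


def from_b64_bytewise(b64: str) -> str:
    """Hypothetical stand-in for the old decoder: one character per byte.

    Multi-byte UTF-8 sequences are split into separate characters,
    which produces mojibake for any non-ASCII input.
    """
    return "".join(chr(b) for b in base64.b64decode(b64))


def from_b64_utf8(b64: str) -> str:
    """Decode the bytes as UTF-8, so multi-byte sequences are recombined."""
    return base64.b64decode(b64).decode("utf-8")


original = "café au lait"
encoded = to_b64(original)
broken = from_b64_bytewise(encoded)   # the 'é' comes back as two characters
restored = from_b64_utf8(encoded)     # identical to the original string
```

The same sketch hints at why a UTF-8 decoder may be unsafe for cipher data: arbitrary bytes are not necessarily valid UTF-8, so a bytewise decoder is the safer choice there.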
Diffstat (limited to 'doc/bugs/Unicode_characters_lost__47__converted_in_metadata.mdwn')
-rw-r--r-- doc/bugs/Unicode_characters_lost__47__converted_in_metadata.mdwn 2
1 files changed, 2 insertions, 0 deletions
diff --git a/doc/bugs/Unicode_characters_lost__47__converted_in_metadata.mdwn b/doc/bugs/Unicode_characters_lost__47__converted_in_metadata.mdwn
index 8d7475163..ebfef3b77 100644
--- a/doc/bugs/Unicode_characters_lost__47__converted_in_metadata.mdwn
+++ b/doc/bugs/Unicode_characters_lost__47__converted_in_metadata.mdwn
@@ -13,3 +13,5 @@ Unicode characters in metadata are pruned/converted/lost:
### What version of git-annex are you using? On what operating system?
5.20141125 Debian
+
+> [[fixed|done]]; test pass. --[[Joey]]