summaryrefslogtreecommitdiff
path: root/doc/bugs/Unicode_file_names_ignored_on_Windows/comment_1_3dfa4559dceec50c08ba180f41b4c220._comment
diff options
context:
space:
mode:
Diffstat (limited to 'doc/bugs/Unicode_file_names_ignored_on_Windows/comment_1_3dfa4559dceec50c08ba180f41b4c220._comment')
-rw-r--r--doc/bugs/Unicode_file_names_ignored_on_Windows/comment_1_3dfa4559dceec50c08ba180f41b4c220._comment14
1 files changed, 14 insertions, 0 deletions
diff --git a/doc/bugs/Unicode_file_names_ignored_on_Windows/comment_1_3dfa4559dceec50c08ba180f41b4c220._comment b/doc/bugs/Unicode_file_names_ignored_on_Windows/comment_1_3dfa4559dceec50c08ba180f41b4c220._comment
new file mode 100644
index 000000000..c1778db78
--- /dev/null
+++ b/doc/bugs/Unicode_file_names_ignored_on_Windows/comment_1_3dfa4559dceec50c08ba180f41b4c220._comment
@@ -0,0 +1,14 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="209.250.56.154"
+ subject="analysis"
+ date="2014-03-18T17:54:09Z"
+ content="""
+The `git ls-files --others -z output` is fine; the mojibake seems to occur in git-annex's reading of that output, which uses GHC's filesystem encoding. On Linux it reads \"h\225\269ky.txt\" but on Windows, \"h\195\161\196\56461ky.txt\".
+
+So, it's failing to compose the multibyte characters, and it seems to have escaped the last byte (which should be \"\141\" based on the other 3) out into the high code plane used for undecodable bytes.
+
+Note that on Linux with LANG=C, the add works, and it sees \"h\56515\56481\56516\56461ky.txt\" -- in this case, all 4 bytes are represented in the high code plane, and so round-trip through ok despite the locale not supporting the utf8 encoding.
+
+Interestingly, while both `[readFile \"h\225\269ky.txt\", readFile \"h\56515\56481\56516\56461ky.txt\"]` work on Linux, only the former does on Windows.
+"""]]