From 6d47cecc8aaeaf94071b65278dcc4f535834e72a Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Tue, 18 Mar 2014 17:54:09 +0000 Subject: Added a comment: analysis --- .../comment_1_3dfa4559dceec50c08ba180f41b4c220._comment | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 doc/bugs/Unicode_file_names_ignored_on_Windows/comment_1_3dfa4559dceec50c08ba180f41b4c220._comment diff --git a/doc/bugs/Unicode_file_names_ignored_on_Windows/comment_1_3dfa4559dceec50c08ba180f41b4c220._comment b/doc/bugs/Unicode_file_names_ignored_on_Windows/comment_1_3dfa4559dceec50c08ba180f41b4c220._comment new file mode 100644 index 000000000..c1778db78 --- /dev/null +++ b/doc/bugs/Unicode_file_names_ignored_on_Windows/comment_1_3dfa4559dceec50c08ba180f41b4c220._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.154" + subject="analysis" + date="2014-03-18T17:54:09Z" + content=""" +The `git ls-files --others -z output` is fine; the mojibake seems to occur in git-annex's reading of that output, which uses GHC's filesystem encoding. On Linux it reads \"h\225\269ky.txt\" but on Windows, \"h\195\161\196\56461ky.txt\". + +So, it's failing to compose the multibyte characters, and it seems to have escaped the last byte (which should be \"\141\" based on the other 3) out into the high code plane used for undecodable bytes. + +Note that on Linux with LANG=C, the add works, and it sees \"h\56515\56481\56516\56461ky.txt\" -- in this case, all 4 bytes are represented in the high code plane, and so round-trip through ok despite the locale not supporting the utf8 encoding. + +Interestingly, while both `[readFile \"h\225\269ky.txt\", readFile \"h\56515\56481\56516\56461ky.txt\"]` work on Linux, only the former does on Windows. +"""]] -- cgit v1.2.3