author | 2014-03-19 14:49:01 -0400 | |
---|---|---|
committer | 2014-03-19 15:57:56 -0400 | |
commit | 0bc0364c5cd75695bc66181cc3bd52a4d26c4c87 (patch) | |
tree | 56bce6cf12e7cfdc7c591d3b36f56221c5b5a2d1 /doc | |
parent | 3b0e263a342cf8a369fcea6b1e41e0533ba2cc7f (diff) |
Windows: Fix some filename encoding bugs.
http://git-annex.branchable.com/bugs/Unicode_file_names_ignored_on_Windows/
Not a complete fix yet.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/bugs/Unicode_file_names_ignored_on_Windows.mdwn | 4 |
-rw-r--r-- | doc/todo/windows_support.mdwn | 36 |
2 files changed, 40 insertions, 0 deletions
diff --git a/doc/bugs/Unicode_file_names_ignored_on_Windows.mdwn b/doc/bugs/Unicode_file_names_ignored_on_Windows.mdwn
index af3877dbe..b58cf2571 100644
--- a/doc/bugs/Unicode_file_names_ignored_on_Windows.mdwn
+++ b/doc/bugs/Unicode_file_names_ignored_on_Windows.mdwn
@@ -35,3 +35,7 @@ According to https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Sup
 [2014-03-18 14:28:03 Central Europe Standard Time] read: git ["--git-dir=D:\\anntest\\.git","--work-tree=D:\\anntest","-c","core.bare=false","ls-files","--modified","-z","--","h\225\269ky.txt"]
 
 I can provide additional information, just tell me what you need.
+
+> [[fixed|done]], although this is not the end of encoding issues
+> on Windows. Updating [[windows_support]] to discuss some other ones.
+> --[[Joey]]
diff --git a/doc/todo/windows_support.mdwn b/doc/todo/windows_support.mdwn
index 17accd62e..af78d517f 100644
--- a/doc/todo/windows_support.mdwn
+++ b/doc/todo/windows_support.mdwn
@@ -29,6 +29,42 @@ now! --[[Joey]]
 * Deleting a git repository from inside the webapp fails
   "RemoveDirectory permision denied ... file is being used by another process"
 
+## potential encoding problems
+
+[[bugs/Unicode_file_names_ignored_on_Windows]] is fixed, but some potential
+problems remain, since the FileSystemEncoding that git-annex relies on
+seems unreliable/broken on Windows.
+
+* When git-annex displays a filename that it's acting on, there
+  can be mojibake on Windows. For example, "háčky.txt" displays
+  the accented characters as instead the pairs of bytes making
+  up the utf-8. Tried doing various things to the stdout handle
+  to avoid this, but only ended up with encoding crashes, or worse
+  mojibake than this.
+
+* `md5FilePath` still uses the filesystem encoding, and so may produce the
+  wrong value on Windows. This would impact keys that contain problem characters
+  (probably coming from the filename extension), and might cause
+  interoperability problems when git-annex generates the hash directories of a
+  remote, for example a rsync remote.
+
+* `encodeW8` is used in Git.UnionMerge, and while I fixed the other calls to
+  encodeW8, which all involved ByteStrings reading from git and so can just
+  treat it as utf-8 on Windows (via `decodeBS`), in the union merge case,
+  the ByteString has no defined encoding. It may have been written on Unix
+  and contain keys with invalid unicode in them. On windows, the union
+  merge code should probably check if it's valid utf-8, and if not,
+  abort the merge.
+
+* If interoperating with a git-annex repository from a unix system, it's
+  possible for a key to contain some invalid utf-8, which means its filename
+  cannot even be represented on Windows, so who knows what will happen in that
+  case -- probably it will fail in some way when adding the object file
+  to the Windows repo.
+
+* If data from the git repo does not have a unicode encoding, it will be
+  mangled in various places on Windows, which can lead to undefined behavior.
+
 ## minor problems
 
 * rsync special remotes with a rsyncurl of a local directory are known
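
The added windows_support.mdwn text suggests that the union merge code should check whether its input is valid utf-8 and abort the merge if not. The following is only a rough sketch of that idea, written in Python for brevity rather than in git-annex's Haskell. The names `is_valid_utf8` and `check_mergeable` are hypothetical and do not exist in git-annex. The last lines reproduce the kind of mojibake described for "háčky.txt", assuming the utf-8 bytes are displayed one byte per character (Latin-1 is used here purely for illustration).

```python
from typing import Iterable


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


def check_mergeable(lines: Iterable[bytes]) -> None:
    """Hypothetical guard: raise before merging if any line is not valid UTF-8."""
    for number, line in enumerate(lines, start=1):
        if not is_valid_utf8(line):
            raise ValueError(f"line {number} is not valid UTF-8; aborting merge")


# Mojibake illustration: each accented character becomes a pair of bytes,
# and each byte is then shown as its own character.
name = "háčky.txt"
utf8_bytes = name.encode("utf-8")
mojibake = utf8_bytes.decode("latin-1")
```

A caller would run `check_mergeable` over the raw lines before attempting the merge and treat the `ValueError` as the signal to abort. Whether aborting is the right policy for git-annex is left open by the text itself ("should probably").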