Diffstat (limited to 'doc')
-rw-r--r-- doc/bugs/Unicode_file_names_ignored_on_Windows.mdwn 4
-rw-r--r-- doc/todo/windows_support.mdwn 36
2 files changed, 40 insertions, 0 deletions
diff --git a/doc/bugs/Unicode_file_names_ignored_on_Windows.mdwn b/doc/bugs/Unicode_file_names_ignored_on_Windows.mdwn
index af3877dbe..b58cf2571 100644
--- a/doc/bugs/Unicode_file_names_ignored_on_Windows.mdwn
+++ b/doc/bugs/Unicode_file_names_ignored_on_Windows.mdwn
@@ -35,3 +35,7 @@ According to https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Sup
[2014-03-18 14:28:03 Central Europe Standard Time] read: git ["--git-dir=D:\\anntest\\.git","--work-tree=D:\\anntest","-c","core.bare=false","ls-files","--modified","-z","--","h\225\269ky.txt"]
I can provide additional information, just tell me what you need.
+
+> [[fixed|done]], although this is not the end of encoding issues
+> on Windows. Updating [[windows_support]] to discuss some other ones.
+> --[[Joey]]
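A note on the encodings involved, offered as a sketch rather than as git-annex's own code: the debug log above prints the filename as `h\225\269ky.txt` because Haskell's `show` escapes non-ASCII characters as decimal code points (U+00E1 "á" is 225, U+010D "č" is 269). The standalone program below uses only `base`. It encodes that name to UTF-8 by hand, reinterprets each byte as a character of its own to reproduce the kind of mojibake described in the windows_support changes, and checks byte sequences for UTF-8 well-formedness, which is the sort of test the union merge code might apply before merging. `encodeUtf8`, `mojibake` and `validUtf8` are hypothetical helpers written for this illustration; they are not git-annex functions.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper (not a git-annex function): encode a String as
-- UTF-8 bytes by hand. Surrogate code points are not rejected.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap enc
  where
    enc c
      | n < 0x80 = [fromIntegral n]
      | n < 0x800 = [0xC0 .|. lead 6, cont 0]
      | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
      | otherwise = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
      where
        n = ord c
        -- high bits of the code point, placed in the first byte
        lead s = fromIntegral (n `shiftR` s)
        -- continuation byte 10xxxxxx carrying six bits of the code point
        cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- Hypothetical helper: treat every UTF-8 byte as a character of its own,
-- roughly what a console expecting a single-byte code page does.
mojibake :: String -> String
mojibake = map (chr . fromIntegral) . encodeUtf8

-- Hypothetical helper: is this byte sequence well-formed UTF-8?
-- Rejects overlong forms, surrogates and code points above U+10FFFF.
validUtf8 :: [Word8] -> Bool
validUtf8 [] = True
validUtf8 (b : bs)
  | b < 0x80 = validUtf8 bs
  | b >= 0xC2 && b <= 0xDF = continuation 1 0x80 0xBF
  | b == 0xE0 = continuation 2 0xA0 0xBF
  | b == 0xED = continuation 2 0x80 0x9F
  | b >= 0xE1 && b <= 0xEF = continuation 2 0x80 0xBF
  | b == 0xF0 = continuation 3 0x90 0xBF
  | b >= 0xF1 && b <= 0xF3 = continuation 3 0x80 0xBF
  | b == 0xF4 = continuation 3 0x80 0x8F
  | otherwise = False
  where
    inRange lo hi x = x >= lo && x <= hi
    -- expect n continuation bytes; the first must lie in [lo, hi],
    -- the rest in [0x80, 0xBF]
    continuation n lo hi = case splitAt n bs of
      (c : cs, rest)
        | length cs == n - 1 && inRange lo hi c && all (inRange 0x80 0xBF) cs ->
            validUtf8 rest
      _ -> False

main :: IO ()
main = do
  -- the String "háčky.txt", written with decimal escapes
  let name = "h\225\269ky.txt"
  -- show escapes non-ASCII characters as decimal code points,
  -- which is why the debug log prints h\225\269ky.txt
  print name
  print (encodeUtf8 name)
  print (mojibake name)
  print (validUtf8 (encodeUtf8 name))
  -- 0xE1 is "á" in Latin-1; as a lone byte it is not valid UTF-8
  print (validUtf8 [0x68, 0xE1, 0x6B, 0x79])
```

Run with `runghc`, it prints the escaped name, the byte list `[104,195,161,196,141,107,121,46,116,120,116]`, the mojibake string `"h\195\161\196\141ky.txt"`, then `True` and `False`. Printing with `print` keeps the program's own output ASCII; how the mojibake actually looks on screen depends on the console's code page.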
diff --git a/doc/todo/windows_support.mdwn b/doc/todo/windows_support.mdwn
index 17accd62e..af78d517f 100644
--- a/doc/todo/windows_support.mdwn
+++ b/doc/todo/windows_support.mdwn
@@ -29,6 +29,42 @@ now! --[[Joey]]
* Deleting a git repository from inside the webapp fails "RemoveDirectory
permision denied ... file is being used by another process"
+## potential encoding problems
+
+[[bugs/Unicode_file_names_ignored_on_Windows]] is fixed, but some potential
+problems remain, since the FileSystemEncoding that git-annex relies on
+seems unreliable/broken on Windows.
+
+* When git-annex displays the name of a file it's acting on, there
+ can be mojibake on Windows. For example, "háčky.txt" displays
+ each accented character as the pair of bytes making up its
+ utf-8 encoding. I tried doing various things to the stdout handle
+ to avoid this, but only ended up with encoding crashes, or worse
+ mojibake than this.
+
+* `md5FilePath` still uses the filesystem encoding, and so may produce the
+ wrong value on Windows. This would impact keys that contain problem characters
+ (probably coming from the filename extension), and might cause
+ interoperability problems when git-annex generates the hash directories of a
+ remote, for example an rsync remote.
+
+* `encodeW8` is still used in Git.UnionMerge. I fixed the other calls to
+ encodeW8, which all involved ByteStrings read from git and so can simply
+ be treated as utf-8 on Windows (via `decodeBS`). In the union merge case,
+ however, the ByteString has no defined encoding: it may have been written
+ on Unix and contain keys with invalid unicode in them. On Windows, the
+ union merge code should probably check whether it's valid utf-8, and if
+ not, abort the merge.
+
+* When interoperating with a git-annex repository from a Unix system,
+ it's possible for a key to contain invalid utf-8, in which case its
+ filename cannot even be represented on Windows. Who knows what will
+ happen then; probably it will fail in some way when adding the object
+ file to the Windows repo.
+
+* If data from the git repo does not have a unicode encoding, it will be
+ mangled in various places on Windows, which can lead to undefined behavior.
+
## minor problems
* rsync special remotes with a rsyncurl of a local directory are known