author: Joey Hess <joey@kitenet.net> 2012-02-03 16:25:34 -0400
committer: Joey Hess <joey@kitenet.net> 2012-02-03 16:25:34 -0400
commit: 05f89123e08075cfbd136f37c60423c1ad38d1fe
tree: e83d9303131a31d8e7a4b9784965f4aa6184ac11 /doc/bugs/problems_with_utf8_names.mdwn
parent: 94caa268831e14134ee72a01eace8c9fff9a954a
update; ghc7.4 branch fixes this pretty well now
Diffstat (limited to 'doc/bugs/problems_with_utf8_names.mdwn')
-rw-r--r-- doc/bugs/problems_with_utf8_names.mdwn | 56
1 file changed, 4 insertions, 52 deletions
diff --git a/doc/bugs/problems_with_utf8_names.mdwn b/doc/bugs/problems_with_utf8_names.mdwn
index b99b58783..fbdca41cd 100644
--- a/doc/bugs/problems_with_utf8_names.mdwn
+++ b/doc/bugs/problems_with_utf8_names.mdwn
@@ -3,58 +3,10 @@ This bug is reopened to track some new UTF-8 filename issues caused by GHC
encoding no longer works. Even unicode filenames fail to work when
git-annex is built with 7.4. --[[Joey]]
-**As a stopgap workaround**, I have made a branch `unicode-only`. This
-makes git-annex work with unicode filenames with ghc 7.4, but *only*
-unicode filenames. If you have filenames with some other encoding, you're
-out in the cold, and it will probably just crash with a error about wrong
-encoding.
-
-## analysis
-
-What's going on exactly? The new ghc, when presented with
-a String of raw bytes like "fo\194\161", and asked to do
-something like `getSymbolicLinkStatus`, encodes it
-as unicode, yielding "fo\303\202\302\241". Which is
-not the same as the original filename, assuming it was "fo¡".
-
-The new ghc requires a new data type, `RawFilePath` be used if you
-don't want to impose utf-8 filenames on your users.
-
-The available `RawFilePath` support is quite low-level, so all the nice
-readFile and writeFile code, etc has to be reimplemented. So do any utility
-libraries that do things with FilePaths, if you need them to use
-RawFilePaths. Until the haskell ecosystem adapts to `RawFilePath`
-(if it does), using it broadly, as git-annex needs to, will be difficult.
-
-## rawfilepath branch
-
-I have a `rawfilepath` branch in git where I am trying to convert it to use
-`RawFilePath`. However, since there is no way to cast a `FilePath` to a
-`RawFilePath` or back (because the encoding of `RawFilePath` is not
-specified), this means changing essentially all of git-annex. Even the
-filenames used for keys in `.git/annex/objects` need to use the new data
-type. I didn't get very far on this branch.
-
-## newghc-edges branch
-
-I have a `newghc-edges` branch in git, trying a different approach.
-
-A `RawFilePath` contains only bytes, so it can actually be cast to a string,
-containing encoded characters. That string can then be 1) output in binary
-mode or 2) manipulated in ways that do not add characters larger than 255,
-and cast back to a `RawFilePath`. While not type-safe, such casts should at
-least help during bootstrapping, and might allow for a quick fix that only
-changes to `RawFilePath` at the edges.
-
-The branch contains an almost complete, although probably also buggy
-conversion using this method. It is missing wrappers for a
-few things like `readFile` and `writeFile` but otherwise seems to
-basically work.
-
-Is this a suitable approach for merging into `master`? It's nasty,
-being not type safe, having to reimplent/copy+modify random bits of
-libraries, etc. The nastiness is contained, though, in a single file,
-of only a few hundred lines of code. --[[Joey]]
+I now have a `ghc7.4` branch in git that seems to solve this,
+for all filename encodings, and all system encodings. It will
+only build with the new GHC. If you have this problem, give it a try!
+--[[Joey]]
----
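
The double encoding described in the removed "analysis" section can be reproduced with a small, self-contained sketch. It is an illustration only, not code from git-annex or from GHC: the helpers `encodeChar`, `encodeString`, and `showBytes` are hypothetical names, and the sketch assumes the raw bytes were stored one per `Char` (as pre-7.4 GHC did) and then UTF-8 encoded as if each were a code point. Note that the bug text writes the input with Haskell's decimal escapes (`\194\161`) and the result with octal escapes (`\303\202\302\241`).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)
import Numeric (showOct)

-- UTF-8 encode one character (hypothetical helper, base library only).
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading bits of the code point, shifted down into the first byte.
    hi :: Int -> Word8
    hi s = fromIntegral (n `shiftR` s)
    -- A continuation byte carrying six bits of the code point.
    cont :: Int -> Word8
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

encodeString :: String -> [Word8]
encodeString = concatMap encodeChar

-- Render bytes as printable ASCII plus octal escapes.
showBytes :: [Word8] -> String
showBytes = concatMap render
  where
    render b
      | b >= 0x20 && b < 0x7F = [chr (fromIntegral b)]
      | otherwise             = '\\' : showOct b ""

main :: IO ()
main = do
  -- Each Char here holds one raw byte of the UTF-8 filename "fo¡".
  let rawBytesAsChars = "fo\194\161"
  -- Encoding those Chars again as UTF-8 doubles up the non-ASCII bytes.
  putStrLn (showBytes (encodeString rawBytesAsChars))  -- fo\303\202\302\241
  -- The bytes actually on disk, for comparison.
  putStrLn (showBytes [0x66, 0x6F, 0xC2, 0xA1])        -- fo\302\241
```

The first line of output matches the mangled name quoted in the analysis; the second is the original on-disk name, which is why calls such as `getSymbolicLinkStatus` no longer found the file.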