author Joey Hess <joey@kitenet.net> 2011-02-10 14:21:44 -0400
committer Joey Hess <joey@kitenet.net> 2011-02-10 14:21:44 -0400
commit fe55b4644e67bba60b35e07abcdd312b65c9d6f3
tree 4631f428f86f72d614f9b5388772b6ec58a3fb8d /doc/bugs/unhappy_without_UTF8_locale.mdwn
parent e7a3475704f5366e89aebe78cefbeb58ff5ab181
Fix display of unicode filenames.
Internally, the filenames are stored as un-decoded unicode. I tried decoding them, but then haskell tries to access the wrong files. Hmm. So, I've unhappily chosen option "B", which is to decode filenames before they are displayed.
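A minimal sketch of what option "B" could look like, assuming each Char of a stored filename holds one raw byte, as described above. The names `RawName`, `forFileAccess`, `forDisplay`, and `decodeUtf8` are hypothetical and are not taken from git-annex; the hand-rolled decoder is a simplified stand-in for a library decoder and does not reject overlong encodings.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)
import System.IO (hSetEncoding, stdout, utf8)

-- Hypothetical wrapper: a filename kept exactly as received,
-- one Char per byte (0-255), never decoded in storage.
newtype RawName = RawName String

-- Path used for file operations: the untouched bytes.
forFileAccess :: RawName -> FilePath
forFileAccess (RawName s) = s

-- Text used only in messages: the bytes decoded as UTF-8.
forDisplay :: RawName -> String
forDisplay (RawName s) = decodeUtf8 (map ord s)

-- Simplified UTF-8 decoder; malformed input becomes U+FFFD.
decodeUtf8 :: [Int] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80           = chr b : decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07)
  | otherwise          = '\xFFFD' : decodeUtf8 bs
  where
    -- Consume n continuation bytes and combine them with the lead bits.
    multi n lead =
      let (cont, rest) = splitAt n bs
      in if length cont == n && all isCont cont
           then safeChr (foldl step lead cont) : decodeUtf8 rest
           else '\xFFFD' : decodeUtf8 bs
    step acc c = (acc `shiftL` 6) .|. (c .&. 0x3F)
    isCont c = c .&. 0xC0 == 0x80
    -- Guard against values outside the Unicode range.
    safeChr n
      | n > 0x10FFFF = '\xFFFD'
      | otherwise    = chr n

main :: IO ()
main = do
  hSetEncoding stdout utf8
  -- The three bytes 0xC3 0x9C 0x61 are "Üa" encoded as UTF-8.
  let name = RawName (map chr [0xC3, 0x9C, 0x61])
  putStrLn ("add " ++ forDisplay name)
```

The point of the split is that `forFileAccess` never alters the bytes, so file operations still find the right file, while `forDisplay` is the only place where decoding happens.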
Diffstat (limited to 'doc/bugs/unhappy_without_UTF8_locale.mdwn')
-rw-r--r-- doc/bugs/unhappy_without_UTF8_locale.mdwn 33
1 files changed, 33 insertions, 0 deletions
diff --git a/doc/bugs/unhappy_without_UTF8_locale.mdwn b/doc/bugs/unhappy_without_UTF8_locale.mdwn
new file mode 100644
index 000000000..6f1df4fab
--- /dev/null
+++ b/doc/bugs/unhappy_without_UTF8_locale.mdwn
@@ -0,0 +1,33 @@
+Try unsetting LANG and passing git-annex unicode filenames.
+
+    joey@gnu:~/tmp/aa>git annex add ./Üa
+    add add add add git-annex: <stdout>: commitAndReleaseBuffer: invalid
+    argument (Invalid or incomplete multibyte or wide character)
+
+The same problem can be seen with a simple haskell program:
+
+    import System.Environment
+    import Codec.Binary.UTF8.String
+    main = do
+        args <- getArgs
+        putStrLn $ decodeString $ args !! 0
+
+    joey@gnu:~/src/git-annex>LANG= runghc ~/foo.hs Ü
+    foo.hs: <stdout>: hPutChar: invalid argument (Invalid or incomplete multibyte or wide character)
+
+(The call to `decodeString` is necessary to make the input
+unicode string be displayed properly in a utf8 locale, but
+does not contribute to this problem.)
+
+I guess that haskell is setting the IO encoding to latin1, which
+is [documented](http://haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html#v:latin1)
+to error out on characters > 255.
+
+So this program doesn't have the problem -- but may output garbage
+on non-utf-8 capable terminals:
+
+    import System.IO
+    main = do
+        hSetEncoding stdout utf8
+        args <- getArgs
+        putStrLn $ decodeString $ args !! 0