Diffstat (limited to 'doc/bugs/unhappy_without_UTF8_locale.mdwn')
-rw-r--r-- doc/bugs/unhappy_without_UTF8_locale.mdwn | 33
1 files changed, 33 insertions, 0 deletions
diff --git a/doc/bugs/unhappy_without_UTF8_locale.mdwn b/doc/bugs/unhappy_without_UTF8_locale.mdwn
new file mode 100644
index 000000000..6f1df4fab
--- /dev/null
+++ b/doc/bugs/unhappy_without_UTF8_locale.mdwn
@@ -0,0 +1,33 @@
+Try unsetting LANG and passing git-annex Unicode filenames.
+
+    joey@gnu:~/tmp/aa>git annex add ./Üa
+    add add add add git-annex: <stdout>: commitAndReleaseBuffer: invalid
+    argument (Invalid or incomplete multibyte or wide character)
+
+The same problem can be seen with a simple Haskell program:
+
+    import System.Environment
+    import Codec.Binary.UTF8.String
+    main = do
+        args <- getArgs
+        putStrLn $ decodeString $ args !! 0
+
+    joey@gnu:~/src/git-annex>LANG= runghc ~/foo.hs Ü
+    foo.hs: <stdout>: hPutChar: invalid argument (Invalid or incomplete multibyte or wide character)
+
+(The call to `decodeString` is needed so that the Unicode input
+string is displayed properly in a UTF-8 locale, but it
+does not contribute to this problem.)
+
+I guess that Haskell is setting the IO encoding to latin1, which
+is [documented](http://haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html#v:latin1)
+to raise an error when writing characters > 255.
+
+So this program, which forces stdout to UTF-8, doesn't have the
+problem -- but it may output garbage on terminals that are not
+UTF-8 capable:
+
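+    import System.IO
+    import System.Environment
+    import Codec.Binary.UTF8.String
+    main = do
+        -- Force UTF-8 output regardless of the locale.
+        hSetEncoding stdout utf8
+        args <- getArgs
+        putStrLn $ decodeString $ args !! 0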
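
The guess above about which IO encoding is in effect could be checked by asking the runtime directly. The following is a minimal sketch and not part of the original report: `describeEncoding` is a hypothetical helper introduced here for illustration, while `localeEncoding`, `hGetEncoding` and `hSetEncoding` are taken from `System.IO` in base. Diagnostics go to stderr and contain only ASCII, so they should print even when stdout cannot encode the test string.

```haskell
import System.IO

-- Hypothetical helper: render a handle's encoding for display.
-- hGetEncoding returns Nothing when the handle is in binary mode.
describeEncoding :: Maybe TextEncoding -> String
describeEncoding Nothing    = "binary (no encoding)"
describeEncoding (Just enc) = show enc

main :: IO ()
main = do
    -- The encoding derived from the locale (LANG, LC_ALL, ...).
    hPutStrLn stderr ("locale encoding: " ++ show localeEncoding)

    -- The encoding stdout starts out with.
    before <- hGetEncoding stdout
    hPutStrLn stderr ("stdout before:   " ++ describeEncoding before)

    -- Override it, as in the workaround above, and report again.
    hSetEncoding stdout utf8
    after <- hGetEncoding stdout
    hPutStrLn stderr ("stdout after:    " ++ describeEncoding after)

    -- U+00DC followed by 'a', the same characters as the failing filename.
    putStrLn "\220a"
```

Running it once as `LANG= runghc` and once under a UTF-8 locale, then comparing the two "stdout before" lines, would show whether the default really is latin1 or something else chosen for the C locale.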