Diffstat (limited to 'doc/bugs/unhappy_without_UTF8_locale.mdwn')
-rw-r--r-- | doc/bugs/unhappy_without_UTF8_locale.mdwn | 33 |
1 files changed, 33 insertions, 0 deletions
diff --git a/doc/bugs/unhappy_without_UTF8_locale.mdwn b/doc/bugs/unhappy_without_UTF8_locale.mdwn
new file mode 100644
index 000000000..6f1df4fab
--- /dev/null
+++ b/doc/bugs/unhappy_without_UTF8_locale.mdwn
@@ -0,0 +1,33 @@
+Try unsetting LANG and passing git-annex unicode filenames.
+
+    joey@gnu:~/tmp/aa>git annex add ./Üa
+    add add add add git-annex: <stdout>: commitAndReleaseBuffer: invalid
+    argument (Invalid or incomplete multibyte or wide character)
+
+The same problem can be seen with a simple haskell program:
+
+    import System.Environment
+    import Codec.Binary.UTF8.String
+    main = do
+        args <- getArgs
+        putStrLn $ decodeString $ args !! 0
+
+    joey@gnu:~/src/git-annex>LANG= runghc ~/foo.hs Ü
+    foo.hs: <stdout>: hPutChar: invalid argument (Invalid or incomplete multibyte or wide character)
+
+(The call to `decodeString` is necessary to make the input
+unicode string be displayed properly in a utf8 locale, but
+does not contribute to this problem.)
+
+I guess that haskell is setting the IO encoding to latin1, which
+is [documented](http://haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html#v:latin1)
+to error out on characters > 255.
+
+So this program doesn't have the problem -- but may output garbage
+on non-utf-8 capable terminals:
+
+    import System.IO
+    main = do
+        hSetEncoding stdout utf8
+        args <- getArgs
+        putStrLn $ decodeString $ args !! 0
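
Note on the workaround in the added file: the last snippet uses `getArgs` and `decodeString` but imports only `System.IO`, so it would also need the `System.Environment` and `Codec.Binary.UTF8.String` imports from the first snippet to compile. The sketch below is not part of the commit. It is a minimal, base-only illustration of the output side of the problem, under the assumption that the failure comes from the encoding GHC picks for `stdout` from the locale. It prints a hard-coded string containing `Ü` rather than a command-line argument, because how `getArgs` decodes its arguments differs between GHC versions.

```haskell
import System.IO

main :: IO ()
main = do
    -- Report which encoding GHC chose for stdout based on the locale.
    -- Written to stderr using ASCII only, so this line itself cannot
    -- trigger the encoding error.
    enc <- hGetEncoding stdout
    hPutStrLn stderr ("stdout encoding: " ++ maybe "binary" show enc)

    -- Force UTF-8 on stdout regardless of LANG / LC_ALL.
    -- Comment this line out to test whether the locale default
    -- rejects the non-ASCII character below.
    hSetEncoding stdout utf8

    -- '\220' is U+00DC (Ü); hard-coded to avoid depending on getArgs.
    putStrLn "\220a"
```

Running it as `LANG= runghc sketch.hs` (the file name is hypothetical) with and without the `hSetEncoding` line should show whether the locale-derived encoding is the cause. As the bug report says, forcing UTF-8 may produce garbage on terminals that are not UTF-8 capable.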