summaryrefslogtreecommitdiff
path: root/doc/bugs/unhappy_without_UTF8_locale.mdwn
blob: 8d22b9ee4408f0b0938b982b0388fcc1dd539928 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
Try unsetting LANG and passing git-annex unicode filenames.

	joey@gnu:~/tmp/aa>git annex add ./Üa
	add add add add git-annex: <stdout>: commitAndReleaseBuffer: invalid
	argument (Invalid or incomplete multibyte or wide character)

> Interestingly, I can get the same crash in the de_DE.UTF-8 locale
> with certian input filenames, while in en_US.UTF-8, it's ok.
> The workaround below avoided the problem in de_DE.UTF-8. --[[Joey]]

> Put in the utf-8 forcing workaround for now. [[done]] --[[Joey]] 

## underlying haskell problem and workaround

The same problem can be seen with a simple haskell program:

	import System.Environment
	import Codec.Binary.UTF8.String
	main = do
	        args <- getArgs
	        putStrLn $ decodeString $ args !! 0

	joey@gnu:~/src/git-annex>LANG= runghc ~/foo.hs Ü
	foo.hs: <stdout>: hPutChar: invalid argument (Invalid or incomplete multibyte or wide character)

(The call to `decodeString` is necessary to make the input
unicode string be displayed properly in a utf8 locale, but
does not contribute to this problem.)

I guess that haskell is setting the IO encoding to latin1, which
is [documented](http://haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html#v:latin1)
to error out on characters > 255. 

So this program doesn't have the problem -- but may output garbage
on non-utf-8 capable terminals:

	import System.IO
	main = do
 		hSetEncoding stdout utf8
	        args <- getArgs
	        putStrLn $ decodeString $ args !! 0