summaryrefslogtreecommitdiff
path: root/doc/todo/support-non-utf8-locales.mdwn
blob: da40118d52fc50f6eb6cbb71d575a65a9eebbf74 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Currenty, git-annex forces output, particularly of filenames, in a utf-8
locale.

Note that this does not mean it cannot be used with filenames in other
encodings. git-annex is entirely encoding agnostic when it comes to 
manipulating filenames. It just *displays* their names always converted to
utf-8, which  may not look right when you have a non-utf8 locale.

This had to be done to work around some bugs with haskell's handling
of filename encodings. In particular,

* [[bugs/unhappy_without_UTF8_locale]]: haskell crashes when told to output 
  a string with characters > 255 in a non-utf8 locale.
* [[bugs/problems_with_utf8_names]]: On many OSs, haskell expects
  non-decoded raw char8 in FilePaths. In order to display a filename,
  though, it needs to first be decoded, and git-annex currently assumes
  it was encoded as utf8.

git-annex's behavior is unlikely to improve much until haskell's
support for utf8 filenames improves. --[[Joey]]

> [[done]] -- I just turned off all encoding handling on stdout and stderr,
> which avoids these problems nicely. Git-annex now displays just what it
> input, at least on platforms where haskell does not decode unicode in
> FilePaths. This will later be a problem when it gets localized, but for
> now works great. --[[Joey]]