field | value | date |
---|---|---|
author | Kurtis Rader <krader@skepticism.us> | 2016-03-10 18:17:39 -0800 |
committer | Kurtis Rader <krader@skepticism.us> | 2016-03-20 18:47:38 -0700 |
commit | c2f1df1d4af0c7e633528cb4c8caa79ef04b0b5a | |
tree | 0776e975779488cb842c09a5d79d193cb7cf9fdc /tests | |
parent | fb0921249f4584e68699e336be249a655b9c8ede | |
fix handling of non-ASCII chars in C locale
The relevant standards allow the mbtowc/mbrtowc functions to reject
non-ASCII characters (i.e., chars with the high bit set) when the locale
is C or POSIX. The BSD libraries (e.g., on OS X) don't do this, but
the GNU libraries (e.g., on Linux) do. Like most programs, we need the
C/POSIX locales to allow arbitrary bytes. So explicitly check whether we're
in a single-byte locale (which also includes the ISO-8859 variants)
and simply pass the chars through without encoding or decoding.
Fixes #2802.
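
The pass-through idea can be sketched outside of fish. The Python below is illustrative only and is not the code from this commit; `decode_single_byte` and `encode_single_byte` are hypothetical names. It assumes that in a single-byte locale each input byte maps to the wide char with the same numeric value and back again, so no byte is ever rejected.

```python
def decode_single_byte(data: bytes) -> str:
    """Map each byte to the code point with the same value (no decoding step)."""
    return "".join(chr(b) for b in data)


def encode_single_byte(text: str) -> bytes:
    """Map each char back to one byte; bytes() raises ValueError above 0xFF."""
    return bytes(ord(ch) for ch in text)


# Every possible byte value survives a round trip, including the bytes with
# the high bit set that a strict C-locale mbrtowc() is allowed to reject.
all_bytes = bytes(range(256))
round_tripped = encode_single_byte(decode_single_byte(all_bytes))
print(round_tripped == all_bytes)
```

Because nothing is interpreted along the way, a UTF-8 sequence arriving on stdin leaves the sketch as the same bytes, which is the property the commit message asks for.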
Diffstat (limited to 'tests')
-rw-r--r-- | tests/c-locale.err | 0 |
-rw-r--r-- | tests/c-locale.in | 35 |
-rw-r--r-- | tests/c-locale.out | 4 |
-rw-r--r-- | tests/c-locale.status | 1 |
4 files changed, 40 insertions, 0 deletions
diff --git a/tests/c-locale.err b/tests/c-locale.err
new file mode 100644
index 00000000..e69de29b
--- /dev/null
+++ b/tests/c-locale.err
diff --git a/tests/c-locale.in b/tests/c-locale.in
new file mode 100644
index 00000000..d2f2bd5f
--- /dev/null
+++ b/tests/c-locale.in
@@ -0,0 +1,35 @@
+# Verify that fish can pass through non-ASCII characters in the C/POSIX
+# locale. This is to prevent regression of
+# https://github.com/fish-shell/fish-shell/issues/2802.
+#
+# These tests are needed because the relevant standards allow the functions
+# mbrtowc() and wcrtomb() to treat bytes with the high bit set as either valid
+# or invalid in the C/POSIX locales. GNU libc treats those bytes as invalid.
+# Other libc implementations (e.g., BSD) treat them as valid. We want fish to
+# always treat those bytes as valid.
+
+# The fish in the middle of the pipeline should be receiving a UTF-8 encoded
+# version of the unicode from the echo. It should pass those bytes thru
+# literally since it is in the C locale. We verify this by first passing the
+# echo output directly to the `xxd` program then via a fish instance. The
+# output should be "58c3bb58" for the first statement and "58c3bc58" for the
+# second.
+echo -n X\u00fbX | \
+    xxd --plain
+echo X\u00fcX | env LC_ALL=C ../test/root/bin/fish -c 'read foo; echo -n $foo' | \
+    xxd --plain
+
+# This test is subtle. Despite the presence of the \u00fc unicode char (a "u"
+# with an umlaut) the fact the locale is C/POSIX will cause the \xfc byte to
+# be emitted rather than the usual UTF-8 sequence \xc3\xbc. That's because the
+# few single-byte unicode chars (that are not ASCII) are generally in the
+# ISO-8859-1 char set which is encompased by the C locale. The output should
+# be "59fc59".
+env LC_ALL=C ../test/root/bin/fish -c 'echo -n Y\u00fcY' | \
+    xxd --plain
+
+# The user can specify a wide unicode character (one requiring more than a
+# single byte). In the C/POSIX locales we substitute a question-mark for the
+# unencodable wide char. The output should be "543f54".
+env LC_ALL=C ../test/root/bin/fish -c 'echo -n T\u01fdT' | \
+    xxd --plain
diff --git a/tests/c-locale.out b/tests/c-locale.out
new file mode 100644
index 00000000..10a94d3e
--- /dev/null
+++ b/tests/c-locale.out
@@ -0,0 +1,4 @@
+58c3bb58
+58c3bc58
+59fc59
+543f54
diff --git a/tests/c-locale.status b/tests/c-locale.status
new file mode 100644
index 00000000..573541ac
--- /dev/null
+++ b/tests/c-locale.status
@@ -0,0 +1 @@
+0
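
The four expected lines in tests/c-locale.out can be reproduced by hand. The Python sketch below is a hedged model of the behavior the test comments describe, not fish's implementation. `c_locale_encode` is a hypothetical helper that assumes chars up to U+00FF are emitted as a single byte and anything wider becomes a question mark. The trailing newline that `read` strips in the second test is simply left out of the model.

```python
def c_locale_encode(text: str) -> bytes:
    """Model of C-locale output: one byte per char, '?' if the char won't fit."""
    return bytes(ord(ch) if ord(ch) <= 0xFF else ord("?") for ch in text)


# Case 1: echo in a UTF-8 locale emits the UTF-8 encoding of U+00FB.
case1 = "X\u00fbX".encode("utf-8").hex()

# Case 2: the UTF-8 bytes for U+00FC pass through the C-locale fish unchanged,
# modeled here as a byte-for-byte (latin-1) decode followed by re-encoding.
utf8_bytes = "X\u00fcX".encode("utf-8")
case2 = c_locale_encode(utf8_bytes.decode("latin-1")).hex()

# Case 3: U+00FC fits in one byte, so \xfc is emitted instead of \xc3\xbc.
case3 = c_locale_encode("Y\u00fcY").hex()

# Case 4: U+01FD does not fit in one byte, so '?' (0x3f) is substituted.
case4 = c_locale_encode("T\u01fdT").hex()

print(case1, case2, case3, case4, sep="\n")
```

Cases 2 and 3 differ only in where the `\u00fc` escape is interpreted: by a UTF-8 `echo` upstream of the C-locale fish, or by the C-locale fish itself.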