test_utf8_mode.test_cmd_line() fails on HP-UX due to false assumptions #78584

michael-o · 2018-08-14T12:57:13Z

BPO	34403
Nosy	@terryjreedy, @vstinner, @aixtools, @michael-o
PRs	bpo-34403: Skip test_utf8_mode.test_cmd_line() on HP-UX #8966 bpo-34403: Fix test_utf8_mode.test_cmd_line() on HP-UX #8967 bpo-34403: On HP-UX, force ASCII for C locale #8969 bpo-34523: Fix config_init_fs_encoding() for ASCII #10232 [3.7] bpo-34403: Fix initfsencoding() for ASCII #10233 [3.7] bpo-34403: Always implement _Py_GetForceASCII() #10235
Dependencies	bpo-34207: test_cmd_line test_utf8_mode test_warnings fail in all FreeBSD 3.x (3.8) buildbots
Files	c_locale.c py3.8-LC-C.text py3.8-default.text

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2018-08-29.13:49:04.403>
created_at = <Date 2018-08-14.12:57:12.659>
labels = ['3.7', 'tests', '3.8', 'type-bug', 'library']
title = 'test_utf8_mode.test_cmd_line() fails on HP-UX due to false assumptions'
updated_at = <Date 2018-10-30.15:43:10.318>
user = 'https://github.com/michael-o'

bugs.python.org fields:

activity = <Date 2018-10-30.15:43:10.318>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2018-08-29.13:49:04.403>
closer = 'vstinner'
components = ['Library (Lib)', 'Tests']
creation = <Date 2018-08-14.12:57:12.659>
creator = 'michael-o'
dependencies = ['34207']
files = ['47767', '47770', '47771']
hgrepos = []
issue_num = 34403
keywords = ['patch', '3.7regression']
message_count = 39.0
messages = ['323516', '323682', '323729', '323797', '324070', '324085', '324100', '324172', '324175', '324176', '324196', '324211', '324213', '324214', '324215', '324216', '324217', '324219', '324221', '324225', '324236', '324238', '324242', '324253', '324260', '324261', '324273', '324274', '324289', '324322', '324323', '324463', '324502', '328901', '328902', '328903', '328916', '328924', '328932']
nosy_count = 4.0
nosy_names = ['terry.reedy', 'vstinner', 'Michael.Felt', 'michael-o']
pr_nums = ['8966', '8967', '8969', '10232', '10233', '10235']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue34403'
versions = ['Python 3.7', 'Python 3.8']

michael-o · 2018-08-14T12:57:13Z

Running from the 3.7 branch on HP-UX 11.31 ia64, 32-bit, big-endian.
The test output is:

Re-running failed tests in verbose mode
Re-running test 'test_utf8_mode' in verbose mode
test_cmd_line (test.test_utf8_mode.UTF8ModeTests) ... FAIL
test_env_var (test.test_utf8_mode.UTF8ModeTests) ... ok
test_filesystemencoding (test.test_utf8_mode.UTF8ModeTests) ... ok
test_io (test.test_utf8_mode.UTF8ModeTests) ... ok
test_io_encoding (test.test_utf8_mode.UTF8ModeTests) ... ok
test_locale_getpreferredencoding (test.test_utf8_mode.UTF8ModeTests) ... ok
test_optim_level (test.test_utf8_mode.UTF8ModeTests) ... ok
test_posix_locale (test.test_utf8_mode.UTF8ModeTests) ... ok
test_stdio (test.test_utf8_mode.UTF8ModeTests) ... ok
test_xoption (test.test_utf8_mode.UTF8ModeTests) ... ok

======================================================================
FAIL: test_cmd_line (test.test_utf8_mode.UTF8ModeTests)
----------------------------------------------------------------------

> Traceback (most recent call last):
>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 230, in test_cmd_line
>     check('utf8=0', [c_arg], LC_ALL='C')
>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 223, in check
>     self.assertEqual(args, ascii(expected), out)
> AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != "['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
> - ['h\xc3\xa9\xe2\x82\xac']
> + ['h\udcc3\udca9\udce2\udc82\udcac']
>  : roman8:['h\xc3\xa9\xe2\x82\xac']
> 
>

Ran 10 tests in 2.595s

FAILED (failures=1)
test test_utf8_mode failed
1 test failed again:
test_utf8_mode

== Tests result: FAILURE then FAILURE ==

1 test failed:
test_utf8_mode

1 re-run test:
test_utf8_mode

Total duration: 7 sec 265 ms
Tests result: FAILURE then FAILURE
Makefile:1066: recipe for target 'test' failed
gmake: *** [test] Error 2

I tried to understand the issue, but my Python knowledge is too limited; in particular, I do not understand why a byte array "arg = 'h\xe9\u20ac'.encode('utf-8')" is passed as one arg to the forked process.
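On the question of how a `bytes` object can be passed as an argument: on POSIX, `subprocess` accepts `bytes` items in `args` and hands them to the new process unchanged; it is the child interpreter that decodes them into `sys.argv` using the locale encoding (or UTF-8 in UTF-8 Mode). A minimal sketch of that round trip, using plain `subprocess.run()` rather than the test's own helpers (POSIX only):

```python
import subprocess
import sys

# The same bytes test_cmd_line() builds: 'h', then U+00E9 and U+20AC as UTF-8.
arg = 'h\xe9\u20ac'.encode('utf-8')

# Print argv with ascii() so the output is plain ASCII whatever the locale.
code = 'import sys; print(ascii(sys.argv[1:]))'

# POSIX only: a bytes item in args is passed to exec() unchanged; turning it
# into a str in sys.argv is the child interpreter's job.
proc = subprocess.run([sys.executable, '-c', code, arg],
                      stdout=subprocess.PIPE, check=True)
out = proc.stdout.decode('ascii').strip()
print(out)
```

Under a UTF-8 locale this should print `['h\xe9\u20ac']`. Under `LC_ALL=C` with `-X utf8=0`, the result depends on how the platform decodes argv in the C locale, which is what this issue is about.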

I strongly suspect that this is related to the non-standard default character encoding on HP-UX: https://en.wikipedia.org/wiki/HP_Roman#HP_Roman-8 (roman8).

A stupid 8-bit encoding. The very same snippet on FreeBSD prints:

$ LC_ALL=C python3.6 test_utf8.py
US-ASCII:[]

Willing to test and modify if someone tells what to do.
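The failing assertion compares two different decodings of the same argument bytes. A short sketch that reproduces both strings from the report with Python's own codecs: the test expects ASCII-locale behaviour, where `surrogateescape` turns each non-ASCII byte into a lone surrogate U+DC80..U+DCFF, while the value that came back from the HP-UX child is exactly what a one-byte-per-code-point (Latin-1) decoding produces.

```python
arg = 'h\xe9\u20ac'.encode('utf-8')   # b'h\xc3\xa9\xe2\x82\xac'

# Expected by the test for LC_ALL=C: ASCII decoding, with every byte >= 0x80
# smuggled through as a lone surrogate U+DC80..U+DCFF.
expected = arg.decode('ascii', 'surrogateescape')

# Returned on HP-UX: each byte became the code point with the same value,
# which is what a Latin-1 decoding produces.
observed = arg.decode('latin-1')

print(ascii([expected]))  # ['h\udcc3\udca9\udce2\udc82\udcac']
print(ascii([observed]))  # ['h\xc3\xa9\xe2\x82\xac']
```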

terryjreedy · 2018-08-17T22:16:58Z

You might get more information asking questions on python-list.

michael-o · 2018-08-18T18:58:14Z

Thanks, I'll do that. Hopefully I can provide a patch for it. However, I am convinced that I will have to write a custom codec for roman8 to make all of this work flawlessly.

aixtools · 2018-08-20T16:11:07Z

Although the defaults differ (roman8 versus latin1 (iso8859-1)), HP-UX and AIX are in the same situation (like Windows with cp1252), so this issue and bpo-34347 are related.

As I mentioned in https://bugs.python.org/issue34347#msg323319 the string seen by self.get_output() is not the same string as "expected".

If I recall correctly, there may be a way to get the two almost the same, except that "expected" is a bytes object while the value returned as CLI output is a regular string.

I am thinking that maybe the "easy" way is to skip this test on AIX, HP-UX, and others. Rather than hard-coding platforms, query what the default encoding is, and if it is not UTF-8, skip the test.

In any case, it seems to be broken for any system that does not have UTF-8 as default.
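A minimal sketch of the "query instead of hard-code" idea, assuming the test's own setup (a child interpreter run with `LC_ALL=C` and `-X utf8=0`). `c_locale_encoding()` and `C_LOCALE_IS_ASCII` are hypothetical names, not part of `test.support`. The check is for ASCII rather than UTF-8 because ASCII is what the test hard-codes for the C locale, the point made in the next reply.

```python
import codecs
import os
import subprocess
import sys
import unittest


def c_locale_encoding():
    """Hypothetical helper (not in test.support): the encoding a child
    interpreter reports under LC_ALL=C with UTF-8 Mode disabled."""
    env = dict(os.environ, LC_ALL='C')
    code = 'import locale; print(locale.getpreferredencoding(False))'
    # -X utf8=0 matters: since Python 3.7 the C locale turns UTF-8 Mode on
    # automatically (PEP 540), which would report 'UTF-8' everywhere.
    proc = subprocess.run([sys.executable, '-X', 'utf8=0', '-c', code],
                          env=env, stdout=subprocess.PIPE, check=True,
                          universal_newlines=True)
    return proc.stdout.strip()


# codecs.lookup() normalises names such as 'ANSI_X3.4-1968' or 'US-ASCII'.
C_LOCALE_IS_ASCII = codecs.lookup(c_locale_encoding()).name == 'ascii'


class CmdLineTest(unittest.TestCase):
    @unittest.skipUnless(C_LOCALE_IS_ASCII, 'C locale encoding is not ASCII')
    def test_cmd_line(self):
        pass  # the real checks from test_utf8_mode would go here
```

On HP-UX this would report `roman8`, which `codecs.lookup()` maps to the `hp-roman8` codec, so the test would be skipped rather than fail.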

aixtools · 2018-08-25T14:33:52Z

It might be as simple as what I saw for AIX:

diff --git a/Lib/test/test_utf8_mode.py b/Lib/test/test_utf8_mode.py
index 26e2e13ec5..3e918fd54c 100644
--- a/Lib/test/test_utf8_mode.py
+++ b/Lib/test/test_utf8_mode.py
@@ -219,6 +219,8 @@ class UTF8ModeTests(unittest.TestCase):
         check('utf8', [arg_utf8])
         if sys.platform == 'darwin' or support.is_android:
             c_arg = arg_utf8
+        elif sys.platform.startswith("aix"):
+            c_arg = arg.decode('iso-8859-1')
         else:
             c_arg = arg_ascii
         check('utf8=0', [c_arg], LC_ALL='C')

So adding the following might be all that is needed:

+        elif sys.platform.startswith("hp-ux"):
+            c_arg = arg.decode('roman8')

aixtools · 2018-08-25T17:42:38Z

The AIX complaint is (or was, until the PR is merged):
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != "['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\udcc3\udca9\udce2\udc82\udcac']

And the HP-UX complaint is:
  File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 223, in check
    self.assertEqual(args, ascii(expected), out)
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != "['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\udcc3\udca9\udce2\udc82\udcac']

Maybe a change such as:

--- a/Lib/test/test_utf8_mode.py
+++ b/Lib/test/test_utf8_mode.py
@@ -219,6 +219,8 @@ class UTF8ModeTests(unittest.TestCase):
         check('utf8', [arg_utf8])
         if sys.platform == 'darwin' or support.is_android:
             c_arg = arg_utf8
+        elif sys.platform.startswith(("aix", "hp-ux")):
+            c_arg = arg.decode('iso-8859-1')
         else:
             c_arg = arg_ascii
         check('utf8=0', [c_arg], LC_ALL='C')

I mention this because it seems neither roman8 nor roman9 have 'official' iso names or alias (correct me if I am wrong).

michael-o · 2018-08-25T20:17:09Z

I think you are absolutely right.

> In any case, it seems to be broken for any system that does not have UTF-8 as default.

You likely mean ASCII. Python assumes that LANG=C is ASCII, which is not the case for AIX and HP-UX.

Your patch looks reasonable; I will try it on Monday. The problem is that there is no roman8 codec in Python. Maybe ISO-8859-1 will do for the test, but I am still eager to add one.

> I mention this because it seems neither roman8 nor roman9 have 'official' iso names or alias (correct me if I am wrong).

There are no ISO names because this is not an ISO encoding. This is an HP invention aka hp-roman8 (roman8, ibm-1051, r8, Cp1051).

Edit: there is roman8 support: https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Lib/encodings/hp_roman8.py as well as aliases.

There are a few aliases missing: cp1051, ibm1051 and hp-roman8. This needs an additional PR.
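To see which of these names a given interpreter already accepts, a small sketch using `codecs.lookup()`; `resolve()` is a hypothetical helper, and the results for `cp1051` and `ibm1051` depend on the Python version it runs on.

```python
import codecs


def resolve(name):
    """Hypothetical helper: canonical codec name for `name`, or None if this
    Python has no codec or alias by that name."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# Names mentioned in this thread; aliases live in Lib/encodings/aliases.py.
for name in ['hp_roman8', 'hp-roman8', 'roman8', 'r8', 'cp1051', 'ibm1051']:
    print(f'{name!r:12} -> {resolve(name)}')
```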

michael-o · 2018-08-27T13:22:08Z

So I changed the test code to:

diff --git a/Lib/test/test_utf8_mode.py b/Lib/test/test_utf8_mode.py
index 26e2e13ec5..d9f8a3ed8b 100644
--- a/Lib/test/test_utf8_mode.py
+++ b/Lib/test/test_utf8_mode.py
@@ -208,7 +208,7 @@ class UTF8ModeTests(unittest.TestCase):
     def test_cmd_line(self):
         arg = 'h\xe9\u20ac'.encode('utf-8')
         arg_utf8 = arg.decode('utf-8')
-        arg_ascii = arg.decode('ascii', 'surrogateescape')
+        arg_ascii = arg.decode('roman8', 'surrogateescape')
         code = 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:])))'

         def check(utf8_opt, expected, **kw):

and the output is:
======================================================================
FAIL: test_cmd_line (test.test_utf8_mode.UTF8ModeTests)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 224, in test_cmd_line
    check('utf8=0', [c_arg], LC_ALL='C')
  File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 217, in check
    self.assertEqual(args, ascii(expected), out)
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != "['h\\xfb\\u02cb\\xe3\\x82\\u02dc']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\xfb\u02cb\xe3\x82\u02dc']
 : roman8:['h\xc3\xa9\xe2\x82\xac']

I still don't understand that.

I believe that surrogate escape only works for ASCII and nothing else. If so, this test must be skipped on HP-UX and AIX.
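On the surrogate escape point: `surrogateescape` is an error handler, so any codec applies it, but only to bytes that codec cannot decode. roman8 has a mapping for every byte of this particular argument, so the handler never fires and no U+DCxx surrogates appear; that is why the patched expectation turned into ordinary roman8 characters. A small sketch:

```python
# surrogateescape works with UTF-8 too: only the undecodable byte is escaped.
data = b'h\xc3\xa9\xff'                  # 'h' + U+00E9 in UTF-8, then a stray 0xFF
s = data.decode('utf-8', 'surrogateescape')
print(ascii(s))                          # 'h\xe9\udcff'
assert s.encode('utf-8', 'surrogateescape') == data   # lossless round trip

# With roman8 every byte of the test argument has a mapping, so the
# handler is never used and the result contains no surrogates.
arg = 'h\xe9\u20ac'.encode('utf-8')
r8 = arg.decode('roman8', 'surrogateescape')
print(ascii(r8))                         # 'h\xfb\u02cb\xe3\x82\u02dc'
```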

michael-o · 2018-08-27T13:34:01Z

Maybe skipping the test is the best thing:
diff --git a/Lib/test/test_utf8_mode.py b/Lib/test/test_utf8_mode.py
index 26e2e13ec5..d6c4b321be 100644
--- a/Lib/test/test_utf8_mode.py
+++ b/Lib/test/test_utf8_mode.py
@@ -12,7 +12,7 @@ from test.support.script_helper import assert_python_ok, assert_python_failure

 MS_WINDOWS = (sys.platform == 'win32')
-
+HPUX = (sys.platform.startswith('hp-ux'))

 class UTF8ModeTests(unittest.TestCase):
     DEFAULT_ENV = {
@@ -205,6 +205,7 @@ class UTF8ModeTests(unittest.TestCase):
         self.assertEqual(out, 'UTF-8 UTF-8')
 
     @unittest.skipIf(MS_WINDOWS, 'test specific to Unix')
+    @unittest.skipIf(HPUX, 'test specific to Unix with ASCII default locale')
     def test_cmd_line(self):
         arg = 'h\xe9\u20ac'.encode('utf-8')
         arg_utf8 = arg.decode('utf-8')

michael-o · 2018-08-27T13:35:09Z

Maybe Victor Stinner has some insights here.

aixtools · 2018-08-27T20:58:46Z

On 27/08/2018 15:22, Michael Osipov wrote:

> I still don't understand that.

Something I found helpful was to change:

check('utf8=0', [c_arg], LC_ALL='C')

to

check('utf8=0', [c_arg], LC_ALL='C', failure=True)

This also fails, but it shows what is being executed.

Further, my 'understanding' is that ascii(whatever) is much smarter than
what whatever.decode('ascii', ...) does. Also, ascii() tends to use the \x
shorthand, while decode('ascii', 'surrogateescape') yields code points that
show up with the \udc prefix.

And, while you might still consider it a 'bug', did you try using
c_arg = arg.decode('iso-8859-1')?

Michael (F)
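To untangle `decode()` from `ascii()`: `decode()` produces a `str`, while `ascii()` decodes nothing and only renders an object as escaped ASCII text. The test compares `ascii()` renderings of two `str` values, so the `\x` versus `\udc` difference in the failure message reflects different code points rather than different formatting. A short sketch:

```python
raw = b'h\xa7'

# decode() turns bytes into str; the code points depend on codec and handler.
latin = raw.decode('latin-1')                      # 'h' + U+00A7
escaped = raw.decode('ascii', 'surrogateescape')   # 'h' + lone U+DCA7

# ascii() is repr() with non-ASCII code points spelled as \x (below U+0100),
# \u or \U escapes; it never decodes anything.
print(ascii(latin))    # 'h\xa7'
print(ascii(escaped))  # 'h\udca7'
print(ascii(raw))      # b'h\xa7'
```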


michael-o · 2018-08-28T06:52:56Z

Wow, this is pretty surprising. The very same patch for AIX works on HP-UX flawlessly:

$ ./python -m test test_utf8_mode
Run tests sequentially
0:00:00 [1/1] test_utf8_mode

== Tests result: SUCCESS ==

1 test OK.

Total duration: 2 sec 769 ms
Tests result: SUCCESS

I still don't really understand why, because decode() and ascii() look like apples and oranges to me.

Michael, since you provided a decent solution, would you mind extending your patch to HP-UX? You deserve the credit.

michael-o · 2018-08-28T07:18:21Z

Now I know why this cannot work with Roman 8: it contains chars which are multibyte in Unicode (UTF-8) and which cannot be mapped into a 7-bit/8-bit encoding. Therefore CP1252 does not work either, because it contains such Unicode chars too. ISO-8859-1 consists solely of single-byte chars.

This test needs to be skipped on HP-UX. I will provide a patch for that.

vstinner · 2018-08-28T07:40:22Z

Hi, I'm the author of the UTF-8 Mode PEP (PEP 540) and its implementation. I wrote test_utf8_mode. I wasn't sure that it was a good idea to hardcode the locale encoding depending on the platform. The fact that AIX and HP-UX use different locale encodings confirms that it was a bad choice. My PR 8967 gets the locale encoding at runtime instead of hardcoding it. It should fix the test on AIX and HP-UX.

To fix the test on HP-UX, I also removed the euro sign (U+20AC: €) from the test string. There is no need to test a large code point: a single non-ASCII character is enough to validate the code.

Michael Osipov: would you mind testing my PR on HP-UX, please?
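A sketch of the runtime approach described above, not the code of PR 8967 itself: `expected_c_arg()` is a hypothetical helper that decodes the argument with whatever locale encoding the child reports, instead of looking it up in a per-platform table. The byte string is the one the updated test uses.

```python
def expected_c_arg(arg, locale_encoding):
    """Hypothetical helper: decode argv bytes the way the child interpreter
    is expected to, given the locale encoding it reports at runtime.
    surrogateescape keeps bytes the encoding cannot decode as U+DCxx."""
    return arg.decode(locale_encoding, 'surrogateescape')


arg = b'h\xa7\xe9'   # the byte string used by the updated test
for enc in ('ascii', 'latin-1', 'roman8'):
    print(f'{enc:8}{ascii(expected_c_arg(arg, enc))}')
# ascii   'h\udca7\udce9'
# latin-1 'h\xa7\xe9'
# roman8  'h\xcf\xd5'
```

The design trade-off is that the test now trusts the encoding the platform announces; if the C library decodes with a different encoding than it reports, the runtime value and the actual argv will still disagree.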

michael-o · 2018-08-28T07:42:33Z

Victor, looking into it...

michael-o · 2018-08-28T07:49:03Z

It unfortunately does not:
> osipovmi@blnn724x:/var/osipovmi/cpython []
> $ git branch
>   3.6
>   3.7
>   bpo-14568
>   bpo-34401
>   bpo-34403
>   bpo-34412
>   bpo-34448
>   bpo-34449
>   bpo-34519
>   master
>   test_c_locale_coercion_hpux
> * utf8_cmd_line
> $ ./python -m test test_utf8_mode
> Run tests sequentially
> 0:00:00 [1/1] test_utf8_mode
> test test_utf8_mode failed -- Traceback (most recent call last):
>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 231, in test_cmd_line
>     check('utf8=0', [c_arg], LC_ALL='C')
>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 218, in check
>     self.assertEqual(args, ascii(expected), out)
> AssertionError: "['h\\xc2\\xa7\\xc3\\xa9']" != "['h\\xf4\\xcf\\xfb\\u02cb']"
> - ['h\xc2\xa7\xc3\xa9']
> + ['h\xf4\xcf\xfb\u02cb']
>  : roman8:['h\xc2\xa7\xc3\xa9']
> 
> test_utf8_mode failed
> 
> == Tests result: FAILURE ==
> 
> 1 test failed:
>     test_utf8_mode
> 
> Total duration: 2 sec 921 ms
> Tests result: FAILURE

michael-o · 2018-08-28T07:53:58Z

Running off: 217af1d

> $ ./python -m test test_utf8_mode
> Run tests sequentially
> 0:00:00 [1/1] test_utf8_mode
> test test_utf8_mode failed -- Traceback (most recent call last):
>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 235, in test_cmd_line
>     LC_ALL='C')
>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 214, in check
>     self.assertEqual(args, ascii(expected), out)
> AssertionError: "['h\\xa7\\xe9']" != "['h\\xcf\\xd5']"
> - ['h\xa7\xe9']
> + ['h\xcf\xd5']
>  : roman8:['h\xa7\xe9']
> 
> test_utf8_mode failed
> 
> == Tests result: FAILURE ==
> 
> 1 test failed:
>     test_utf8_mode
> 
> Total duration: 2 sec 997 ms
> Tests result: FAILURE

vstinner · 2018-08-28T08:11:03Z

>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 214, in check
>     self.assertEqual(args, ascii(expected), out)
> AssertionError: "['h\\xa7\\xe9']" != "['h\\xcf\\xd5']"
> - ['h\xa7\xe9']
> + ['h\xcf\xd5']
>  : roman8:['h\xa7\xe9']

Hum, it looks like a bug in the C library of HP-UX. It announces that the locale encoding is "roman8", but the mbstowcs() function decodes from the Latin1 encoding. The updated test uses the byte string: b'h\xa7\xe9'. The OS announces the encoding roman8, so the test expects the Unicode string: b'h\xa7\xe9'.decode('roman8') == 'h\xcf\xd5'.... but it gets 'h\xa7\xe9' which looks more like the byte string has been decoded from Latin1: b'h\xa7\xe9'.decode('latin1') == 'h\xa7\xe9'.

Michael: would you mind compiling and running the attached c_locale.c test program? It sets the LC_ALL locale to C, dumps the locales (LC_ALL, LC_CTYPE, nl_langinfo(CODESET)), and then decodes all bytes from the locale encoding (LC_CTYPE). The output should help me to understand what the *effective* encoding of HP-UX is for the C locale.

You may modify the c_locale.c to replace "C" with "POSIX", to see if the behaviour is different.
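Once the program's output is available, its byte table can be checked mechanically against Python's codecs. A minimal sketch, assuming only the line format of the output posted in this thread (`byte 0xNN decoded to Unicode character U+XXXX`); c_locale.c itself is not reproduced here, and `parse_dump()` and `agrees_with()` are hypothetical helpers.

```python
import re

# Line format copied from the c_locale.c output posted in this thread.
LINE = re.compile(
    r'byte 0x([0-9A-F]{2}) decoded to Unicode character U\+([0-9A-F]{4,6})')


def parse_dump(text):
    """Map each byte value to the character the C library decoded it to."""
    return {int(m.group(1), 16): chr(int(m.group(2), 16))
            for m in LINE.finditer(text)}


def agrees_with(table, codec):
    """True if Python's `codec` decodes every dumped byte the same way."""
    for byte, char in table.items():
        try:
            if bytes([byte]).decode(codec) != char:
                return False
        except UnicodeDecodeError:
            return False
    return True


sample = """\
byte 0xA7 decoded to Unicode character U+00A7
byte 0xE9 decoded to Unicode character U+00E9
"""
table = parse_dump(sample)
print(agrees_with(table, 'latin-1'), agrees_with(table, 'roman8'))  # True False
```

Fed the full dump, a result of "agrees with latin-1, not with roman8" would confirm that mbstowcs() decodes Latin-1 while nl_langinfo(CODESET) announces roman8.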

michael-o · 2018-08-28T08:20:21Z

Please see here:

osipovmi@blnn724x:~ []
$ uname -a
HP-UX blnn724x B.11.31 U ia64 HP-UX
osipovmi@blnn724x:~ []
$ locale
LANG=de_DE.utf8
LC_CTYPE="de_DE.utf8"
LC_COLLATE="de_DE.utf8"
LC_MONETARY="de_DE.utf8"
LC_NUMERIC="de_DE.utf8"
LC_TIME="de_DE.utf8"
LC_MESSAGES="de_DE.utf8"
LC_ALL=
osipovmi@blnn724x:~ []
$ cc -o c_locale c_locale.c
osipovmi@blnn724x:~ []
$ file c_locale
c_locale: ELF-32 executable object file - IA64
osipovmi@blnn724x:~ []
$ ./c_locale
LC_ALL: C C C C C C
LC_CTYPE: C C C C C C
nl_langinfo(CODESET): roman8
byte 0x00 decoded to Unicode character U+0000
byte 0x01 decoded to Unicode character U+0001
byte 0x02 decoded to Unicode character U+0002
byte 0x03 decoded to Unicode character U+0003
byte 0x04 decoded to Unicode character U+0004
byte 0x05 decoded to Unicode character U+0005
byte 0x06 decoded to Unicode character U+0006
byte 0x07 decoded to Unicode character U+0007
byte 0x08 decoded to Unicode character U+0008
byte 0x09 decoded to Unicode character U+0009
byte 0x0A decoded to Unicode character U+000A
byte 0x0B decoded to Unicode character U+000B
byte 0x0C decoded to Unicode character U+000C
byte 0x0D decoded to Unicode character U+000D
byte 0x0E decoded to Unicode character U+000E
byte 0x0F decoded to Unicode character U+000F
byte 0x10 decoded to Unicode character U+0010
byte 0x11 decoded to Unicode character U+0011
byte 0x12 decoded to Unicode character U+0012
byte 0x13 decoded to Unicode character U+0013
byte 0x14 decoded to Unicode character U+0014
byte 0x15 decoded to Unicode character U+0015
byte 0x16 decoded to Unicode character U+0016
byte 0x17 decoded to Unicode character U+0017
byte 0x18 decoded to Unicode character U+0018
byte 0x19 decoded to Unicode character U+0019
byte 0x1A decoded to Unicode character U+001A
byte 0x1B decoded to Unicode character U+001B
byte 0x1C decoded to Unicode character U+001C
byte 0x1D decoded to Unicode character U+001D
byte 0x1E decoded to Unicode character U+001E
byte 0x1F decoded to Unicode character U+001F
byte 0x20 decoded to Unicode character U+0020
byte 0x21 decoded to Unicode character U+0021
byte 0x22 decoded to Unicode character U+0022
byte 0x23 decoded to Unicode character U+0023
byte 0x24 decoded to Unicode character U+0024
byte 0x25 decoded to Unicode character U+0025
byte 0x26 decoded to Unicode character U+0026
byte 0x27 decoded to Unicode character U+0027
byte 0x28 decoded to Unicode character U+0028
byte 0x29 decoded to Unicode character U+0029
byte 0x2A decoded to Unicode character U+002A
byte 0x2B decoded to Unicode character U+002B
byte 0x2C decoded to Unicode character U+002C
byte 0x2D decoded to Unicode character U+002D
byte 0x2E decoded to Unicode character U+002E
byte 0x2F decoded to Unicode character U+002F
byte 0x30 decoded to Unicode character U+0030
byte 0x31 decoded to Unicode character U+0031
byte 0x32 decoded to Unicode character U+0032
byte 0x33 decoded to Unicode character U+0033
byte 0x34 decoded to Unicode character U+0034
byte 0x35 decoded to Unicode character U+0035
byte 0x36 decoded to Unicode character U+0036
byte 0x37 decoded to Unicode character U+0037
byte 0x38 decoded to Unicode character U+0038
byte 0x39 decoded to Unicode character U+0039
byte 0x3A decoded to Unicode character U+003A
byte 0x3B decoded to Unicode character U+003B
byte 0x3C decoded to Unicode character U+003C
byte 0x3D decoded to Unicode character U+003D
byte 0x3E decoded to Unicode character U+003E
byte 0x3F decoded to Unicode character U+003F
byte 0x40 decoded to Unicode character U+0040
byte 0x41 decoded to Unicode character U+0041
byte 0x42 decoded to Unicode character U+0042
byte 0x43 decoded to Unicode character U+0043
byte 0x44 decoded to Unicode character U+0044
byte 0x45 decoded to Unicode character U+0045
byte 0x46 decoded to Unicode character U+0046
byte 0x47 decoded to Unicode character U+0047
byte 0x48 decoded to Unicode character U+0048
byte 0x49 decoded to Unicode character U+0049
byte 0x4A decoded to Unicode character U+004A
byte 0x4B decoded to Unicode character U+004B
byte 0x4C decoded to Unicode character U+004C
byte 0x4D decoded to Unicode character U+004D
byte 0x4E decoded to Unicode character U+004E
byte 0x4F decoded to Unicode character U+004F
byte 0x50 decoded to Unicode character U+0050
byte 0x51 decoded to Unicode character U+0051
byte 0x52 decoded to Unicode character U+0052
byte 0x53 decoded to Unicode character U+0053
byte 0x54 decoded to Unicode character U+0054
byte 0x55 decoded to Unicode character U+0055
byte 0x56 decoded to Unicode character U+0056
byte 0x57 decoded to Unicode character U+0057
byte 0x58 decoded to Unicode character U+0058
byte 0x59 decoded to Unicode character U+0059
byte 0x5A decoded to Unicode character U+005A
byte 0x5B decoded to Unicode character U+005B
byte 0x5C decoded to Unicode character U+005C
byte 0x5D decoded to Unicode character U+005D
byte 0x5E decoded to Unicode character U+005E
byte 0x5F decoded to Unicode character U+005F
byte 0x60 decoded to Unicode character U+0060
byte 0x61 decoded to Unicode character U+0061
byte 0x62 decoded to Unicode character U+0062
byte 0x63 decoded to Unicode character U+0063
byte 0x64 decoded to Unicode character U+0064
byte 0x65 decoded to Unicode character U+0065
byte 0x66 decoded to Unicode character U+0066
byte 0x67 decoded to Unicode character U+0067
byte 0x68 decoded to Unicode character U+0068
byte 0x69 decoded to Unicode character U+0069
byte 0x6A decoded to Unicode character U+006A
byte 0x6B decoded to Unicode character U+006B
byte 0x6C decoded to Unicode character U+006C
byte 0x6D decoded to Unicode character U+006D
byte 0x6E decoded to Unicode character U+006E
byte 0x6F decoded to Unicode character U+006F
byte 0x70 decoded to Unicode character U+0070
byte 0x71 decoded to Unicode character U+0071
byte 0x72 decoded to Unicode character U+0072
byte 0x73 decoded to Unicode character U+0073
byte 0x74 decoded to Unicode character U+0074
byte 0x75 decoded to Unicode character U+0075
byte 0x76 decoded to Unicode character U+0076
byte 0x77 decoded to Unicode character U+0077
byte 0x78 decoded to Unicode character U+0078
byte 0x79 decoded to Unicode character U+0079
byte 0x7A decoded to Unicode character U+007A
byte 0x7B decoded to Unicode character U+007B
byte 0x7C decoded to Unicode character U+007C
byte 0x7D decoded to Unicode character U+007D
byte 0x7E decoded to Unicode character U+007E
byte 0x7F decoded to Unicode character U+007F
byte 0x80 decoded to Unicode character U+0080
byte 0x81 decoded to Unicode character U+0081
byte 0x82 decoded to Unicode character U+0082
byte 0x83 decoded to Unicode character U+0083
byte 0x84 decoded to Unicode character U+0084
byte 0x85 decoded to Unicode character U+0085
byte 0x86 decoded to Unicode character U+0086
byte 0x87 decoded to Unicode character U+0087
byte 0x88 decoded to Unicode character U+0088
byte 0x89 decoded to Unicode character U+0089
byte 0x8A decoded to Unicode character U+008A
byte 0x8B decoded to Unicode character U+008B
byte 0x8C decoded to Unicode character U+008C
byte 0x8D decoded to Unicode character U+008D
byte 0x8E decoded to Unicode character U+008E
byte 0x8F decoded to Unicode character U+008F
byte 0x90 decoded to Unicode character U+0090
byte 0x91 decoded to Unicode character U+0091
byte 0x92 decoded to Unicode character U+0092
byte 0x93 decoded to Unicode character U+0093
byte 0x94 decoded to Unicode character U+0094
byte 0x95 decoded to Unicode character U+0095
byte 0x96 decoded to Unicode character U+0096
byte 0x97 decoded to Unicode character U+0097
byte 0x98 decoded to Unicode character U+0098
byte 0x99 decoded to Unicode character U+0099
byte 0x9A decoded to Unicode character U+009A
byte 0x9B decoded to Unicode character U+009B
byte 0x9C decoded to Unicode character U+009C
byte 0x9D decoded to Unicode character U+009D
byte 0x9E decoded to Unicode character U+009E
byte 0x9F decoded to Unicode character U+009F
byte 0xA0 decoded to Unicode character U+00A0
byte 0xA1 decoded to Unicode character U+00A1
byte 0xA2 decoded to Unicode character U+00A2
byte 0xA3 decoded to Unicode character U+00A3
byte 0xA4 decoded to Unicode character U+00A4
byte 0xA5 decoded to Unicode character U+00A5
byte 0xA6 decoded to Unicode character U+00A6
byte 0xA7 decoded to Unicode character U+00A7
byte 0xA8 decoded to Unicode character U+00A8
byte 0xA9 decoded to Unicode character U+00A9
byte 0xAA decoded to Unicode character U+00AA
byte 0xAB decoded to Unicode character U+00AB
byte 0xAC decoded to Unicode character U+00AC
byte 0xAD decoded to Unicode character U+00AD
byte 0xAE decoded to Unicode character U+00AE
byte 0xAF decoded to Unicode character U+00AF
byte 0xB0 decoded to Unicode character U+00B0
byte 0xB1 decoded to Unicode character U+00B1
byte 0xB2 decoded to Unicode character U+00B2
byte 0xB3 decoded to Unicode character U+00B3
byte 0xB4 decoded to Unicode character U+00B4
byte 0xB5 decoded to Unicode character U+00B5
byte 0xB6 decoded to Unicode character U+00B6
byte 0xB7 decoded to Unicode character U+00B7
byte 0xB8 decoded to Unicode character U+00B8
byte 0xB9 decoded to Unicode character U+00B9
byte 0xBA decoded to Unicode character U+00BA
byte 0xBB decoded to Unicode character U+00BB
byte 0xBC decoded to Unicode character U+00BC
byte 0xBD decoded to Unicode character U+00BD
byte 0xBE decoded to Unicode character U+00BE
byte 0xBF decoded to Unicode character U+00BF
byte 0xC0 decoded to Unicode character U+00C0
byte 0xC1 decoded to Unicode character U+00C1
byte 0xC2 decoded to Unicode character U+00C2
byte 0xC3 decoded to Unicode character U+00C3
byte 0xC4 decoded to Unicode character U+00C4
byte 0xC5 decoded to Unicode character U+00C5
byte 0xC6 decoded to Unicode character U+00C6
byte 0xC7 decoded to Unicode character U+00C7
byte 0xC8 decoded to Unicode character U+00C8
byte 0xC9 decoded to Unicode character U+00C9
byte 0xCA decoded to Unicode character U+00CA
byte 0xCB decoded to Unicode character U+00CB
byte 0xCC decoded to Unicode character U+00CC
byte 0xCD decoded to Unicode character U+00CD
byte 0xCE decoded to Unicode character U+00CE
byte 0xCF decoded to Unicode character U+00CF
byte 0xD0 decoded to Unicode character U+00D0
byte 0xD1 decoded to Unicode character U+00D1
byte 0xD2 decoded to Unicode character U+00D2
byte 0xD3 decoded to Unicode character U+00D3
byte 0xD4 decoded to Unicode character U+00D4
byte 0xD5 decoded to Unicode character U+00D5
byte 0xD6 decoded to Unicode character U+00D6
byte 0xD7 decoded to Unicode character U+00D7
byte 0xD8 decoded to Unicode character U+00D8
byte 0xD9 decoded to Unicode character U+00D9
byte 0xDA decoded to Unicode character U+00DA
byte 0xDB decoded to Unicode character U+00DB
byte 0xDC decoded to Unicode character U+00DC
byte 0xDD decoded to Unicode character U+00DD
byte 0xDE decoded to Unicode character U+00DE
byte 0xDF decoded to Unicode character U+00DF
byte 0xE0 decoded to Unicode character U+00E0
byte 0xE1 decoded to Unicode character U+00E1
byte 0xE2 decoded to Unicode character U+00E2
byte 0xE3 decoded to Unicode character U+00E3
byte 0xE4 decoded to Unicode character U+00E4
byte 0xE5 decoded to Unicode character U+00E5
byte 0xE6 decoded to Unicode character U+00E6
byte 0xE7 decoded to Unicode character U+00E7
byte 0xE8 decoded to Unicode character U+00E8
byte 0xE9 decoded to Unicode character U+00E9
byte 0xEA decoded to Unicode character U+00EA
byte 0xEB decoded to Unicode character U+00EB
byte 0xEC decoded to Unicode character U+00EC
byte 0xED decoded to Unicode character U+00ED
byte 0xEE decoded to Unicode character U+00EE
byte 0xEF decoded to Unicode character U+00EF
byte 0xF0 decoded to Unicode character U+00F0
byte 0xF1 decoded to Unicode character U+00F1
byte 0xF2 decoded to Unicode character U+00F2
byte 0xF3 decoded to Unicode character U+00F3
byte 0xF4 decoded to Unicode character U+00F4
byte 0xF5 decoded to Unicode character U+00F5
byte 0xF6 decoded to Unicode character U+00F6
byte 0xF7 decoded to Unicode character U+00F7
byte 0xF8 decoded to Unicode character U+00F8
byte 0xF9 decoded to Unicode character U+00F9
byte 0xFA decoded to Unicode character U+00FA
byte 0xFB decoded to Unicode character U+00FB
byte 0xFC decoded to Unicode character U+00FC
byte 0xFD decoded to Unicode character U+00FD
byte 0xFE decoded to Unicode character U+00FE
$ vim c_locale.c
...
"c_locale.c" 26L, 747C geschrieben
osipovmi@blnn724x:~ []
$ cc -o c_locale c_locale.c
osipovmi@blnn724x:~ []
$ file c_locale
c_locale: ELF-32 executable object file - IA64
osipovmi@blnn724x:~ []
$ ./c_locale
LC_ALL: POSIX POSIX POSIX POSIX POSIX POSIX
LC_CTYPE: POSIX POSIX POSIX POSIX POSIX POSIX
nl_langinfo(CODESET): roman8
byte 0x00 decoded to Unicode character U+0000
byte 0x01 decoded to Unicode character U+0001
byte 0x02 decoded to Unicode character U+0002
byte 0x03 decoded to Unicode character U+0003
byte 0x04 decoded to Unicode character U+0004
byte 0x05 decoded to Unicode character U+0005
byte 0x06 decoded to Unicode character U+0006
byte 0x07 decoded to Unicode character U+0007
byte 0x08 decoded to Unicode character U+0008
byte 0x09 decoded to Unicode character U+0009
byte 0x0A decoded to Unicode character U+000A
byte 0x0B decoded to Unicode character U+000B
byte 0x0C decoded to Unicode character U+000C
byte 0x0D decoded to Unicode character U+000D
byte 0x0E decoded to Unicode character U+000E
byte 0x0F decoded to Unicode character U+000F
byte 0x10 decoded to Unicode character U+0010
byte 0x11 decoded to Unicode character U+0011
byte 0x12 decoded to Unicode character U+0012
byte 0x13 decoded to Unicode character U+0013
byte 0x14 decoded to Unicode character U+0014
byte 0x15 decoded to Unicode character U+0015
byte 0x16 decoded to Unicode character U+0016
byte 0x17 decoded to Unicode character U+0017
byte 0x18 decoded to Unicode character U+0018
byte 0x19 decoded to Unicode character U+0019
byte 0x1A decoded to Unicode character U+001A
byte 0x1B decoded to Unicode character U+001B
byte 0x1C decoded to Unicode character U+001C
byte 0x1D decoded to Unicode character U+001D
byte 0x1E decoded to Unicode character U+001E
byte 0x1F decoded to Unicode character U+001F
byte 0x20 decoded to Unicode character U+0020
byte 0x21 decoded to Unicode character U+0021
byte 0x22 decoded to Unicode character U+0022
byte 0x23 decoded to Unicode character U+0023
byte 0x24 decoded to Unicode character U+0024
byte 0x25 decoded to Unicode character U+0025
byte 0x26 decoded to Unicode character U+0026
byte 0x27 decoded to Unicode character U+0027
byte 0x28 decoded to Unicode character U+0028
byte 0x29 decoded to Unicode character U+0029
byte 0x2A decoded to Unicode character U+002A
byte 0x2B decoded to Unicode character U+002B
byte 0x2C decoded to Unicode character U+002C
byte 0x2D decoded to Unicode character U+002D
byte 0x2E decoded to Unicode character U+002E
byte 0x2F decoded to Unicode character U+002F
byte 0x30 decoded to Unicode character U+0030
byte 0x31 decoded to Unicode character U+0031
byte 0x32 decoded to Unicode character U+0032
byte 0x33 decoded to Unicode character U+0033
byte 0x34 decoded to Unicode character U+0034
byte 0x35 decoded to Unicode character U+0035
byte 0x36 decoded to Unicode character U+0036
byte 0x37 decoded to Unicode character U+0037
byte 0x38 decoded to Unicode character U+0038
byte 0x39 decoded to Unicode character U+0039
byte 0x3A decoded to Unicode character U+003A
byte 0x3B decoded to Unicode character U+003B
byte 0x3C decoded to Unicode character U+003C
byte 0x3D decoded to Unicode character U+003D
byte 0x3E decoded to Unicode character U+003E
byte 0x3F decoded to Unicode character U+003F
byte 0x40 decoded to Unicode character U+0040
byte 0x41 decoded to Unicode character U+0041
byte 0x42 decoded to Unicode character U+0042
byte 0x43 decoded to Unicode character U+0043
byte 0x44 decoded to Unicode character U+0044
byte 0x45 decoded to Unicode character U+0045
byte 0x46 decoded to Unicode character U+0046
byte 0x47 decoded to Unicode character U+0047
byte 0x48 decoded to Unicode character U+0048
byte 0x49 decoded to Unicode character U+0049
byte 0x4A decoded to Unicode character U+004A
byte 0x4B decoded to Unicode character U+004B
byte 0x4C decoded to Unicode character U+004C
byte 0x4D decoded to Unicode character U+004D
byte 0x4E decoded to Unicode character U+004E
byte 0x4F decoded to Unicode character U+004F
byte 0x50 decoded to Unicode character U+0050
byte 0x51 decoded to Unicode character U+0051
byte 0x52 decoded to Unicode character U+0052
byte 0x53 decoded to Unicode character U+0053
byte 0x54 decoded to Unicode character U+0054
byte 0x55 decoded to Unicode character U+0055
byte 0x56 decoded to Unicode character U+0056
byte 0x57 decoded to Unicode character U+0057
byte 0x58 decoded to Unicode character U+0058
byte 0x59 decoded to Unicode character U+0059
byte 0x5A decoded to Unicode character U+005A
byte 0x5B decoded to Unicode character U+005B
byte 0x5C decoded to Unicode character U+005C
byte 0x5D decoded to Unicode character U+005D
byte 0x5E decoded to Unicode character U+005E
byte 0x5F decoded to Unicode character U+005F
byte 0x60 decoded to Unicode character U+0060
byte 0x61 decoded to Unicode character U+0061
byte 0x62 decoded to Unicode character U+0062
byte 0x63 decoded to Unicode character U+0063
byte 0x64 decoded to Unicode character U+0064
byte 0x65 decoded to Unicode character U+0065
byte 0x66 decoded to Unicode character U+0066
byte 0x67 decoded to Unicode character U+0067
byte 0x68 decoded to Unicode character U+0068
byte 0x69 decoded to Unicode character U+0069
byte 0x6A decoded to Unicode character U+006A
byte 0x6B decoded to Unicode character U+006B
byte 0x6C decoded to Unicode character U+006C
byte 0x6D decoded to Unicode character U+006D
byte 0x6E decoded to Unicode character U+006E
byte 0x6F decoded to Unicode character U+006F
byte 0x70 decoded to Unicode character U+0070
byte 0x71 decoded to Unicode character U+0071
byte 0x72 decoded to Unicode character U+0072
byte 0x73 decoded to Unicode character U+0073
byte 0x74 decoded to Unicode character U+0074
byte 0x75 decoded to Unicode character U+0075
byte 0x76 decoded to Unicode character U+0076
byte 0x77 decoded to Unicode character U+0077
byte 0x78 decoded to Unicode character U+0078
byte 0x79 decoded to Unicode character U+0079
byte 0x7A decoded to Unicode character U+007A
byte 0x7B decoded to Unicode character U+007B
byte 0x7C decoded to Unicode character U+007C
byte 0x7D decoded to Unicode character U+007D
byte 0x7E decoded to Unicode character U+007E
byte 0x7F decoded to Unicode character U+007F
byte 0x80 decoded to Unicode character U+0080
byte 0x81 decoded to Unicode character U+0081
byte 0x82 decoded to Unicode character U+0082
byte 0x83 decoded to Unicode character U+0083
byte 0x84 decoded to Unicode character U+0084
byte 0x85 decoded to Unicode character U+0085
byte 0x86 decoded to Unicode character U+0086
byte 0x87 decoded to Unicode character U+0087
byte 0x88 decoded to Unicode character U+0088
byte 0x89 decoded to Unicode character U+0089
byte 0x8A decoded to Unicode character U+008A
byte 0x8B decoded to Unicode character U+008B
byte 0x8C decoded to Unicode character U+008C
byte 0x8D decoded to Unicode character U+008D
byte 0x8E decoded to Unicode character U+008E
byte 0x8F decoded to Unicode character U+008F
byte 0x90 decoded to Unicode character U+0090
byte 0x91 decoded to Unicode character U+0091
byte 0x92 decoded to Unicode character U+0092
byte 0x93 decoded to Unicode character U+0093
byte 0x94 decoded to Unicode character U+0094
byte 0x95 decoded to Unicode character U+0095
byte 0x96 decoded to Unicode character U+0096
byte 0x97 decoded to Unicode character U+0097
byte 0x98 decoded to Unicode character U+0098
byte 0x99 decoded to Unicode character U+0099
byte 0x9A decoded to Unicode character U+009A
byte 0x9B decoded to Unicode character U+009B
byte 0x9C decoded to Unicode character U+009C
byte 0x9D decoded to Unicode character U+009D
byte 0x9E decoded to Unicode character U+009E
byte 0x9F decoded to Unicode character U+009F
byte 0xA0 decoded to Unicode character U+00A0
byte 0xA1 decoded to Unicode character U+00A1
byte 0xA2 decoded to Unicode character U+00A2
byte 0xA3 decoded to Unicode character U+00A3
byte 0xA4 decoded to Unicode character U+00A4
byte 0xA5 decoded to Unicode character U+00A5
byte 0xA6 decoded to Unicode character U+00A6
byte 0xA7 decoded to Unicode character U+00A7
byte 0xA8 decoded to Unicode character U+00A8
byte 0xA9 decoded to Unicode character U+00A9
byte 0xAA decoded to Unicode character U+00AA
byte 0xAB decoded to Unicode character U+00AB
byte 0xAC decoded to Unicode character U+00AC
byte 0xAD decoded to Unicode character U+00AD
byte 0xAE decoded to Unicode character U+00AE
byte 0xAF decoded to Unicode character U+00AF
byte 0xB0 decoded to Unicode character U+00B0
byte 0xB1 decoded to Unicode character U+00B1
byte 0xB2 decoded to Unicode character U+00B2
byte 0xB3 decoded to Unicode character U+00B3
byte 0xB4 decoded to Unicode character U+00B4
byte 0xB5 decoded to Unicode character U+00B5
byte 0xB6 decoded to Unicode character U+00B6
byte 0xB7 decoded to Unicode character U+00B7
byte 0xB8 decoded to Unicode character U+00B8
byte 0xB9 decoded to Unicode character U+00B9
byte 0xBA decoded to Unicode character U+00BA
byte 0xBB decoded to Unicode character U+00BB
byte 0xBC decoded to Unicode character U+00BC
byte 0xBD decoded to Unicode character U+00BD
byte 0xBE decoded to Unicode character U+00BE
byte 0xBF decoded to Unicode character U+00BF
byte 0xC0 decoded to Unicode character U+00C0
byte 0xC1 decoded to Unicode character U+00C1
byte 0xC2 decoded to Unicode character U+00C2
byte 0xC3 decoded to Unicode character U+00C3
byte 0xC4 decoded to Unicode character U+00C4
byte 0xC5 decoded to Unicode character U+00C5
byte 0xC6 decoded to Unicode character U+00C6
byte 0xC7 decoded to Unicode character U+00C7
byte 0xC8 decoded to Unicode character U+00C8
byte 0xC9 decoded to Unicode character U+00C9
byte 0xCA decoded to Unicode character U+00CA
byte 0xCB decoded to Unicode character U+00CB
byte 0xCC decoded to Unicode character U+00CC
byte 0xCD decoded to Unicode character U+00CD
byte 0xCE decoded to Unicode character U+00CE
byte 0xCF decoded to Unicode character U+00CF
byte 0xD0 decoded to Unicode character U+00D0
byte 0xD1 decoded to Unicode character U+00D1
byte 0xD2 decoded to Unicode character U+00D2
byte 0xD3 decoded to Unicode character U+00D3
byte 0xD4 decoded to Unicode character U+00D4
byte 0xD5 decoded to Unicode character U+00D5
byte 0xD6 decoded to Unicode character U+00D6
byte 0xD7 decoded to Unicode character U+00D7
byte 0xD8 decoded to Unicode character U+00D8
byte 0xD9 decoded to Unicode character U+00D9
byte 0xDA decoded to Unicode character U+00DA
byte 0xDB decoded to Unicode character U+00DB
byte 0xDC decoded to Unicode character U+00DC
byte 0xDD decoded to Unicode character U+00DD
byte 0xDE decoded to Unicode character U+00DE
byte 0xDF decoded to Unicode character U+00DF
byte 0xE0 decoded to Unicode character U+00E0
byte 0xE1 decoded to Unicode character U+00E1
byte 0xE2 decoded to Unicode character U+00E2
byte 0xE3 decoded to Unicode character U+00E3
byte 0xE4 decoded to Unicode character U+00E4
byte 0xE5 decoded to Unicode character U+00E5
byte 0xE6 decoded to Unicode character U+00E6
byte 0xE7 decoded to Unicode character U+00E7
byte 0xE8 decoded to Unicode character U+00E8
byte 0xE9 decoded to Unicode character U+00E9
byte 0xEA decoded to Unicode character U+00EA
byte 0xEB decoded to Unicode character U+00EB
byte 0xEC decoded to Unicode character U+00EC
byte 0xED decoded to Unicode character U+00ED
byte 0xEE decoded to Unicode character U+00EE
byte 0xEF decoded to Unicode character U+00EF
byte 0xF0 decoded to Unicode character U+00F0
byte 0xF1 decoded to Unicode character U+00F1
byte 0xF2 decoded to Unicode character U+00F2
byte 0xF3 decoded to Unicode character U+00F3
byte 0xF4 decoded to Unicode character U+00F4
byte 0xF5 decoded to Unicode character U+00F5
byte 0xF6 decoded to Unicode character U+00F6
byte 0xF7 decoded to Unicode character U+00F7
byte 0xF8 decoded to Unicode character U+00F8
byte 0xF9 decoded to Unicode character U+00F9
byte 0xFA decoded to Unicode character U+00FA
byte 0xFB decoded to Unicode character U+00FB
byte 0xFC decoded to Unicode character U+00FC
byte 0xFD decoded to Unicode character U+00FD
byte 0xFE decoded to Unicode character U+00FE

If you think this is a bug, I can happily report this to HPE.
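The source of c_locale.c is attached to the issue but not reproduced in this thread. As a rough cross-check, here is a Python sketch of the kind of probe its output suggests. It switches LC_CTYPE to the C locale, prints nl_langinfo(CODESET), and then asks the C library's mbstowcs() how it decodes each single byte. It assumes a POSIX system where ctypes can reach the C library. The helper name `probe_byte` is made up for this example.

```python
import ctypes
import ctypes.util
import locale

# Load the C library. If find_library() returns None, CDLL(None) still
# exposes the C library symbols already linked into the interpreter.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.mbstowcs.argtypes = (ctypes.c_wchar_p, ctypes.c_char_p, ctypes.c_size_t)
libc.mbstowcs.restype = ctypes.c_size_t
MBSTOWCS_ERROR = ctypes.c_size_t(-1).value  # (size_t)-1 signals an invalid sequence


def probe_byte(value):
    """Return the code point mbstowcs() yields for a single byte, or None."""
    buf = ctypes.create_unicode_buffer(2)  # one character + terminating NUL
    res = libc.mbstowcs(buf, bytes([value]), len(buf))
    if res == MBSTOWCS_ERROR:
        return None
    return ord(buf[0])  # byte 0x00 is an empty string, so buf[0] stays U+0000


if __name__ == "__main__":
    locale.setlocale(locale.LC_CTYPE, "C")
    print("nl_langinfo(CODESET):", locale.nl_langinfo(locale.CODESET))
    for value in range(256):
        cp = probe_byte(value)
        if cp is None:
            print(f"byte 0x{value:02X} rejected by mbstowcs()")
        else:
            print(f"byte 0x{value:02X} decoded to Unicode character U+{cp:04X}")
```

On HP-UX, the output shown above would correspond to every byte decoding to the code point with the same value, which is Latin-1 behaviour.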

vstinner · 2018-08-28T08:59:06Z

> ...
> byte 0xA7 decoded to Unicode character U+00A7
> ...

Well, it confirms what I expected: nl_langinfo(CODESET) announces "roman8", but mbstowcs() uses Latin1 encoding in practice.
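To see why this mismatch matters, compare how Python's own codecs decode the same bytes. Python accepts "roman8" as a name for its hp_roman8 codec. The byte string b'h\xa7\xe9' is inferred from the test_cmd_line() failure reported later in this thread, where roman8 gives 'h\xcf\xd5' and ASCII with surrogateescape gives 'h\udca7\udce9'.

```python
# Bytes taken from the HP-UX test_cmd_line() failure in this thread.
arg = b"h\xa7\xe9"

# What nl_langinfo(CODESET) announces for the C locale on HP-UX ...
announced = arg.decode("roman8", "surrogateescape")
# ... what mbstowcs() actually does there (Latin-1: byte N -> U+00NN) ...
actual = arg.decode("latin-1")
# ... and ASCII with surrogateescape, which keeps undecodable bytes as
# lone surrogates U+DC80..U+DCFF.
as_ascii = arg.decode("ascii", "surrogateescape")

print(ascii(announced))  # 'h\xcf\xd5'
print(ascii(actual))     # 'h\xa7\xe9'
print(ascii(as_ascii))   # 'h\udca7\udce9'
```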

So I wrote PR 8969, which forces the ASCII encoding in that case. I'm not sure how test_utf8_mode should be fixed then.
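PR 8969 changes CPython's C code. The following is a hypothetical Python model of the decision it describes, not the code from the PR. The rule is to force ASCII only for the C/POSIX locale, and only when the announced codeset disagrees with what the C library really does for a probe byte. The names `should_force_ascii`, `latin1_mbstowcs` and `roman8_mbstowcs` are invented for illustration. 0xA7 is used as the probe because roman8 and Latin-1 decode it differently.

```python
from typing import Callable, Optional

PROBE_BYTE = 0xA7  # roman8 decodes it to U+00CF, Latin-1 to U+00A7


def should_force_ascii(ctype_locale: str, codeset: str,
                       c_decode: Callable[[int], Optional[int]]) -> bool:
    """Hypothetical model: ignore the announced codeset and use ASCII?

    c_decode stands in for mbstowcs(): it maps one byte to a code point,
    or returns None when the C library rejects the byte.
    """
    if ctype_locale not in ("C", "POSIX"):
        return False  # only the C/POSIX locale is considered
    try:
        announced = ord(bytes([PROBE_BYTE]).decode(codeset))
    except (LookupError, UnicodeDecodeError):
        return True  # unknown codeset, or one (like ASCII) without this byte
    # Force ASCII when the C library disagrees with the announced codeset.
    return c_decode(PROBE_BYTE) != announced


def latin1_mbstowcs(byte: int) -> int:
    return byte  # Latin-1: byte N decodes to U+00NN, as observed on HP-UX


def roman8_mbstowcs(byte: int) -> int:
    return ord(bytes([byte]).decode("roman8"))


print(should_force_ascii("C", "roman8", latin1_mbstowcs))           # True
print(should_force_ascii("C", "roman8", roman8_mbstowcs))           # False
print(should_force_ascii("en_US.UTF-8", "utf-8", latin1_mbstowcs))  # False
```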

Michael: you can try to apply PR 8969 and then manually apply the PR 8967 patch:
https://patch-diff.githubusercontent.com/raw/python/cpython/pull/8967.patch

But I expect that, even with both patches, test_utf8_mode will still fail on test_cmd_line(). You can try to modify test_cmd_line() to force the encoding to "ascii".

What are the values of sys.getfilesystemencoding() and locale.getpreferredencoding() in the C locale with PR 8969? I expect "roman8", which can cause issues in os.fsencode()/os.fsdecode(). Maybe Python should also force ASCII here?
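The os.fsencode()/os.fsdecode() concern can be illustrated with plain codecs. On POSIX, both functions use the filesystem encoding with the surrogateescape error handler. The sketch assumes a file name or argument whose text came from the C library, so it follows mbstowcs()'s Latin-1 behaviour. If Python then encodes that text back with a roman8 filesystem encoding, the bytes change. ASCII with surrogateescape on both sides round-trips the raw bytes.

```python
raw = b"h\xa7\xe9"  # a file name / command-line argument as raw bytes

# Case 1: filesystem encoding "roman8", but the C library decoded the
# bytes as Latin-1 (the HP-UX behaviour shown above).
from_c_library = raw.decode("latin-1")                     # 'h\xa7\xe9'
re_encoded = from_c_library.encode("roman8", "surrogateescape")
print(re_encoded == raw)  # False: the bytes change on the way back

# Case 2: ASCII + surrogateescape on both sides (what forcing ASCII gives).
as_ascii = raw.decode("ascii", "surrogateescape")          # 'h\udca7\udce9'
print(as_ascii.encode("ascii", "surrogateescape") == raw)  # True
```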

michael-o · 2018-08-28T10:50:45Z

Here are the answers to your questions:
> osipovmi@blnn724x:/var/osipovmi/cpython []
> $ git checkout hpux_force_ascii
> Branch 'hpux_force_ascii' set up to track remote branch 'hpux_force_ascii' from 'vstinner'.
> Switched to a new branch 'hpux_force_ascii'
> osipovmi@blnn724x:/var/osipovmi/cpython []
> $ git cherry-pick 217af1d38db3e1e875180c6fa160f0fc80e46003
> [hpux_force_ascii 7ce2927185] bpo-34403, bpo-34207: Fix test_utf8_mode.test_cmd_line()
>  Author: Victor Stinner <vstinner@redhat.com>
>  Date: Tue Aug 28 09:35:25 2018 +0200
>  1 file changed, 20 insertions(+), 11 deletions(-)
> osipovmi@blnn724x:/var/osipovmi/cpython []
> $ export CC=/opt/aCC/bin/cc ; \
>  export CXX=/opt/aCC/bin/aCC ; \
>  export LDFLAGS=-L/usr/local/lib/hpux32 ; \
>  export UNIX_STD=1998 ; \
> ./configure --prefix=/var/osipovmi/python37-testing --without-gcc --with-system-expat --with-pydebug --with-openssl=/opt/openssl
> ...
> osipovmi@blnn724x:/var/osipovmi/cpython []
> $ gmake -j 8
> ...
> osipovmi@blnn724x:/var/osipovmi/cpython []
> $ ./python -m test test_utf8_mode
> Run tests sequentially
> 0:00:00 [1/1] test_utf8_mode
> test test_utf8_mode failed -- Traceback (most recent call last):
>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 235, in test_cmd_line
>     LC_ALL='C')
>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 214, in check
>     self.assertEqual(args, ascii(expected), out)
> AssertionError: "['h\\udca7\\udce9']" != "['h\\xcf\\xd5']"
> - ['h\udca7\udce9']
> + ['h\xcf\xd5']
>  : roman8:['h\udca7\udce9']
> 
> test_utf8_mode failed
> 
> == Tests result: FAILURE ==
> 
> 1 test failed:
>     test_utf8_mode
> 
> Total duration: 3 sec 58 ms
> Tests result: FAILURE
> osipovmi@blnn724x:/var/osipovmi/cpython []
> $ git diff
> diff --git a/Lib/test/test_utf8_mode.py b/Lib/test/test_utf8_mode.py
> index 5af35aed61..89c1f92615 100644
> --- a/Lib/test/test_utf8_mode.py
> +++ b/Lib/test/test_utf8_mode.py
> @@ -231,7 +231,7 @@ class UTF8ModeTests(unittest.TestCase):
> 
>          # Check that the command line is decoded from the locale encoding
>          with self.subTest(encoding=encoding):
> -            check('utf8=0', [arg.decode(encoding, 'surrogateescape')],
> +            check('utf8=0', [arg.decode('ascii', 'surrogateescape')],
>                    LC_ALL='C')
> 
>      def test_optim_level(self):
> osipovmi@blnn724x:/var/osipovmi/cpython []
> $ ./python -m test test_utf8_mode
> Run tests sequentially
> 0:00:00 [1/1] test_utf8_mode
> 
> == Tests result: SUCCESS ==
> 
> 1 test OK.
> 
> Total duration: 3 sec 65 ms
> Tests result: SUCCESS
> 
> osipovmi@blnn724x:/var/osipovmi/cpython []
> $ LC_ALL=C ./python -X utf8=0
> Python 3.8.0a0 (heads/hpux_force_ascii:7ce2927185, Aug 28 2018, 12:43:04) [C] on hp-ux11
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import locale ; import sys
> >>> sys.getfilesystemencoding() ; locale.getpreferredencoding()
> 'hp-roman8'
> 'roman8'
> >>>
> osipovmi@blnn724x:/var/osipovmi/cpython []
> $

I cannot give a qualified answer on

> Maybe Python should also force ASCII here?

but since you have figured out that the conversion is broken, one must treat it as such, or use ASCII only and UTF-8 for the rest.

vstinner · 2018-08-28T11:20:07Z

> -            check('utf8=0', [arg.decode(encoding, 'surrogateescape')],
> +            check('utf8=0', [arg.decode('ascii', 'surrogateescape')],
>                    LC_ALL='C')
>
> (...)
> == Tests result: SUCCESS ==

Good, it works.

I updated my PR 8969 to properly implement my idea. With this PR, on HP-UX with the C or POSIX locale, Python now uses ASCII as its "filesystem encoding": sys.getfilesystemencoding() returns "ascii".
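To check this on a given machine, the interactive session shown earlier can be scripted. The following sketch starts a child interpreter with LC_ALL=C and -X utf8=0, then prints sys.getfilesystemencoding() and locale.getpreferredencoding(). According to this comment, the first value should be "ascii" on HP-UX with the PR applied. Other platforms may report something else. The helper name `encodings_under_c_locale` is hypothetical.

```python
import os
import subprocess
import sys

CHILD_CODE = (
    "import locale, sys; "
    "print(sys.getfilesystemencoding()); "
    "print(locale.getpreferredencoding())"
)


def encodings_under_c_locale(python=sys.executable):
    """Return (filesystem encoding, preferred encoding) seen by a child
    interpreter started with LC_ALL=C and UTF-8 mode disabled."""
    env = dict(os.environ, LC_ALL="C")
    env.pop("PYTHONUTF8", None)  # leave the decision to -X utf8=0
    out = subprocess.run(
        [python, "-X", "utf8=0", "-c", CHILD_CODE],
        env=env, capture_output=True, text=True, check=True,
    ).stdout.split()
    return out[0], out[1]


if __name__ == "__main__":
    print(encodings_under_c_locale())
```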

Michael: can you please try my updated PR 8969?

- apply the updated change
- recompile Python
- run the test suite using: LC_ALL=C ./python -m test -j0 -r

You may also test with the current locale: ./python -m test -j0 -r

If everything is good on your side, I will merge my PR.

aixtools · 2018-08-28T12:10:53Z

No time to compile for a couple of days. Stress from others wins instead.

Maybe on Friday.

michael-o · 2018-08-28T14:11:41Z

Victor,

this looks good to me:

osipovmi@blnn724x:/var/osipovmi/cpython []
$ git fetch vstinner
remote: Counting objects: 65, done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 65 (delta 41), reused 43 (delta 37), pack-reused 10
Unpacking objects: 100% (65/65), done.
From https://github.com/vstinner/cpython
 + 6171da5569...559de620d7 hpux_force_ascii -> vstinner/hpux_force_ascii  (forced update)
 * [new branch]            posix_locale37   -> vstinner/posix_locale37
osipovmi@blnn724x:/var/osipovmi/cpython []
$ git reset --hard vstinner/hpux_force_ascii
HEAD is now at 559de62 bpo-34403: On HP-UX, force ASCII for C locale
$ gmake -j 8
...
osipovmi@blnn724x:/var/osipovmi/cpython []
$ LC_ALL=C ./python -m test -j0 -r -uall,-network
...
== Tests result: FAILURE ==

357 tests OK.

35 tests failed:
test_asyncio test_asyncore test_bytes test_c_locale_coercion
test_code test_concurrent_futures test_ctypes test_devpoll
test_distutils test_embed test_faulthandler test_fileio test_gdb
test_httpservers test_importlib test_io test_mmap
test_multiprocessing_fork test_multiprocessing_forkserver
test_multiprocessing_main_handling test_multiprocessing_spawn
test_normalization test_os test_pkg test_posix test_pty test_re
test_signal test_socket test_subprocess test_support
test_threading test_time test_unicode test_zlib

26 tests skipped:
test_curses test_dbm_gnu test_epoll test_idle test_kqueue
test_lzma test_msilib test_ossaudiodev test_smtpnet
test_socketserver test_spwd test_startfile test_tcl test_timeout
test_tix test_tk test_ttk_guionly test_ttk_textonly test_turtle
test_urllib2net test_urllibnet test_winconsoleio test_winreg
test_winsound test_xmlrpc_net test_zipfile64
osipovmi@blnn724x:/var/osipovmi/cpython []
$ ./python -m test -j0 -r -uall,-network
...
== Tests result: FAILURE ==

357 tests OK.

35 tests failed:
test_asyncio test_asyncore test_bytes test_c_locale_coercion
test_code test_concurrent_futures test_ctypes test_devpoll
test_distutils test_embed test_faulthandler test_fileio test_gdb
test_httpservers test_importlib test_io test_mmap
test_multiprocessing_fork test_multiprocessing_forkserver
test_multiprocessing_main_handling test_multiprocessing_spawn
test_normalization test_os test_posix test_pty test_re
test_readline test_signal test_socket test_subprocess test_support
test_threading test_time test_unicode test_zlib

26 tests skipped:
test_curses test_dbm_gnu test_epoll test_idle test_kqueue
test_lzma test_msilib test_ossaudiodev test_smtpnet
test_socketserver test_spwd test_startfile test_tcl test_timeout
test_tix test_tk test_ttk_guionly test_ttk_textonly test_turtle
test_urllib2net test_urllibnet test_winconsoleio test_winreg
test_winsound test_xmlrpc_net test_zipfile64

Total duration: 14 min 23 sec
Tests result: FAILURE

test_utf8_mode passes. Some other tests likely fail due to this Roman8 stuff: test_re and friends. I am analyzing the failures step by step and already have a few fixes. I am waiting for other PRs to be merged first.

vstinner · 2018-08-28T15:40:39Z

New changeset d500e53 by Victor Stinner in branch 'master':
bpo-34403: On HP-UX, force ASCII for C locale (GH-8969)

michael-o · 2018-08-28T15:57:25Z

Can we backport this to 3.7 at least?

aixtools · 2018-08-28T18:43:44Z

On 28/08/2018 13:20, STINNER Victor wrote:

> I updated my PR 8969 to implement properly my idea. With this PR, on HP-UX with C or POSIX locale, Python now uses ASCII for its "filesystem encoding": sys.getfilesystemencoding() returns "ascii".
>
> Michael: can you please try my updated PR 8969?
>
> - apply the updated change
> - recompile Python
> - run the test suite using: LC_ALL=C ./python -m test -j0 -r
>
> You may also test with the current locale: ./python -m test -j0 -r

Seems to work well as far as AIX and test_utf8_mode are concerned (as you had already merged, I pulled master and built it anew).

Attached is the output with LC_ALL=C in the prefix. If you were hoping for "dangling processes", your hopes are affirmed.

Perhaps also noteworthy:

root@x066:[/data/prj/python/git/python3-3.8]set | grep LC
LC__FASTMSG=true
MAILCHECK=600
root@x066:[/data/prj/python/git/python3-3.8]set | grep LANG
LANG=en_US

aixtools · 2018-08-28T18:53:27Z

On 28/08/2018 20:43, Michael Felt wrote:

> Attached is the output with LC_ALL=C in the prefix. If you were hoping
> for "dangling processes", your hopes are affirmed.

The previous mail ended with:

== Tests result: FAILURE ==

375 tests OK.

13 tests failed:
    test__xxsubinterpreters test_asyncio test_concurrent_futures
    test_ctypes test_distutils test_embed test_httpservers
    test_multiprocessing_fork test_os test_pkg test_socket
    test_subprocess test_time

30 tests skipped:
    test_curses test_dbm_gnu test_devpoll test_epoll test_gdb
    test_idle test_kqueue test_lzma test_msilib test_ossaudiodev
    test_readline test_smtpnet test_socketserver test_spwd test_sqlite
    test_startfile test_tcl test_timeout test_tix test_tk
    test_ttk_guionly test_ttk_textonly test_turtle test_urllib2net
    test_urllibnet test_winconsoleio test_winreg test_winsound
    test_xmlrpc_net test_zipfile64

Total duration: 14 min 53 sec
Tests result: FAILURE
root@x066:[/data/prj/python/git/python3-3.8]

Without LC_ALL=C, the summary is different:

== Tests result: FAILURE ==

376 tests OK.

10 tests failed:
    test__xxsubinterpreters test_asyncio test_ctypes test_distutils
    test_embed test_httpservers test_multiprocessing_forkserver
    test_os test_socket test_time

32 tests skipped:
    test_curses test_dbm_gnu test_devpoll test_epoll test_gdb
    test_idle test_kqueue test_lzma test_msilib test_ossaudiodev
    test_readline test_smtpnet test_socketserver test_spwd test_sqlite
    test_startfile test_tcl test_timeout test_tix test_tk
    test_ttk_guionly test_ttk_textonly test_turtle test_unicode_file
    test_unicode_file_functions test_urllib2net test_urllibnet
    test_winconsoleio test_winreg test_winsound test_xmlrpc_net
    test_zipfile64

Total duration: 11 min 1 sec

And, rather than dangling processes, I see BrokenBarrierErrors

FYI

vstinner · 2018-08-28T21:14:20Z

> Can we backport this to 3.7 at least?

My policy is to focus on the master branch to support a new platform. Then add a buildbot and find a core developer to maintain this platform. See PEP 11 for details.

I would prefer to see a full test suite passing before discussing which changes should or should not be backported.

I would also prefer to first see a more general discussion about who is going to support HP-UX.

IMHO HP-UX is not officially supported today. My list of supported platforms:
https://pythondev.readthedocs.io/cpython.html#supported-platforms

Since test_utf8_mode now passes on HP-UX, I am closing the issue. Please open more specific issues for the other failures. You might open a meta issue to track all HP-UX issues.

michael-o · 2018-08-29T13:37:45Z

Please close, issue fixed. Thank you very much.

vstinner · 2018-08-29T13:49:04Z

> Please close, issue fixed. Thank you very much.

You're welcome ;-)

aixtools · 2018-09-01T12:24:44Z

On 28/08/2018 23:14, STINNER Victor wrote:

> Can we backport this to 3.7 at least?

I am the AIX(tools) Michael; Michael O is the HP-UX Michael :p

So I was not the one asking. IMHO, since the PEP was new in 3.7 (if I understood correctly), it would be "nice" to see it backported.

However, like you, my goal is to get the tests passing on master and worry about backports later.

vstinner · 2018-09-03T08:58:25Z

> I am the AIX(tools) Michael, Michael O is the HP-UX Michael :p

Oh, I didn't notice that you two have the same first name :-)

> So I was not the one asking. IMHO - as the PEP was new, if I understood
> correctly, in 3.7 - would be "nice" to see it back ported. However, like
> you - my goal is to get the tests passing on master, and worry about
> backport later.

My position hasn't changed since my last comment, and it is the same for AIX and HP-UX: msg324289. I also updated my website to write down this policy:
https://pythondev.readthedocs.io/cpython.html#i-want-cpython-to-support-my-platform

By the way, please don't comment on closed issues.

vstinner · 2018-10-30T11:58:13Z

New changeset 905f1ac by Victor Stinner in branch 'master':
bpo-34523: Fix config_init_fs_encoding() for ASCII (GH-10232)

vstinner · 2018-10-30T11:59:10Z

Michael Osipov: Oops, my commit b2457ef broke the filesystem encoding on HP-UX. It should be fixed by my commit 905f1ac.

vstinner · 2018-10-30T11:59:24Z

New changeset 21220bb by Victor Stinner in branch '3.7':
bpo-34403: Fix initfsencoding() for ASCII (GH-10233)

vstinner · 2018-10-30T13:32:12Z

New changeset 7d35f79 by Victor Stinner in branch '3.7':
bpo-34403: Always implement _Py_GetForceASCII() (GH-10235)

michael-o · 2018-10-30T14:39:46Z

Victor, looks good to me: 0:00:26 [ 23/419/3] test_utf8_mode passed.

I don't know whether it is related, but test_unicode crashes here:
0:00:22 [ 16/419/2] test_unicode crashed (Exit code -11)
Fatal Python error: Segmentation fault

Current thread 0x00000001 (most recent call first):
File "/var/osipovmi/cpython/Lib/test/test_unicode.py", line 2465 in PyUnicode_FromFormat
File "/var/osipovmi/cpython/Lib/test/test_unicode.py", line 2468 in check_format
File "/var/osipovmi/cpython/Lib/test/test_unicode.py", line 2472 in test_from_format
File "/var/osipovmi/cpython/Lib/unittest/case.py", line 610 in run
File "/var/osipovmi/cpython/Lib/unittest/case.py", line 658 in __call__
File "/var/osipovmi/cpython/Lib/unittest/suite.py", line 122 in run
File "/var/osipovmi/cpython/Lib/unittest/suite.py", line 84 in __call__
File "/var/osipovmi/cpython/Lib/unittest/suite.py", line 122 in run
File "/var/osipovmi/cpython/Lib/unittest/suite.py", line 84 in __call__
File "/var/osipovmi/cpython/Lib/unittest/suite.py", line 122 in run
File "/var/osipovmi/cpython/Lib/unittest/suite.py", line 84 in __call__
File "/var/osipovmi/cpython/Lib/test/support/testresult.py", line 162 in run
File "/var/osipovmi/cpython/Lib/test/support/__init__.py", line 1928 in _run_suite
File "/var/osipovmi/cpython/Lib/test/support/__init__.py", line 2022 in run_unittest
File "/var/osipovmi/cpython/Lib/test/libregrtest/runtest.py", line 175 in test_runner
File "/var/osipovmi/cpython/Lib/test/libregrtest/runtest.py", line 179 in runtest_inner
File "/var/osipovmi/cpython/Lib/test/libregrtest/runtest.py", line 134 in runtest
File "/var/osipovmi/cpython/Lib/test/libregrtest/runtest_mp.py", line 68 in run_tests_worker
File "/var/osipovmi/cpython/Lib/test/libregrtest/main.py", line 587 in _main
File "/var/osipovmi/cpython/Lib/test/libregrtest/main.py", line 571 in main
File "/var/osipovmi/cpython/Lib/test/libregrtest/main.py", line 627 in main
File "/var/osipovmi/cpython/Lib/test/regrtest.py", line 46 in _main
File "/var/osipovmi/cpython/Lib/test/regrtest.py", line 50 in <module>
File "/var/osipovmi/cpython/Lib/runpy.py", line 85 in _run_code
File "/var/osipovmi/cpython/Lib/runpy.py", line 192 in _run_module_as_main

Is that related to your PEP?
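The innermost frame is test_unicode's PyUnicode_FromFormat() wrapper, which calls the C function of the same name through ctypes. One way to narrow the crash down before filing a separate issue is to replay individual format/argument pairs one at a time. The sketch below uses the same binding approach (ctypes.pythonapi with a py_object return type) and assumes CPython. The case list is illustrative and not copied from test_unicode. The helper name `from_format` is hypothetical.

```python
import ctypes

# Same binding approach test_unicode uses: fetch the C API function from
# ctypes.pythonapi and declare that it returns a Python object.
_from_format = ctypes.pythonapi.PyUnicode_FromFormat
_from_format.restype = ctypes.py_object


def from_format(fmt, *args):
    """Call PyUnicode_FromFormat(fmt, ...) from Python (CPython only).

    str arguments are passed as PyObject* (for %U, %S, %R, %A); all other
    arguments must already be ctypes instances such as c_int or c_long.
    """
    cargs = tuple(ctypes.py_object(a) if isinstance(a, str) else a for a in args)
    return _from_format(fmt, *cargs)


if __name__ == "__main__":
    # One argument type per case; flush before each call so the last
    # format printed before a crash names the failing argument type.
    cases = [
        (b"%d", ctypes.c_int(42)),
        (b"%ld", ctypes.c_long(-7)),
        (b"%zd", ctypes.c_ssize_t(123)),
        (b"%x", ctypes.c_uint(255)),
        (b"%U", "abc"),
    ]
    for fmt, arg in cases:
        print(fmt.decode(), end=" -> ", flush=True)
        print(from_format(fmt, arg), flush=True)
```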

vstinner · 2018-10-30T15:43:10Z

> 0:00:22 [ 16/419/2] test_unicode crashed (Exit code -11)

Please open a new issue to track this bug.

michael-o (mannequin) added the 3.7 (EOL), stdlib, tests and type-bug labels Aug 14, 2018

michael-o (mannequin) added the 3.8 label Aug 21, 2018

vstinner closed this as completed Aug 29, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022
