This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author Michael.Felt
Recipients Michael.Felt
Date 2018-08-06.16:06:50
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1533571610.9.0.56676864532.issue34347@psf.upfronthosting.co.za>
In-reply-to
Content
The test fails because

byte_str.decode('ascii', 'surragateescape')

is not what ascii(byte_str) - returns when called from the commandline.

Assumption: since " check('utf8', [arg_utf8])" succeeds I assume the parsing of the command-line is correct.

DETAILS
>>> arg = 'h\xe9\u20ac'.encode('utf-8')
>>> arg
b'h\xc3\xa9\xe2\x82\xac'

>>> arg.decode('ascii', 'surrogateescape')
'h\udcc3\udca9\udce2\udc82\udcac'


I am having a difficult time getting the syntax correct for all the "escapes", so I added a print statement in the check routine:

test_cmd_line (test.test_utf8_mode.UTF8ModeTests) ...
code:import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:]))) arg:b'h\xc3\xa9\xe2\x82\xac'
out:UTF-8:['h\xe9\u20ac']

code:import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:]))) arg:b'h\xc3\xa9\xe2\x82\xac'
out:ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

test code with my debug statement (to generate above):

    def test_cmd_line(self):
        arg = 'h\xe9\u20ac'.encode('utf-8')
        arg_utf8 = arg.decode('utf-8')
        arg_ascii = arg.decode('ascii', 'surrogateescape')
        code = 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:])))'

        def check(utf8_opt, expected, **kw):
            out = self.get_output('-X', utf8_opt, '-c', code, arg, **kw)
            print("\ncode:%s arg:%s\nout:%s" % (code, arg, out))
            args = out.partition(':')[2].rstrip()
            self.assertEqual(args, ascii(expected), out)

        check('utf8', [arg_utf8])
        if sys.platform == 'darwin' or support.is_android:
            c_arg = arg_utf8
        else:
            c_arg = arg_ascii
        check('utf8=0', [c_arg], LC_ALL='C')

So the first check succeeds:

        check('utf8', [arg_utf8])

But the second does not:

FAIL: test_cmd_line (test.test_utf8_mode.UTF8ModeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/prj/python/src/python3-3.7.0/Lib/test/test_utf8_mode.py", line 225, in test_cmd_line
    check('utf8=0', [c_arg], LC_ALL='C')
  File "/data/prj/python/src/python3-3.7.0/Lib/test/test_utf8_mode.py", line 218, in check
    self.assertEqual(args, ascii(expected), out)
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != "['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\udcc3\udca9\udce2\udc82\udcac']
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

I tried saying the "expected" is arg, but arg is still a byte object, the cmd_line result is not (printed as such).

AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != "[b'h\\xc3\\xa9\\xe2\\x82\\xac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ [b'h\xc3\xa9\xe2\x82\xac']
?  +
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']
History
Date User Action Args
2018-08-06 16:06:50Michael.Feltsetrecipients: + Michael.Felt
2018-08-06 16:06:50Michael.Feltsetmessageid: <1533571610.9.0.56676864532.issue34347@psf.upfronthosting.co.za>
2018-08-06 16:06:50Michael.Feltlinkissue34347 messages
2018-08-06 16:06:50Michael.Feltcreate