Message334732
On Unix, Python 3.6 decodes the char * command-line arguments via mbstowcs. On Linux, I see the following misbehavior of mbstowcs when decoding an overlong (6-byte) UTF-8 sequence:
>>> import ctypes
>>> mbstowcs = ctypes.CDLL(None, use_errno=True).mbstowcs
>>> arg = bytes(x + 128 for x in [1 + 124, 63, 63, 59, 58, 58])
>>> mbstowcs(None, arg, 0)
1
>>> buf = (ctypes.c_int * 2)()
>>> mbstowcs(buf, arg, 2)
1
>>> hex(buf[0])
'0x7fffbeba'
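To show where 0x7fffbeba comes from, here is a minimal sketch that decodes the same bytes by hand using the original 6-byte UTF-8 layout (lead byte 1111110x followed by five 10xxxxxx continuation bytes). The helper name decode_six_byte is hypothetical and is not part of ctypes or libc. That mbstowcs applies this layout internally is an assumption; the sketch only reproduces the value observed above.

```python
def decode_six_byte(seq: bytes) -> int:
    """Hypothetical helper: decode one legacy 6-byte UTF-8 sequence by hand."""
    # A 6-byte lead byte has the bit pattern 1111110x (0xFC or 0xFD).
    if len(seq) != 6 or seq[0] & 0xFE != 0xFC:
        raise ValueError("expected a 6-byte sequence with lead byte 0xFC or 0xFD")
    value = seq[0] & 0x01  # the lead byte contributes a single payload bit
    for byte in seq[1:]:
        # Each continuation byte is 10xxxxxx and contributes 6 payload bits.
        if byte & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (byte & 0x3F)
    return value


arg = bytes(x + 128 for x in [1 + 124, 63, 63, 59, 58, 58])
code_point = decode_six_byte(arg)
print(arg.hex(), hex(code_point))  # fdbfbfbbbaba 0x7fffbeba
print(code_point > 0x10FFFF)       # True: far outside the Unicode range
```

The result is above U+10FFFF, so it cannot be a valid Unicode code point even though the call above reported one decoded character.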
This shouldn't be an issue in 3.7, at least not with the default UTF-8 mode configuration. With this mode, Py_DecodeLocale calls _Py_DecodeUTF8Ex using the surrogateescape error handler [1].
[1]: https://github.com/python/cpython/blob/v3.7.2/Python/fileutils.c#L456
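For comparison, here is a rough sketch of what the surrogateescape error handler does with the same bytes. It uses the Python-level UTF-8 codec as a stand-in for the C function _Py_DecodeUTF8Ex, so treating the two as equivalent for this input is an assumption.

```python
arg = bytes(x + 128 for x in [1 + 124, 63, 63, 59, 58, 58])

# Strict UTF-8 decoding rejects the sequence outright.
try:
    arg.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
decoded = arg.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = decoded.encode("utf-8", errors="surrogateescape")

print(strict_ok)                         # False
print([hex(ord(ch)) for ch in decoded])  # ['0xdcfd', '0xdcbf', '0xdcbf', '0xdcbb', '0xdcba', '0xdcba']
print(round_trip == arg)                 # True
```

Each invalid byte becomes its own escaped character, so no out-of-range value such as 0x7fffbeba is ever produced, and the original argument bytes can still be recovered.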
Date | User | Action | Args
2019-02-01 23:49:23 | eryksun | set | recipients: + eryksun, ncoghlan, SilentGhost, Neui
2019-02-01 23:49:22 | eryksun | set | messageid: <1549064962.38.0.0212201261109.issue35883@roundup.psfhosted.org>
2019-02-01 23:49:22 | eryksun | link | issue35883 messages
2019-02-01 23:49:22 | eryksun | create |