
Author xiang.zhang
Recipients benjamin.peterson, ezio.melotti, lemburg, serhiy.storchaka, sibiryakov, terry.reedy, vstinner, xiang.zhang
Date 2018-01-20.19:28:20
Message-id <1516476501.49.0.467229070634.issue32583@psf.upfronthosting.co.za>
Content
The problem is that the utf16 decoder almost always assumes that two bytes decode to one unicode character, so when allocating memory it assumes that (bytes_number+1)/2 unicode slots are enough; there is even a comment in the code saying so. And unicode_decode_call_errorhandler_writer only allocates more memory when the error handler returns a unicode string longer than 1. It doesn't take care of a handler that paces by one byte, in which case one byte can decode to one unicode character. So it's possible for the decoder to write out of bounds.
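To make the arithmetic concrete, here is a minimal sketch in pure Python. It is a simplified model and not the CPython implementation: the names estimated_slots and count_chars_pace_by_one are hypothetical, and the model assumes every decoding error produces exactly one replacement character and resumes one byte after the start of the error. It only compares the (bytes_number+1)/2 estimate with the number of characters such a handler can produce.

```python
def estimated_slots(num_bytes):
    # The sizing described above: one slot per two input bytes, rounded up.
    return (num_bytes + 1) // 2


def count_chars_pace_by_one(data):
    """Count output characters for a simplified UTF-16-LE decoder.

    Assumption (hypothetical model): on any error the handler returns one
    replacement character and resumes at error start + 1.
    """
    pos = 0
    written = 0
    n = len(data)
    while pos < n:
        written += 1  # every step below emits exactly one character
        if n - pos < 2:
            pos += 1  # truncated code unit: error, advance one byte
            continue
        unit = data[pos] | (data[pos + 1] << 8)
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if n - pos >= 4:
                nxt = data[pos + 2] | (data[pos + 3] << 8)
                if 0xDC00 <= nxt <= 0xDFFF:
                    pos += 4  # valid surrogate pair
                    continue
            pos += 1  # error: advance one byte
        elif 0xDC00 <= unit <= 0xDFFF:
            pos += 1  # lone low surrogate: error, advance one byte
        else:
            pos += 2  # ordinary BMP character
    return written


data = b'\xd8\xd8\xd8\xd8\xd8\xd8\x00\x00\x00'
print(estimated_slots(len(data)), count_chars_pace_by_one(data))
```

Under these assumptions, the 9-byte input is estimated at 5 slots while the model emits 7 characters, so the estimate is too small whenever the handler advances by a single byte. The real writer may resize its buffer for other reasons, so the exact number of bytes overwritten in an actual crash can differ from this count.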

This example reliably crashes a debug build on my Mac; it writes past the end of the internal unicode buffer:

>>> import codecs
>>> def pace_by_one(exc):
...     return ('\ufffd', exc.start+1)
...
>>> codecs.register_error('pace_by_one', pace_by_one)
>>> b'\xd8\xd8\xd8\xd8\xd8\xd8\x00\x00\x00'.decode('utf-16-le', 'pace_by_one')
Debug memory block at address p=0x10210c260: API 'o'
    100 bytes originally requested
    The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected.
    The 8 pad bytes at tail=0x10210c2c4 are not all FORBIDDENBYTE (0xfb):
        at tail+0: 0x00 *** OUCH
        at tail+1: 0x00 *** OUCH
        at tail+2: 0xfb
        at tail+3: 0xfb
        at tail+4: 0xfb
        at tail+5: 0xfb
        at tail+6: 0xfb
        at tail+7: 0xfb
    The block was made by call #30672 to debug malloc/realloc.
    Data at p: 00 00 00 00 00 00 00 00 ... fd ff fd ff fd ff d8 00

Fatal Python error: bad trailing pad byte

Current thread 0x00007fffab9b4340 (most recent call first):
  File "/Users/angwer/Repositories/cpython/Lib/encodings/utf_16_le.py", line 16 in decode
  File "<stdin>", line 1 in <module>
[1]    63997 abort      ~/Repositories/cpython/python.exe

I'll try to make a fix tomorrow.
History
Date                 User         Action  Args
2018-01-20 19:28:21  xiang.zhang  set     recipients: + xiang.zhang, lemburg, terry.reedy, vstinner, benjamin.peterson, ezio.melotti, serhiy.storchaka, sibiryakov
2018-01-20 19:28:21  xiang.zhang  set     messageid: <1516476501.49.0.467229070634.issue32583@psf.upfronthosting.co.za>
2018-01-20 19:28:21  xiang.zhang  link    issue32583 messages
2018-01-20 19:28:20  xiang.zhang  create