
Author serhiy.storchaka
Recipients Brian.Cain, benjamin.peterson, serhiy.storchaka, terry.reedy
Date 2015-11-01.21:47:17
Stack trace:

#0  ascii_decode (start=0xa72f2008 "", end=0xfffff891 <error: Cannot access memory at address 0xfffff891>, dest=<optimized out>) at Objects/unicodeobject.c:4795
#1  0x08100c0f in PyUnicode_DecodeUTF8Stateful (s=s@entry=0xa72f2008 "", size=size@entry=1490081929, errors=errors@entry=0x81f4303 "replace", consumed=consumed@entry=0x0)
    at Objects/unicodeobject.c:4871
#2  0x081029c7 in PyUnicode_DecodeUTF8 (s=0xa72f2008 "", size=1490081929, errors=errors@entry=0x81f4303 "replace") at Objects/unicodeobject.c:4743
#3  0x0815179a in err_input (err=0xbfffec04) at Python/pythonrun.c:1352
#4  0x081525cf in PyParser_ASTFromFileObject (arena=0x8348118, errcode=0x0, flags=<optimized out>, ps2=0x0, ps1=0x0, start=257, enc=0x0, filename=0xb7950e00, fp=0x8347fb0)
    at Python/pythonrun.c:1163
#5  PyRun_FileExFlags (fp=0x8347fb0, filename_str=0xb79e2eb8 "", start=257, globals=0xb79e3d8c, locals=0xb79e3d8c, closeit=1, flags=0xbfffecec) at Python/pythonrun.c:916
#6  0x08152744 in PyRun_SimpleFileExFlags (fp=0x8347fb0, filename=<optimized out>, closeit=1, flags=0xbfffecec) at Python/pythonrun.c:396
#7  0x08063919 in run_file (p_cf=0xbfffecec, filename=0x82eda10 L"", fp=0x8347fb0) at Modules/main.c:318
#8  Py_Main (argc=argc@entry=2, argv=argv@entry=0x82ed008) at Modules/main.c:768
#9  0x0805f345 in main (argc=2, argv=0xbfffee44) at ./Programs/python.c:69

In frame #2, PyUnicode_DecodeUTF8 is called with s="" and size=1490081929. The size argument is err->offset, and err->offset is set only in parsetok() in Parser/parsetok.c, so this is a tokenizer bug.
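To make the failure mode concrete, here is a small Python model of the invariant that appears to be violated: the length handed to the decoder should never exceed the size of the buffer it describes. The helper name `decode_error_text` is hypothetical and this is not the CPython fix, only a sketch of the bounds check, assuming the offset is a byte count into the error text.

```python
def decode_error_text(text: bytes, offset: int) -> str:
    """Decode at most `offset` bytes of `text`, never reading past its end.

    Hypothetical helper for illustration only. In the backtrace the C code
    passes the offset straight through as a length, so a bogus value such
    as 1490081929 makes the decoder read far beyond an empty buffer.
    """
    if offset < 0 or offset > len(text):
        # Clamp an out-of-range offset to the real buffer length.
        offset = len(text)
    return text[:offset].decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The values seen in frame #2: an empty buffer and a huge size.
    print(repr(decode_error_text(b"", 1490081929)))
```

Python slicing would clamp the range on its own; the explicit check is there to show what the C caller would have to verify before trusting the offset.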

Minimal reproducer:

./python -c 'with open("", "wb") as f: f.write(b"\x7f\x00\n\xfd\n")'
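The reproducer above can also be wrapped in a small harness that writes the same bytes to a temporary file and runs an interpreter on it in a child process, so that a crash does not take down the calling shell. This is a sketch under assumptions: the file name `repro_input.py` and the helper names are made up for illustration, and the negative return code convention for signals applies to POSIX systems only.

```python
import os
import subprocess
import sys
import tempfile

# The exact bytes written by the one-line reproducer.
REPRO_BYTES = b"\x7f\x00\n\xfd\n"


def write_repro(path):
    """Write the reproducer bytes to `path`."""
    with open(path, "wb") as f:
        f.write(REPRO_BYTES)


def run_repro(python=sys.executable):
    """Run `python` on the reproducer file and return its exit status.

    On POSIX, a negative value means the child was killed by a signal
    (for example -11 for SIGSEGV); a fixed interpreter is expected to
    report a SyntaxError and exit with a positive status instead.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name; the original name is not preserved above.
        path = os.path.join(tmp, "repro_input.py")
        write_repro(path)
        proc = subprocess.run(
            [python, path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return proc.returncode


if __name__ == "__main__":
    rc = run_repro()
    if rc < 0:
        print("interpreter killed by signal", -rc)
    else:
        print("interpreter exited with status", rc)
```

To test a specific build, pass its path explicitly, for example `run_repro("./python")`.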

The crash goes away if you comment out the code at the end of decoding_fgets() that tests for UTF-8.

Date                 User              Action  Args
2015-11-01 21:47:18  serhiy.storchaka  set     recipients: + serhiy.storchaka, terry.reedy, benjamin.peterson, Brian.Cain
2015-11-01 21:47:18  serhiy.storchaka  set     messageid: <>
2015-11-01 21:47:17  serhiy.storchaka  link    issue25388 messages
2015-11-01 21:47:17  serhiy.storchaka  create