classification
Title: assertion error in IO reading text file as binary
Type: behavior Stage: resolved
Components: IO Versions: Python 3.4, Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Arfrever, benjamin.peterson, georg.brandl, larry, pitrou, python-dev, r.david.murray, serhiy.storchaka
Priority: release blocker Keywords: patch

Created on 2013-02-02 20:13 by r.david.murray, last changed 2013-02-03 15:16 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
textio_type_check-2.7.patch serhiy.storchaka, 2013-02-03 08:50 review
textio_type_check-3.3.patch serhiy.storchaka, 2013-02-03 08:50 review
textio_type_check-3.2.patch serhiy.storchaka, 2013-02-03 08:50 review
textio_type_check-3.3_2.patch serhiy.storchaka, 2013-02-03 11:05 review
textio_type_check-3.2_2.patch serhiy.storchaka, 2013-02-03 14:34 review
textio_type_check-2.7_2.patch serhiy.storchaka, 2013-02-03 14:34 review
Messages (16)
msg181210 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-02-02 20:13
I came across this by making a mistake, but it shouldn't crash:

rdmurray@hey:~/python/p32>touch temp
rdmurray@hey:~/python/p32>./python  
Python 3.2.3+ (3.2:e6952acd5a55+, Feb  2 2013, 15:04:21) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from email import message_from_binary_file as mb
>>> m = mb(open('temp'))
python: ./Modules/_io/textio.c:1454: textiowrapper_read_chunk: Assertion `((((((PyObject*)(input_chunk))->ob_type))->tp_flags & ((1L<<27))) != 0)' failed.
zsh: abort      ./python


This is a regression relative to 3.2.3:

Python 3.2.3 (default, Sep 16 2012, 16:35:39) 
[GCC 4.5.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from email import message_from_binary_file as mb
>>> m = mb(open('temp'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/email/__init__.py", line 63, in message_from_binary_file
    return BytesParser(*args, **kws).parse(fp)
  File "/usr/lib/python3.2/email/parser.py", line 124, in parse
    return self.parser.parse(fp, headersonly)
  File "/usr/lib/python3.2/email/parser.py", line 68, in parse
    data = fp.read(8192)
  File "/usr/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
TypeError: 'str' does not support the buffer interface
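
For reference, a minimal sketch of the mistake and the intended usage. The sample file and the helper name parse_message are made up for this illustration; it assumes an interpreter that already contains the fix from this issue, where the text-mode call is expected to raise TypeError instead of aborting:

```python
import os
import tempfile
from email import message_from_binary_file


def parse_message(path, mode):
    """Parse the file at `path` as an email message, opening it with `mode`."""
    with open(path, mode) as fp:
        return message_from_binary_file(fp)


# Hypothetical sample file, created only for this sketch.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"Subject: hello\n\nbody\n")

# Intended usage: binary mode, so the parser receives bytes.
msg = parse_message(sample_path, "rb")

# The mistake from the report: text mode, so the parser receives str.
try:
    parse_message(sample_path, "r")
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc

os.remove(sample_path)
```

Opening the file with 'rb' is the only change needed on the caller's side.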
msg181211 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-02 20:17
I see "TypeError: 'str' does not support the buffer interface" on current 3.2 and default.
msg181212 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-02-02 20:18
It might not crash on a debug build.  I haven't tried that.
msg181213 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-02-02 20:18
I mean, non-debug build.
msg181215 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-02-02 20:28
OK, this happens in debug mode in 3.2.3, so it is not a regression.  Still something to be looked into, since that assert presumably has a purpose.
msg181216 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-02 20:29
There is a simpler crasher:

import io
io.TextIOWrapper(io.StringIO()).read(1)
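
A small sketch of how this case can be checked on an interpreter that includes the fix. It assumes the fixed behavior is a TypeError (the exception type settled on later in this thread); the function name is made up for the illustration, and a non-empty StringIO is used so that read() actually returns a str chunk:

```python
import io


def read_one_char_from_text_stream():
    """Wrap a text stream (not a bytes stream) and try to read from it.

    Returns the TypeError raised by the wrapper, or None if no
    TypeError was raised.
    """
    wrapper = io.TextIOWrapper(io.StringIO("a"))
    try:
        wrapper.read(1)
    except TypeError as exc:
        return exc
    return None


error = read_one_char_from_text_stream()
```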
msg181217 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-02 21:15
Well, if a proper type check is made later, then the assert can go.
Also, the PyBytes_Size() result should be checked for error, which incidentally ensures the argument is a bytes object.
msg181247 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-03 08:50
A crash is possible not only when reading from text files, but also when the 
decoder returns a non-string or when the decoder's state is not a bytes object. 
This is possible with a malicious decoder and perhaps with some old 
non-bytes-to-string decoder in the stdlib codecs registry.

Here are patches for different versions.
msg181248 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-02-03 08:54
Blocker for 3.2.4.
msg181251 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-03 09:07
This bug exists also in 2.7.

I am unsure about the exception type. On one hand, TypeError looks natural here, and exceptions of this type are raised implicitly by the Python implementation. On the other hand, there is a precedent in __iter__(), which raises IOError when readline() returns a non-string.
msg181252 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-03 09:17
TypeError should be the right exception here.
msg181256 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-03 10:07
One more crasher, not yet fixed by my patch.

io.TextIOWrapper(io.BytesIO(b'a'), newline='\n', encoding='quopri_codec').read(1)

Sometimes it crashes (on a non-debug build), sometimes it raises an exception: "ValueError: character U+82228c0 is not in range [U+0000; U+10ffff]".
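
For comparison, a sketch of what this call is expected to do on current Python 3 releases. This goes beyond the patches discussed here: as far as I know, since Python 3.4 the TextIOWrapper constructor refuses codecs that are not marked as text encodings, so the call should fail with LookupError before any read happens. The function name is made up for the illustration:

```python
import io


def open_with_quopri():
    """Try to build the wrapper from the report.

    Returns the LookupError raised by the constructor, or None if the
    wrapper was created.
    """
    try:
        io.TextIOWrapper(
            io.BytesIO(b"a"), newline="\n", encoding="quopri_codec"
        )
    except LookupError as exc:
        return exc
    return None


quopri_error = open_with_quopri()
```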
msg181259 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-03 11:05
Here is an updated patch for 3.3+. The exception type is changed to TypeError, some new crashes are fixed, and tests are added. If it looks good, I'll backport it to 3.2 and 2.7.
msg181261 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-03 11:16
> Here is an updated patch for 3.3+. The exception type is changed to
> TypeError, some new crashes are fixed, and tests are added. If it looks
> good, I'll backport it to 3.2 and 2.7.

Looks ok to me.
msg181270 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-03 14:34
Here are patches for 3.2 and 2.7.

Note that due to unicode-str autoconversions, 2.7 does not always raise TypeError 
(sometimes it does not raise an exception at all, sometimes it raises 
UnicodeEncodeError). The 2.7 tests are not as strong as the 3.x tests.
msg181274 - (view) Author: Roundup Robot (python-dev) Date: 2013-02-03 15:14
New changeset 5655cdd3c010 by Serhiy Storchaka in branch '3.2':
Issue #17106: Fix a segmentation fault in io.TextIOWrapper when an underlying
http://hg.python.org/cpython/rev/5655cdd3c010

New changeset 0c4cc967a733 by Serhiy Storchaka in branch '3.3':
Issue #17106: Fix a segmentation fault in io.TextIOWrapper when an underlying
http://hg.python.org/cpython/rev/0c4cc967a733

New changeset 398bcdf55f92 by Serhiy Storchaka in branch 'default':
Issue #17106: Fix a segmentation fault in io.TextIOWrapper when an underlying
http://hg.python.org/cpython/rev/398bcdf55f92

New changeset 19a33ef3821d by Serhiy Storchaka in branch '2.7':
Issue #17106: Fix a segmentation fault in io.TextIOWrapper when an underlying
http://hg.python.org/cpython/rev/19a33ef3821d
History
Date User Action Args
2013-02-03 15:16:26 serhiy.storchaka set status: open -> closed
resolution: fixed
stage: needs patch -> resolved
2013-02-03 15:14:06 python-dev set nosy: + python-dev
messages: + msg181274
2013-02-03 14:39:36 serhiy.storchaka set assignee: serhiy.storchaka
2013-02-03 14:34:16 serhiy.storchaka set files: + textio_type_check-3.2_2.patch, textio_type_check-2.7_2.patch
messages: + msg181270
2013-02-03 11:16:15 pitrou set messages: + msg181261
2013-02-03 11:05:09 serhiy.storchaka set files: + textio_type_check-3.3_2.patch
messages: + msg181259
2013-02-03 10:07:30 serhiy.storchaka set messages: + msg181256
2013-02-03 09:17:15 pitrou set messages: + msg181252
2013-02-03 09:07:28 serhiy.storchaka set nosy: + benjamin.peterson
messages: + msg181251
versions: + Python 2.7
2013-02-03 08:54:19 georg.brandl set messages: + msg181248
2013-02-03 08:53:19 georg.brandl set priority: normal -> release blocker
nosy: + larry
2013-02-03 08:50:54 serhiy.storchaka set files: + textio_type_check-2.7.patch, textio_type_check-3.3.patch, textio_type_check-3.2.patch
keywords: + patch
messages: + msg181247
2013-02-03 04:34:04 Arfrever set nosy: + Arfrever
2013-02-02 21:15:05 pitrou set messages: + msg181217
2013-02-02 20:55:47 r.david.murray set title: Crash in IO reading text file as binary via email library -> assertion error in IO reading text file as binary
2013-02-02 20:29:33 serhiy.storchaka set messages: + msg181216
2013-02-02 20:28:17 r.david.murray set priority: release blocker -> normal
versions: + Python 3.3, Python 3.4
messages: + msg181215
keywords: - 3.2regression
type: crash -> behavior
2013-02-02 20:18:50 r.david.murray set messages: + msg181213
2013-02-02 20:18:37 r.david.murray set messages: + msg181212
2013-02-02 20:17:23 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg181211
2013-02-02 20:13:44 r.david.murray create