classification
Title: int() from a buffer reads past the buffer boundaries
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: eryksun, martin.panter, python-dev, serhiy.storchaka, svenberkvens
Priority: normal
Keywords: patch

Created on 2015-11-20 08:40 by svenberkvens, last changed 2015-11-20 19:58 by serhiy.storchaka. This issue is now closed.

Files
File name             Uploaded
issue25678.patch      eryksun, 2015-11-20 11:24
issue25678_2.patch    eryksun, 2015-11-20 17:26
issue25678_3.patch    eryksun, 2015-11-20 18:38
Messages (11)
msg254958 - (view) Author: Sven Berkvens-Matthijsse (svenberkvens) Date: 2015-11-20 08:40
Calling int() or long() on a buffer() object in Python 2.7 does not do the right thing. The following code snippet:

buf = buffer("123test", 1, 2)
print buf
print int(buf)

does not do what I would expect, which is to print "23" twice. Instead, it prints "23" once and then throws an exception:

ValueError: invalid literal for int() with base 10: '23test'

This is caused by the int_from_string() function in Objects/abstract.c, which is passed the length of the string but does not use it to limit how much of the string is parsed. It only uses the length to check for embedded NUL bytes. The real culprit is probably PyInt_FromString(), which does not take a length argument.
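To make the mechanism above concrete, here is a minimal pure-Python model of it. This is a sketch, not the CPython code: the names c_style_parse, bounded_parse and memory are hypothetical, and memory merely stands in for the storage of the underlying str, including its trailing NUL. The first function models a parser that receives only a start pointer and reads up to the next NUL; the second models copying exactly the view's length before parsing.

```python
def c_style_parse(memory, offset):
    """Model of a parser that gets only a start position.

    It reads up to the next NUL byte and ignores the view length,
    so it sees bytes that lie beyond the end of the view.
    """
    end = memory.index(b"\0", offset)
    return int(memory[offset:end])


def bounded_parse(memory, offset, length):
    """Model of the copy-then-parse approach.

    Exactly `length` bytes are copied out first, and only the copy
    is handed to the parser.
    """
    return int(bytes(memory[offset:offset + length]))


# Hypothetical stand-in for the storage behind buffer("123test", 1, 2).
memory = b"123test\0"

print(bounded_parse(memory, 1, 2))

try:
    c_style_parse(memory, 1)
except ValueError as exc:
    # The unbounded parser trips over the trailing "test".
    print("over-read: %s" % exc)
```

In this model the bounded version yields 23, while the unbounded one raises ValueError because it also sees the "test" that follows the view.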
msg254964 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-20 10:45
This is similar to issue24802. The patch for issue24802 was not backported to 2.7 because the affected functions don't accept memoryview in Python 2. Now that we have an example, we can backport that patch.
msg254967 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-11-20 11:11
> Now we have an example, and can backport that patch.

More seriously, it's possible to get a buffer over-read using NumPy:

    >>> import numpy
    >>> int(buffer(numpy.array('123', dtype='c')))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: invalid literal for int() with base 10: '123\xe1\x18\x7f'

I backported the modification to PyNumber_Int and PyNumber_Long, using PyString_FromStringAndSize and PyString_AS_STRING. It works as expected:

    Python 2.7.10+ (2.7:5d88c1d413b9+, Nov 20 2015, 04:58:55) 
    [GCC 4.8.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> int(buffer('123test', 1, 2))
    23
    [41951 refs]
    >>> long(buffer('123test', 1, 2))
    23L
    [41952 refs]
msg254968 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-20 11:19
Did you forget to add a patch?
msg254969 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-11-20 11:24
I just made a quick modification to check that it works. I'm sure you could do the same. But here it is anyway.
msg254970 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-20 11:34
Could you backport the full patch, with tests? compile() is affected too:

>>> compile(buffer("123\0test", 1, 2), '', 'exec')
<code object <module> at 0xb70c5800, file "", line 1>
>>> compile(buffer("123test", 1, 2), '', 'exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: compile() expected string without null bytes
msg255006 - (view) Author: Sven Berkvens-Matthijsse (svenberkvens) Date: 2015-11-20 18:18
Eryk, could the tests for int() and float() in the patch file you posted be incorrect? buffer(...)[a:b] returns a str, not another buffer, so the result is always NUL-terminated and will not exhibit the problem whether or not the C source has been patched. I think the tests should be using buffer(..., a, b) instead.

May I say that I am very, very impressed by the speed with which this has been picked up? Thank you all for your impressive work!
msg255008 - (view) Author: Sven Berkvens-Matthijsse (svenberkvens) Date: 2015-11-20 18:28
(Please excuse my horrible spelling in my last message, I'm apparently more tired than I care to admit)
msg255009 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-11-20 18:38
> I think the tests should be using buffer(..., a, b) instead.

Thanks, you're right. :)
msg255014 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-20 19:56
New changeset 3ef7d1af5195 by Serhiy Storchaka in branch '2.7':
Issue #25678: Copy buffer objects to null-terminated strings.
https://hg.python.org/cpython/rev/3ef7d1af5195
msg255016 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-20 19:58
Committed after fixing some details. Thank you Eryk Sun.
History
Date                 User              Action  Args
2015-11-20 19:58:57  serhiy.storchaka  set     status: open -> closed; resolution: fixed; messages: + msg255016; stage: patch review -> resolved
2015-11-20 19:56:50  python-dev        set     nosy: + python-dev; messages: + msg255014
2015-11-20 19:14:22  serhiy.storchaka  set     assignee: serhiy.storchaka; stage: needs patch -> patch review
2015-11-20 18:38:46  eryksun           set     files: + issue25678_3.patch; messages: + msg255009
2015-11-20 18:28:00  svenberkvens      set     messages: + msg255008
2015-11-20 18:18:01  svenberkvens      set     messages: + msg255006
2015-11-20 17:26:29  eryksun           set     files: + issue25678_2.patch
2015-11-20 11:34:16  serhiy.storchaka  set     messages: + msg254970
2015-11-20 11:24:01  eryksun           set     files: + issue25678.patch; keywords: + patch; messages: + msg254969
2015-11-20 11:19:18  serhiy.storchaka  set     messages: + msg254968
2015-11-20 11:11:47  eryksun           set     nosy: + eryksun; messages: + msg254967
2015-11-20 10:45:32  serhiy.storchaka  set     nosy: + serhiy.storchaka, martin.panter; messages: + msg254964; stage: needs patch
2015-11-20 08:40:40  svenberkvens      create