classification
Title: match_start truncates large values
Type: behavior Stage: resolved
Components: Library (Lib), Regular Expressions Versions: Python 3.4, Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: benjamin.peterson Nosy List: barry, benjamin.peterson, ezio.melotti, georg.brandl, larry, loewis, mrabarnett, pitrou, python-dev, serhiy.storchaka
Priority: release blocker Keywords: easy, needs review, patch

Created on 2010-10-23 18:37 by loewis, last changed 2013-01-10 16:38 by benjamin.peterson. This issue is now closed.

Files
File name Uploaded Description Edit
buildvalue_overflow.patch serhiy.storchaka, 2012-10-20 22:17 review
buildvalue_overflow_tests.patch serhiy.storchaka, 2012-12-01 22:08 review
Messages (17)
msg119462 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-23 18:37
_sre.c:match_start currently uses Py_BuildValue("i") to return the start index, which itself is of type Py_ssize_t. This will get truncated on x64 Windows if the start is > 2**31. PyInt_FromSsize_t should be used instead. Other usages of Py_BuildValue should be reviewed as well.
msg173421 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-20 22:17
I analyzed all 1-valued usages of Py_BuildValue and found similar bags only in Modules/_sre.h. Here is a patch.

Bugs should be evident on big-endian platform with sizeof(int) < sizeof(size_t) or sizeof(long) < sizeof(size_t). Standard tests should fail on such platforms.
msg174428 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-01 16:27
Please review.
msg175818 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-17 22:37
Patch looks good to me. Do you think it's possible to add some bigmem tests for this?
msg175822 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-17 22:51
I can't write a test, because I have no such big memory.
msg175853 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-18 09:47
Actually, standard tests should be enough on a big-endian platform with sizeof(int) < sizeof(size_t) or sizeof(long) < sizeof(size_t).  No additional tests needed.
msg176312 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-24 20:00
> Actually, standard tests should be enough on a big-endian platform with 
> sizeof(int) < sizeof(size_t) or sizeof(long) < sizeof(size_t).  No
> additional tests needed.

Ok, thanks. Unfortunately, all our 64-bit big endian buildbots are configured to produce 32-bit Pythons...
msg176763 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-12-01 21:49
Actually, we now have a 64-bit big endian buildbot and it does not show any test failure:
http://buildbot.python.org/all/builders/SPARC%20Solaris%2010%20%28cc%2C%2064b%29%20%5BSB%5D%203.x

[...]
checking size of int... 4
checking size of long... 8
checking size of void *... 8
checking size of short... 2
checking size of float... 4
checking size of double... 8
checking size of fpos_t... 8
checking size of size_t... 8
[...]
checking whether byte ordering is bigendian... yes
msg176766 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-01 22:08
Here is a test.
msg176782 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-12-02 11:12
Thanks for the patch. For the record, the test passes successfully but it needs 40 GB, not 4 GB:
http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%20bigmem%20custom/builds/3
msg176786 - (view) Author: Roundup Robot (python-dev) Date: 2012-12-02 11:56
New changeset 0edc92d78192 by Antoine Pitrou in branch '3.2':
Issue #10182: The re module doesn't truncate indices to 32 bits anymore.
http://hg.python.org/cpython/rev/0edc92d78192

New changeset 21ceee08a375 by Antoine Pitrou in branch '3.3':
Issue #10182: The re module doesn't truncate indices to 32 bits anymore.
http://hg.python.org/cpython/rev/21ceee08a375

New changeset e7104cc09d02 by Antoine Pitrou in branch 'default':
Issue #10182: The re module doesn't truncate indices to 32 bits anymore.
http://hg.python.org/cpython/rev/e7104cc09d02
msg176787 - (view) Author: Roundup Robot (python-dev) Date: 2012-12-02 12:01
New changeset de0f38f9280e by Antoine Pitrou in branch '2.7':
Issue #10182: The re module doesn't truncate indices to 32 bits anymore.
http://hg.python.org/cpython/rev/de0f38f9280e
msg179561 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2013-01-10 16:21
Note that this change is causing problems with genshi due to API backward incompatibility.  This is reported here:

https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/1097783

The reason the user found it in Ubuntu first is that we track hg tip, but I've reproduced it in my own hg tip, and the diff of r80680 makes it pretty obvious.

I'm not saying the change should necessarily be reverted, but it *is* a backward incompatibility, and at a minimum the NEWS file (and release notes for 2.7.4) should make it clear that the return types have changed.  As we're seeing, it breaks at least one existing package.
msg179562 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2013-01-10 16:28
Nosied Benjamin since this is a release issue.  Re-opened, assigned to him, and release blocked for 2.7.4.
msg179567 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-01-10 16:33
Actually, 2.7 should just use PyInt_FromSsize_t, which only promotes when needed.
msg179569 - (view) Author: Roundup Robot (python-dev) Date: 2013-01-10 16:37
New changeset 0f5067d9e1d8 by Benjamin Peterson in branch '2.7':
use PyInt_FromSsize_t instead of PyLong_FromSsize_t (#10182)
http://hg.python.org/cpython/rev/0f5067d9e1d8
msg179570 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-01-10 16:38
It's good you were able to report this before we released anything.
History
Date User Action Args
2013-01-10 16:38:37benjamin.petersonsetstatus: open -> closed

messages: + msg179570
2013-01-10 16:38:00python-devsetmessages: + msg179569
2013-01-10 16:33:39benjamin.petersonsetmessages: + msg179567
2013-01-10 16:28:23barrysetstatus: closed -> open
priority: normal -> release blocker

nosy: + georg.brandl, larry
messages: + msg179562

assignee: benjamin.peterson
2013-01-10 16:26:40barrysetnosy: + benjamin.peterson
2013-01-10 16:21:47barrysetnosy: + barry
messages: + msg179561
2012-12-02 12:01:39pitrousetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2012-12-02 12:01:08python-devsetmessages: + msg176787
2012-12-02 11:56:36python-devsetnosy: + python-dev
messages: + msg176786
2012-12-02 11:12:34pitrousetmessages: + msg176782
2012-12-01 22:08:53serhiy.storchakasetfiles: + buildvalue_overflow_tests.patch

messages: + msg176766
2012-12-01 21:49:07pitrousetmessages: + msg176763
2012-12-01 21:38:29pitroulinkissue16586 superseder
2012-11-24 20:00:18pitrousetmessages: + msg176312
2012-11-18 09:47:03serhiy.storchakasetmessages: + msg175853
2012-11-17 22:51:29serhiy.storchakasetmessages: + msg175822
2012-11-17 22:37:41pitrousetmessages: + msg175818
2012-11-04 04:32:01ezio.melottisetnosy: + pitrou
2012-11-01 16:27:01serhiy.storchakasetkeywords: + needs review

messages: + msg174428
2012-10-24 08:54:36serhiy.storchakasetstage: needs patch -> patch review
2012-10-20 22:17:53serhiy.storchakasetnosy: + ezio.melotti, mrabarnett
components: + Regular Expressions
2012-10-20 22:17:25serhiy.storchakasetfiles: + buildvalue_overflow.patch
keywords: + patch
messages: + msg173421

versions: + Python 3.4
2012-08-06 11:17:23serhiy.storchakasetkeywords: + easy
2012-08-06 11:13:27serhiy.storchakasettype: behavior
components: + Library (Lib)
stage: needs patch
2012-08-06 11:11:12serhiy.storchakasetnosy: + serhiy.storchaka

versions: + Python 3.3, - Python 3.1
2010-10-23 18:37:50loewiscreate