classification
Title: file iteration SystemError for huge lines (2GiB+)
Type: behavior Stage: resolved
Components: IO Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, abukaj, benjamin.peterson, doko, pitrou, pitti, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2014-09-30 16:10 by abukaj, last changed 2014-10-12 15:24 by benjamin.peterson. This issue is now closed.

Files
File name              Uploaded
issue22526_test.patch  serhiy.storchaka, 2014-10-04 07:43
Messages (13)
msg227949 - Author: Jakub Mateusz Dzik (abukaj) Date: 2014-09-30 16:10
File /tmp/2147483648zeros is 2^31 (2GiB) zero-bytes ('\0').

The readline() method works fine:
>>> fh = open('/tmp/2147483648zeros', 'rb')
>>> line = fh.readline()
>>> len(line)
2147483648

However, when I try to iterate over the file:
>>> fh = open('/tmp/2147483648zeros', 'rb')
>>> for line in fh:
...   print len(line)

SystemError                         Traceback (most recent call last)
/home/jkowalski/<ipython-input-55-aaa9ddb42aea> in <module>()
----> 1 for line in fh:
      2     print len(line)
      3 
SystemError: Negative size passed to PyString_FromStringAndSize


The same happens for larger files (the issue was discovered with a 2243973120-byte file).
For a shorter file, iteration works as expected.


File /tmp/2147483647zeros is 2^31 - 1 (< 2GiB) zero-bytes.
>>> fh = open('/tmp/2147483647zeros', 'rb')
>>> for line in fh:
...   print len(line)
2147483647
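The report does not say how the zero-filled files were created. A minimal Python 3 sketch for recreating them, assuming a Linux-like filesystem: `truncate()` extends an empty file, and on most platforms the new area reads back as zero bytes (the io documentation says this is platform-dependent). The file then contains no newline, so iteration should yield exactly one line. The helper names `make_zero_file()` and `iterated_line_lengths()` are hypothetical, and the demo uses a small size; passing 2**31 gives the reported layout but needs enough RAM to hold a 2 GiB line.

```python
import os
import tempfile


def make_zero_file(path, size):
    """Create `path` holding `size` NUL bytes and no newline.

    truncate() grows the empty file; on Linux the new area reads back as
    zero bytes and is usually stored sparsely, so little data is written.
    """
    with open(path, "wb") as fh:
        fh.truncate(size)


def iterated_line_lengths(path):
    """Iterate over the file as in the report and collect each line's length."""
    with open(path, "rb") as fh:
        return [len(line) for line in fh]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "zeros")
        make_zero_file(path, 1000)           # small stand-in for 2**31
        print(iterated_line_lengths(path))   # [1000]
```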


I guess the variable used for the size is of a 32-bit signed type.
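The guess can be checked with a small Python 3 sketch that mimics storing a length in a 32-bit signed C `int` versus a `Py_ssize_t`, the type the fix below switched to. It relies on `ctypes.c_int32` and `ctypes.c_ssize_t`, which silently wrap out-of-range values instead of raising. The helper names are illustrative, and this only models the integer arithmetic, not the actual fileobject.c code.

```python
import ctypes


def as_c_int32(n):
    """Value of `n` after being stored in a 32-bit signed C int (wraps modulo 2**32)."""
    return ctypes.c_int32(n).value


def as_py_ssize_t(n):
    """Value of `n` after being stored in a C ssize_t (same width as Py_ssize_t)."""
    return ctypes.c_ssize_t(n).value


for size in (2**31 - 1, 2**31, 2243973120):
    print(size, "->", as_c_int32(size))
# 2147483647 -> 2147483647    (the 2^31 - 1 file still works)
# 2147483648 -> -2147483648   (the "Negative size" in the traceback)
# 2243973120 -> -2050994176

print(as_py_ssize_t(2**31))  # 2147483648 on a 64-bit build
```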

I am using Python 2.7.3 (default, Feb 27 2014, 19:58:35) with IPython 0.12.1 on Ubuntu 12.04.5 LTS.
msg228050 - Author: Roundup Robot (python-dev) (Python triager) Date: 2014-10-01 01:17
New changeset beadb3e1dc81 by Benjamin Peterson in branch '2.7':
use Py_ssize_t for file offset and length computations in iteration (closes #22526)
https://hg.python.org/cpython/rev/beadb3e1dc81
msg228366 - Author: Matthias Klose (doko) (Python committer) Date: 2014-10-03 19:42
No, it doesn't. At least when testing the installed Python, the test just fails:

https://jenkins.qa.ubuntu.com/job/utopic-adt-python2.7/39/?
msg228369 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2014-10-03 20:24
It fails because of insufficient memory, though, not because of an incorrect fix?
msg228383 - Author: Matthias Klose (doko) (Python committer) Date: 2014-10-03 21:17
Maybe, but then you should skip the test, or at least expect a MemoryError.
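One way to follow the "expect a MemoryError" suggestion, sketched with Python 3's unittest: treat a MemoryError while building the huge line as a skip rather than a failure, since running out of memory says nothing about whether the size computation is fixed. `LINE_SIZE` and the test class are hypothetical; a real regression test would use a 2**31-byte line.

```python
import os
import tempfile
import unittest

LINE_SIZE = 1 << 16  # hypothetical stand-in; the reported case is 2**31 bytes


class HugeLineIterationTest(unittest.TestCase):
    def test_iterate_single_huge_line(self):
        fd, path = tempfile.mkstemp()
        os.close(fd)
        self.addCleanup(os.remove, path)
        with open(path, "wb") as fh:
            fh.truncate(LINE_SIZE)  # zero-filled on most platforms, no newline
        try:
            with open(path, "rb") as fh:
                lengths = [len(line) for line in fh]
        except MemoryError:
            # Not enough memory is an environment problem, not a regression.
            self.skipTest("not enough memory for a %d-byte line" % LINE_SIZE)
        self.assertEqual(lengths, [LINE_SIZE])
```

Run it with `python -m unittest` as usual; on a machine that cannot allocate the line, it is reported as skipped instead of failed.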
msg228384 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2014-10-03 21:18
How much memory does whatever is running that test have?

msg228434 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-10-04 07:43
The test should be marked with dry_run=False; otherwise it passes falsely. Allocating the growing buffer needs extra memory; in my experiments memuse=2.5 is enough. I also think this test should require the largefile resource. Here is a patch. It also significantly speeds up the test on Linux.
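The memory figures in this thread can be cross-checked with simple arithmetic, assuming a bigmem-style test declares that it needs roughly `size * memuse` bytes (roughly what the test-support bigmem decorators compare against the limit passed to regrtest with `-M`). With a 2**31-byte line and memuse=2.5 that is 5 GiB. The helper name is hypothetical, and the VM sizes mentioned in this issue are taken as 10**9-byte "GB"; the verdicts are the same if GiB is meant.

```python
GiB = 1 << 30


def declared_memory(size, memuse):
    """Bytes a test expects to need when it uses `memuse` bytes per unit of `size`."""
    return int(size * memuse)


need = declared_memory(2**31, 2.5)  # one 2 GiB line, memuse=2.5
print(need, need / GiB)             # 5368709120 5.0

for vm_gb in (1, 4, 6):  # VM sizes mentioned in this issue
    enough = vm_gb * 10**9 >= need
    print("%d GB VM: %s" % (vm_gb, "enough" if enough else "too small"))
# 1 GB VM: too small
# 4 GB VM: too small
# 6 GB VM: enough
```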
msg228642 - Author: Martin Pitt (pitti) Date: 2014-10-06 05:50
> How much memory does that whatever is running that test have?

Our default is 1 GB for our test runner VMs. I have now raised it to 4 GB for python2.7, but we can only do that for our x86 VMs. For other architectures (ppc64el and ARM), the test VMs just don't have that much memory. So indeed it would be nice to skip this test if the machine has less than 4 GB of RAM.

Thanks!
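Skipping on small machines could look roughly like the decorator below. This is a Python 3 sketch, not what was committed (Serhiy's patch instead relies on the largefile resource and memuse accounting). It reads the physical RAM size from `os.sysconf("SC_PAGE_SIZE")` and `os.sysconf("SC_PHYS_PAGES")`, which are available on Linux but not on every platform, so an unknown size is treated as "skip". `requires_physical_memory` and the test body are hypothetical.

```python
import os
import unittest


def physical_memory_bytes():
    """Total physical RAM in bytes, or None if the platform does not report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.sysconf (e.g. Windows);
        # ValueError: the sysconf name is unknown on this platform.
        return None


def requires_physical_memory(min_bytes):
    """Skip the decorated test unless the machine has at least `min_bytes` of RAM."""
    total = physical_memory_bytes()
    return unittest.skipIf(
        total is None or total < min_bytes,
        "needs %d bytes of RAM, have %s" % (min_bytes, total),
    )


class HugeLineTest(unittest.TestCase):
    @requires_physical_memory(6 * 10**9)  # 6 GB was reported to work in this issue
    def test_iterate_2gib_line(self):
        pass  # hypothetical body: iterate over a file holding one 2**31-byte line
```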
msg228648 - Author: Martin Pitt (pitti) Date: 2014-10-06 08:41
> I now raised it to 4 GB for python2.7

This is *still* not enough; I got a success with 6 GB. But this is really demanding.
msg228666 - Author: STINNER Victor (vstinner) (Python committer) Date: 2014-10-06 12:04
On Fedora 20/x86_64, running test_file2k takes up to 4.8 GB (5114316 kB) of RSS memory (VmPeak in /proc/pid/status).
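The same kind of measurement can be taken from inside a Python 3 process on Linux by parsing `/proc/<pid>/status`. Per the proc(5) manual page, `VmPeak` is the peak virtual memory size while `VmHWM` is the peak resident set size, so this minimal sketch reads both; values in that file are reported in kB. The function name is hypothetical; on systems without `/proc` it returns an empty dict.

```python
def proc_status_kb(fields=("VmPeak", "VmHWM"), pid="self"):
    """Return {field: value_in_kB} parsed from /proc/<pid>/status.

    Returns an empty dict when the file does not exist (e.g. non-Linux systems).
    """
    result = {}
    try:
        with open("/proc/%s/status" % pid) as fh:
            for line in fh:
                name, _, value = line.partition(":")
                if name in fields:
                    # Lines look like "VmPeak:\t  123456 kB".
                    result[name] = int(value.split()[0])
    except OSError:
        return {}
    return result


print(proc_status_kb())  # on Linux, a dict with 'VmPeak' and 'VmHWM' keys
```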

It looks like readahead_get_line_skip() handles its buffer with recursive calls, allocating a larger readahead buffer at each level:
---
readahead: allocate 0.0 MB
readahead: allocate 0.0 MB
readahead: allocate 0.0 MB
readahead: allocate 0.0 MB
readahead: allocate 0.0 MB
readahead: allocate 0.0 MB
readahead: allocate 0.0 MB
readahead: allocate 0.0 MB
readahead: allocate 0.0 MB
readahead: allocate 0.1 MB
readahead: allocate 0.1 MB
readahead: allocate 0.1 MB
readahead: allocate 0.1 MB
readahead: allocate 0.1 MB
readahead: allocate 0.2 MB
readahead: allocate 0.2 MB
readahead: allocate 0.3 MB
readahead: allocate 0.3 MB
readahead: allocate 0.4 MB
readahead: allocate 0.5 MB
readahead: allocate 0.7 MB
readahead: allocate 0.8 MB
readahead: allocate 1.1 MB
readahead: allocate 1.3 MB
readahead: allocate 1.7 MB
readahead: allocate 2.1 MB
readahead: allocate 2.6 MB
readahead: allocate 3.2 MB
readahead: allocate 4.0 MB
readahead: allocate 5.0 MB
readahead: allocate 6.3 MB
readahead: allocate 7.9 MB
readahead: allocate 9.9 MB
readahead: allocate 12.3 MB
readahead: allocate 15.4 MB
readahead: allocate 19.3 MB
readahead: allocate 24.1 MB
readahead: allocate 30.1 MB
readahead: allocate 37.6 MB
readahead: allocate 47.0 MB
readahead: allocate 58.8 MB
readahead: allocate 73.5 MB
readahead: allocate 91.8 MB
readahead: allocate 114.8 MB
readahead: allocate 143.5 MB
readahead: allocate 179.4 MB
readahead: allocate 224.2 MB
readahead: allocate 280.2 MB
readahead: allocate 350.3 MB
readahead: allocate 437.9 MB
readahead: allocate 547.3 MB

Breakpoint 2, PyObject_Malloc (nbytes=2147483733) at Objects/obmalloc.c:792
792	    if (nbytes > PY_SSIZE_T_MAX)
(gdb) where
#0  PyObject_Malloc (nbytes=2147483733) at Objects/obmalloc.c:792
#1  0x0000000000464af1 in _PyObject_DebugMallocApi (id=111 'o', nbytes=2147483701) at Objects/obmalloc.c:1474
#2  0x0000000000464a48 in _PyObject_DebugMalloc (nbytes=2147483701) at Objects/obmalloc.c:1441
#3  0x000000000046efdc in PyString_FromStringAndSize (str=0x0, size=2147483648) at Objects/stringobject.c:88
#4  0x0000000000436c30 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=2147483648, bufsize=573933340) at Objects/fileobject.c:2291
#5  0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=1836553986, bufsize=459146672) at Objects/fileobject.c:2311
#6  0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=1469236648, bufsize=367317338) at Objects/fileobject.c:2311
#7  0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=1175382777, bufsize=293853871) at Objects/fileobject.c:2311
#8  0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=940299680, bufsize=235083097) at Objects/fileobject.c:2311
#9  0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=752233202, bufsize=188066478) at Objects/fileobject.c:2311
#10 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=601780019, bufsize=150453183) at Objects/fileobject.c:2311
#11 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=481417472, bufsize=120362547) at Objects/fileobject.c:2311
#12 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=385127434, bufsize=96290038) at Objects/fileobject.c:2311
#13 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=308095403, bufsize=77032031) at Objects/fileobject.c:2311
#14 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=246469778, bufsize=61625625) at Objects/fileobject.c:2311
#15 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=197169278, bufsize=49300500) at Objects/fileobject.c:2311
#16 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=157728878, bufsize=39440400) at Objects/fileobject.c:2311
#17 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=126176558, bufsize=31552320) at Objects/fileobject.c:2311
#18 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=100934702, bufsize=25241856) at Objects/fileobject.c:2311
#19 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=80741217, bufsize=20193485) at Objects/fileobject.c:2311
#20 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=64586429, bufsize=16154788) at Objects/fileobject.c:2311
#21 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=51662598, bufsize=12923831) at Objects/fileobject.c:2311
#22 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=41323533, bufsize=10339065) at Objects/fileobject.c:2311
#23 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=33052281, bufsize=8271252) at Objects/fileobject.c:2311
#24 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=26435279, bufsize=6617002) at Objects/fileobject.c:2311
#25 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=21141677, bufsize=5293602) at Objects/fileobject.c:2311
#26 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=16906795, bufsize=4234882) at Objects/fileobject.c:2311
#27 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=13518889, bufsize=3387906) at Objects/fileobject.c:2311
#28 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=10808564, bufsize=2710325) at Objects/fileobject.c:2311
#29 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=8640304, bufsize=2168260) at Objects/fileobject.c:2311
#30 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=6905696, bufsize=1734608) at Objects/fileobject.c:2311
#31 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=5518009, bufsize=1387687) at Objects/fileobject.c:2311
#32 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=4407859, bufsize=1110150) at Objects/fileobject.c:2311
#33 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=3519739, bufsize=888120) at Objects/fileobject.c:2311
#34 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=2809243, bufsize=710496) at Objects/fileobject.c:2311
#35 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=2240846, bufsize=568397) at Objects/fileobject.c:2311
#36 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=1786128, bufsize=454718) at Objects/fileobject.c:2311
#37 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=1422353, bufsize=363775) at Objects/fileobject.c:2311
#38 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=1131333, bufsize=291020) at Objects/fileobject.c:2311
#39 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=898517, bufsize=232816) at Objects/fileobject.c:2311
#40 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=712264, bufsize=186253) at Objects/fileobject.c:2311
#41 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=563261, bufsize=149003) at Objects/fileobject.c:2311
#42 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=444058, bufsize=119203) at Objects/fileobject.c:2311
#43 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=348695, bufsize=95363) at Objects/fileobject.c:2311
#44 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=272404, bufsize=76291) at Objects/fileobject.c:2311
#45 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=211371, bufsize=61033) at Objects/fileobject.c:2311
#46 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=162544, bufsize=48827) at Objects/fileobject.c:2311
#47 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=123482, bufsize=39062) at Objects/fileobject.c:2311
#48 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=92232, bufsize=31250) at Objects/fileobject.c:2311
#49 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=67232, bufsize=25000) at Objects/fileobject.c:2311
#50 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=47232, bufsize=20000) at Objects/fileobject.c:2311
#51 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=31232, bufsize=16000) at Objects/fileobject.c:2311
#52 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=18432, bufsize=12800) at Objects/fileobject.c:2311
#53 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=8192, bufsize=10240) at Objects/fileobject.c:2311
#54 0x0000000000436da6 in readahead_get_line_skip (f=0x7fffeea2bf40, skip=0, bufsize=8192) at Objects/fileobject.c:2311
#55 0x0000000000436e52 in file_iternext (f=0x7fffeea2bf40) at Objects/fileobject.c:2335
---
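The backtrace follows a simple pattern, which can be modelled in a few lines of Python 3 to see why a 2 GiB line needs so much memory. Each recursive call consumes its whole buffer (skip grows by the previous bufsize) and asks for a buffer 25% larger (`bufsize + (bufsize >> 2)`: 8192, 10240, 12800, 16000, ...), and judging from the trace every frame still holds its buffer when the final string is allocated. The model assumes every read fills its buffer until end of file; the function name is hypothetical and this is not the C code itself.

```python
def readahead_frames(line_len, bufsize=8192):
    """Model the (skip, bufsize) arguments of the recursive
    readahead_get_line_skip() calls for a line of `line_len` bytes
    that contains no newline.

    Assumes every readahead fills its buffer completely until EOF.
    """
    frames = []
    skip = 0
    while True:
        frames.append((skip, bufsize))
        if skip >= line_len:
            # EOF: this innermost call allocates the final `skip`-byte string.
            return frames
        skip += min(bufsize, line_len - skip)  # bytes consumed by this call
        bufsize += bufsize >> 2                # next buffer is 25% larger


frames = readahead_frames(2**31)
print(len(frames))   # 51
print(frames[:3])    # [(0, 8192), (8192, 10240), (18432, 12800)]
print(frames[-1])    # (2147483648, 573933340)
buffers = sum(size for _, size in frames)
print("%.2e bytes of buffers + %d-byte string" % (buffers, 2**31))
```

Under these assumptions it reproduces the 51 frames above, and the buffers requested along the way add up to roughly 2.87e9 bytes before the 2 GiB result string is allocated, which is in the same range as the peak measured above.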
msg228689 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2014-10-06 14:18
Serhiy's patch looks ok to me (haven't tested it).
msg228806 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-10-08 18:14
Could anyone please test it on Windows?
msg229146 - Author: Roundup Robot (python-dev) (Python triager) Date: 2014-10-12 14:17
New changeset be600ea4ad13 by Serhiy Storchaka in branch '2.7':
Fixed and optimized a test of issue #22526.
https://hg.python.org/cpython/rev/be600ea4ad13
History
Date                 User               Action  Args
2014-10-12 15:24:01  benjamin.peterson  set     status: open -> closed
                                                resolution: fixed
2014-10-12 14:17:19  python-dev         set     messages: + msg229146
2014-10-08 18:14:37  serhiy.storchaka   set     messages: + msg228806
2014-10-06 14:18:33  pitrou             set     nosy: + pitrou
                                                messages: + msg228689
2014-10-06 12:04:18  vstinner           set     messages: + msg228666
2014-10-06 08:41:10  pitti              set     messages: + msg228648
2014-10-06 05:50:48  pitti              set     nosy: + pitti
                                                messages: + msg228642
2014-10-05 06:23:00  Arfrever           set     nosy: + Arfrever
2014-10-04 07:43:15  serhiy.storchaka   set     files: + issue22526_test.patch
                                                nosy: + serhiy.storchaka
                                                messages: + msg228434
                                                keywords: + patch
2014-10-03 21:18:27  benjamin.peterson  set     messages: + msg228384
2014-10-03 21:17:47  doko               set     messages: + msg228383
2014-10-03 20:24:05  benjamin.peterson  set     nosy: + benjamin.peterson
                                                messages: + msg228369
2014-10-03 19:42:27  doko               set     status: closed -> open
                                                nosy: + doko
                                                messages: + msg228366
                                                resolution: fixed -> (no value)
2014-10-01 01:17:36  python-dev         set     status: open -> closed
                                                nosy: + python-dev
                                                messages: + msg228050
                                                resolution: fixed
                                                stage: resolved
2014-09-30 19:35:13  r.david.murray     set     type: crash -> behavior
                                                title: file iteration crashes for huge lines (2GiB+) -> file iteration SystemError for huge lines (2GiB+)
2014-09-30 16:12:11  vstinner           set     nosy: + vstinner
2014-09-30 16:10:51  abukaj             create