Issue1215928
Created on 2005-06-06 19:19 by tree, last changed 2005-08-25 13:11 by georg.brandl. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| bz2module-lfs-seek.diff | georg.brandl, 2005-06-10 11:45 | |
| Messages (10) | |||
|---|---|---|---|
| msg25497 | Author: Tom Emerson (tree) | Date: 2005-06-06 19:19 | |
I have a 4 gigabyte bz2 compressed tarfile containing some 3.3
million documents. I have a script which opens this file with "r:bz2"
and is simply iterating over the contents using next(). With 2.4.1 I
still get an Overflow error (originally tried with 2.3.5 as packaged in
Mac OS 10.4.1):
Traceback (most recent call last):
  File "extract_part.py", line 47, in ?
    main(sys.argv)
  File "extract_part.py", line 39, in main
    pathnames = find_valid_paths(argv[1], 1024, count)
  File "extract_part.py", line 13, in find_valid_paths
    f = tf.next()
  File "/usr/local/lib/python2.4/tarfile.py", line 1584, in next
    self.fileobj.seek(self.offset)
OverflowError: long int too large to convert to int
|
|||
| msg25498 | Author: Lars Gustäbel (lars.gustaebel) | Date: 2005-06-07 13:23 | |
A quick look at the problem reveals that this is a bug in bz2.BZ2File. The seek() method does not allow position values >= 2 GiB. |
|||
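The 2 GiB boundary and the "too large to convert to int" message are what one would expect when an offset is converted to a signed 32-bit C int, whose maximum is 2**31 - 1. A minimal sketch in modern Python, using the standard `struct` module only to illustrate that range; the helper name `fits_in_int32` is hypothetical and is not part of bz2 or tarfile:

```python
import struct

def fits_in_int32(offset):
    """Return True if offset fits in a signed 32-bit integer."""
    try:
        # "=i" is always a 4-byte signed int, independent of the platform.
        struct.pack("=i", offset)
    except struct.error:
        return False
    return True

two_gib = 2 ** 31
print(fits_in_int32(two_gib - 1))  # last offset below 2 GiB
print(fits_in_int32(two_gib))      # first offset that no longer fits
```

Any offset inside a 4 gigabyte archive beyond the first 2 GiB falls into the second case.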
| msg25499 | Author: Georg Brandl (georg.brandl) | Date: 2005-06-09 20:31 | |
Attaching a patch which mimics the behaviour of normal file objects. This should resolve the issue on platforms with large file support. |
|||
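For comparison, this is the behaviour of a normal file object that the patch aims to mimic. A rough sketch in modern Python, assuming a platform with large file support; the helper name `seek_past_2gib` is hypothetical, and nothing is written, so the temporary file stays empty:

```python
import tempfile

def seek_past_2gib(offset=2 ** 32):
    """Seek an empty temporary file to offset and return the reported position."""
    with tempfile.TemporaryFile() as f:
        # Fails here if the platform cannot represent the offset.
        f.seek(offset)
        return f.tell()

print(seek_past_2gib() == 2 ** 32)
```

Seeking beyond the end of a file does not allocate space by itself, which keeps the check cheap.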
| msg25500 | Author: Georg Brandl (georg.brandl) | Date: 2005-06-10 11:45 | |
Attaching corrected patch. |
|||
| msg25501 | Author: Raymond Hettinger (rhettinger) | Date: 2005-06-13 01:32 | |
Is there a way to write a test for this? Can it be done without a conditional compile? Does the problem also occur in code outside of bz2? |
|||
| msg25502 | Author: Georg Brandl (georg.brandl) | Date: 2005-06-18 21:26 | |
I looked into this a bit further and noticed the following: the modules bz2, cStringIO and mmap all use plain integers to represent file offsets given to or returned by seek(), tell() and truncate(). They should be corrected to use a 64-bit type when large file support is available. fileobject.c defines its own type for that, Py_off_t, which should be shared among the other modules. A conditional compile is needed since different macros/functions must be used. |
|||
| msg25503 | Author: Raymond Hettinger (rhettinger) | Date: 2005-06-18 22:05 | |
Martin, please look at this when you get a chance. |
|||
| msg25504 | Author: Viktor Ferenczi (complex) | Date: 2005-06-20 23:44 | |
The bug has been reproduced with a 90 MB bz2 file containing more than 4 GB of fairly similar documents. I've diagnosed the same problem with large offsets. Thanks for the patch. Platform: WinXP, Intel P4, Python 2.4.1 |
|||
| msg25505 | Author: Martin v. Löwis (loewis) | Date: 2005-08-25 11:24 | |
The patch is fine, please apply. As for generalising Py_off_t: there are some issues which I keep forgetting. fpos_t is not guaranteed to be an integral type, and indeed, on Linux, it is not. I'm not completely sure why this patch works; I think that on all platforms where fpos_t is not integral, off_t happens to be large enough. The only case where off_t is not large enough is (IIRC) Windows, where fpos_t can be used. So this is all somewhat muddy, and if this gets generalized, a more elaborate comment seems to be in order. |
|||
| msg25506 | Author: Georg Brandl (georg.brandl) | Date: 2005-08-25 13:11 | |
I just realized that I accidentally committed the patch together with the fix for #1191043. Modules/bz2module r1.25, r1.23.2.2. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2005-06-06 19:19:18 | tree | create | |
