Issue6939
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2009-09-18 13:21 by pakal, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
PY27_truncate.patch | pakal, 2010-01-28 22:26 |
Messages (27) | |||
---|---|---|---|
msg92822 - (view) | Author: Pascal Chambon (pakal) * | Date: 2009-09-18 13:21 | |
Hello I'm having trouble around the behaviour of the io module's truncate methods, in py3k. If I remember well, under unix and older versions of Python (with other file types), truncate never move the fiel pointer (and had to fake that behaviour under windows as well, by saving/restoring the file pointer). However, I see nothing in the documentation of the io module about that, and studying the _fileio.c code, it seems that now the file pointer is (under all OS) moved to the truncation point, and left there. Maybe it requires discusions on the mailing list, but personally I think that the common unix behaviour (letting the file pointer untouched) would be better for everyone, and that the doc should claim it. Also, additional notes about the behaviour of that truncation (reminding people that on some OS, the padding is zero-filled, on others it's not...) would be great. Current doc of io in py3k : http://docs.python.org/py3k/library/io.html#io.IOBase.truncate truncate(size=None) Truncate the file to at most size bytes. size defaults to the current file position, as returned by tell(). Cheers Pascal |
|||
msg92827 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2009-09-18 14:58 | |
I can't tell you why it was decided to behave like this. Perhaps it was felt that, since FileIO is low-level (lower than the file object in 2.x), it shouldn't have to remember and restore the old file position. That argument wouldn't apply to BufferedWriter, though. I'm afraid I don't know anything about the padding differences myself. If you give us more precisions on the subject, we will be able to include them in the documentation. |
|||
msg92833 - (view) | Author: Pascal Chambon (pakal) * | Date: 2009-09-18 17:15 | |
Well, I guess it deserve discussion on the pydev mailing lits, that's imo a rather important point, to be documented precisely. Concerning the padding, I guess the semantic doesn't change between the io module and the old file type, i.e : file.truncate([size])¶ Truncate the file’s size. If the optional size argument is present, the file is truncated to (at most) that size. The size defaults to the current position. The current file position is not changed. Note that if a specified size exceeds the file’s current size, the result is platform-dependent: possibilities include that the file may remain unchanged, increase to the specified size as if zero-filled, or increase to the specified size with undefined new content. Availability: Windows, many Unix variants. Having platform-dependent semantics for a so important type is anyway dangerous, I feel. The io module should be rock-solid, with very deeply documented semantics, and implementations that force platforms to conform to these specifications. Eg. I've noticed that all io.FileIO methods raised IOErrors in case of problem, except in one case (wrapping a bad file decsriptor) where it's OSError - that little thing might make big programs crash in an ugly way, if they only caught IOError. I'm currently working on a cross-platform (linux, mac, windows at least) FileIO type, with very strict semantics and extended abilities (file range shared and exclusive locking, with timeouts) ; even though it'll initially be slower than _fileio (most of it is in pure python), maybe it might solve most of these io module problems. I'll keep people informed. |
|||
msg97273 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-05 19:27 | |
Hello I'm currently finalizing the API of my raw io file implementation, but I still have trouble with the trunk implementation of IOBase.truncate(). If I remember well, in the mailing list topic on this subject, GvR noted that this change of behaviour compared to python 2.x was not intended, and that it would be better to get back to the expected behaviour - not touching the file pointer - and to document the method in this way. Are there new elements, advocating a status quo on this matter ? Or shouldn't we add the portable_lseek() call in fileio.c to fix that ? On a separate note, I'm confused about the "at most" phrase in the current documentation : --- truncate(size=None) Truncate the file to at most size bytes. size defaults to the current file position, as returned by tell() --- According to what I've read so far, a succesful truncate() call will always extend/reduce the file until teh desired size, isn't that so on all platforms ? Regards Pascal |
|||
msg97274 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-01-05 19:36 | |
> Are there new elements, advocating a status quo on this matter ? Nothing, it's just lacking a patch from someone interested in the matter. > On a separate note, I'm confused about the "at most" phrase in the > current documentation : > --- > truncate(size=None) > Truncate the file to at most size bytes. size defaults to the > current file position, as returned by tell() > --- > According to what I've read so far, a succesful truncate() call will > always extend/reduce the file until teh desired size, isn't that so on > all platforms ? I suppose it was worded that way because some platforms may not support extending a file when a parameter larger than the current size is given. I don't know if such systems are widespread. |
|||
msg97277 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-05 19:54 | |
Allright, I'll try to work on it as soon as I manage to gather a decent compilation environment on windows... |
|||
msg97321 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-06 20:18 | |
Hi Here is a patch for the python2.6 _fileio.c module, as well as the corresponding testcase. I'll check the _pyio and C _io trunk modules asap. |
|||
msg97326 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-06 21:02 | |
Whoups - I forgot to bugfix as well the _bytesio classes... let's just forget about the previous patch. |
|||
msg97328 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-06 21:36 | |
Here is the new patch for py2.6, fixing (hopefully) all the truncate() methods. Note that the python _BytesIO implementation isn't covered by the test suite, as it's shadowed by the C implementation ; but imo we shouldn't care : I've manually tested it once, and anyway the trunk sources don't have this problem. |
|||
msg97329 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-06 21:38 | |
And here is the python trunk patch, covering _Pyio and _io modules (+ corresponding tests). |
|||
msg97834 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-15 19:01 | |
Is there anything I can do to help this patch making its way to the trunk ? I guess it'd be better if Python2.7 benefited from it, so that users don't run anymore the risk of relying of this undocumented and non-canonical "truncate" behaviour. Regards, Pascal |
|||
msg97836 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-01-15 19:17 | |
Looking at the patches: - if you want to remove code, please remove it; don't comment it out - please don't use C++-style comments (it will break on some compilers, produce warnings on others) - code lines should generally not be wider than 80 chars (although this is not really a hard rule) - I don't understand the purpose of "self.seek(0, os.SEEK_CUR)"; shouldn't it be a no-op? If not, please add a comment to explain why. - you are not decrefinf objects properly when you are done with them (posobj, oldposobj), which will certainly produce memory leaks - some tests seem to be missing from the 2.7 patch, compared to the 2.6 one - if you want to test BytesIO and StringIO, tests can be added to Lib/test/test_memoryio.py |
|||
msg97849 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-15 21:21 | |
Allright, I shall fix all this asap. But it seems the C code for truncate is actually buggy in the current 2.6 _fileio.c, around line 680. CF code below : posobj = portable_lseek(fd, posobj, 0); -> don't we lose the reference to the old "posobj" there, doing a memory leak ? if (PyErr_Occurred()) return NULL; -> same thing, we return Null without caring about the posobj reference which should be non-Null there ?? If I've understood a little reference counting, "portable_lseek" returns a reference that we own and must Py_DECREF, isn't that so ? -------- if (posobj == Py_None || posobj == NULL) { /* Get the current position. */ posobj = portable_lseek(fd, NULL, 1); if (posobj == NULL) return NULL; } else { /* Move to the position to be truncated. */ posobj = portable_lseek(fd, posobj, 0); } #if defined(HAVE_LARGEFILE_SUPPORT) pos = PyLong_AsLongLong(posobj); #else pos = PyLong_AsLong(posobj); #endif if (PyErr_Occurred()) return NULL; ------ |
|||
msg97853 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-01-15 22:15 | |
> posobj = portable_lseek(fd, posobj, 0); -> don't we lose the reference > to the old "posobj" there, doing a memory leak ? It's a bit more subtle. Here, the first reference to "posobj" that you get is through the function arguments. You don't own that reference, so must not decref it when you are done with it. Moreover, if you return that very same reference, you should incref it first (which your patch doesn't do when posobj is non-None, and therefore loses a reference, see below). However, when you get a new "posobj" from portable_lseek() (or most other C API functions), you own this new reference and therefore must decref it when you are done with it. To sum it up and if I'm not mistaken, you must: - add a Py_INCREF(posobj) when posobj is non-NULL and non-None. - Py_DECREF the first posobj, as well as oldposobj, in the Windows-specific path. If you wanna witness reference counting behaviour, you should build Python in debug mode (--with-pydebug if under Linux). Right now under Linux your patch produces the following behaviour: [39542 refs] >>> f.truncate() 0L [39547 refs] >>> f.truncate() 0L [39547 refs] >>> f.truncate(2) 2 [39545 refs] >>> f.truncate(2) 2 [39544 refs] As you see, when posobj is non-None, we actually lose a reference because a Py_INCREF is missing. If you do this too often the small integer object "2" gets wrongly destroyed and the interpreter crashes: ... [snip] Erreur de segmentation |
|||
msg97854 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-01-15 22:17 | |
He, roundup ate part of the code I pasted. Here it is again: >>> import io [39516 refs] >>> f = io.open("foo", "wb", buffering=0) [39542 refs] >>> f.truncate() 0L [39544 refs] >>> f.truncate() 0L [39544 refs] >>> f.truncate(2) 2 [39543 refs] >>> f.truncate(2) 2 [39542 refs] >>> while True: f.truncate(2) ... [snip] Erreur de segmentation |
|||
msg97855 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-01-15 22:19 | |
By the way, you witness the issue (less clearly though) by running the tests in debug mode, too: $ ./python -m test.regrtest -v test_io test_io test_BufferedIOBase_destructor (test.test_io.CIOTest) ... ok test_IOBase_destructor (test.test_io.CIOTest) ... ok test_RawIOBase_destructor (test.test_io.CIOTest) ... ok test_TextIOBase_destructor (test.test_io.CIOTest) ... ok test_append_mode_tell (test.test_io.CIOTest) ... ok test_array_writes (test.test_io.CIOTest) ... ok test_buffered_file_io (test.test_io.CIOTest) ... ok test_close_flushes (test.test_io.CIOTest) ... ok test_closefd (test.test_io.CIOTest) ... ok test_closefd_attr (test.test_io.CIOTest) ... ok test_destructor (test.test_io.CIOTest) ... ok test_garbage_collection (test.test_io.CIOTest) ... ok test_invalid_operations (test.test_io.CIOTest) ... ok test_large_file_ops (test.test_io.CIOTest) ... Fatal Python error: Python/ceval.c:4058 object at 0x2572448 has negative ref count -1 Abandon |
|||
msg97873 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-16 12:09 | |
Thanks for the detailed feedback. According to what you said (and to details found in python docs), the current _fileio.truncate is actually quite buggy -> no reference counting ops were done on posobj, even though it's sometimes retrieved from _portable_seek, and returned or not depending on rare error cases (SetEndOfFile failing etc.). Here is a patch fixing all that for python2.6, I've tested it manually in debug mode, and against memoryio/io/fileio tests ; hope it will be OK. As soon as this one is certified, I'll prepare a patch for python2.7 |
|||
msg97911 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-01-16 21:48 | |
First, I get the following compilation warnings under Linux: /home/antoine/cpython/26/Modules/_fileio.c: In function ‘fileio_truncate’: /home/antoine/cpython/26/Modules/_fileio.c:651: attention : unused variable ‘tempposobj’ /home/antoine/cpython/26/Modules/_fileio.c:650: attention : unused variable ‘oldposobj’ The two variables should be declared at the beginning of the Windows-specific block instead. Second, there's a test failure in test_io: ====================================================================== FAIL: test_large_file_ops (test.test_io.IOTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/antoine/cpython/26/Lib/test/test_io.py", line 221, in test_large_file_ops self.large_file_ops(f) File "/home/antoine/cpython/26/Lib/test/test_io.py", line 154, in large_file_ops self.assertEqual(f.tell(), self.LARGE + 1) AssertionError: 2147483650L != 2147483649 ---------------------------------------------------------------------- |
|||
msg97966 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-17 19:13 | |
Allright - sorry for the failure - I've cleaned my hdd enough to launch large_file tests too. The thing is - are there platforms available to test a patch against the whole test suite of python, and against several OSes ? I've found no such thing around build bots, and since I'm kind of stuck on win32 without a full compilation environment (sqlite, zlib etc. lack some setup), I miss vision depth concerning testing... Anyway, here is a corrected patch, passing the full io-related test suites on win32 (and visibly ready for unix too). |
|||
msg98301 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-25 21:38 | |
Hello Just to notify that I've just tested this patch on a fresh python2.6 SVN checkout, on Ubuntu this time (previously, it was only win32), and it passes all IO-related tests I know. |
|||
msg98450 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-01-27 21:50 | |
Thank you. The patch was committed mostly unchanged in r77802 (a few cosmetic fixes). Now trunk and py3k must absolutely be patched in order to conform to the new behaviour. Can you produce a patch for trunk? |
|||
msg98451 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-27 22:06 | |
Thanks a lot B-) I'll make a patch for trunk from now to this W.E. |
|||
msg98479 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-01-28 22:26 | |
Hello Here is the patch for the python trunk, regarding truncate() behaviour. I've tested it on windows and linux, against IO test suites (with -uall), in debug and normal mode. I've also updated some docstrings, and added tests for untested potential errors around buffered read+write streams, when truncating. |
|||
msg98632 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-01-31 22:32 | |
Your patch lacked a fix for test_largefile (I think it is skipped under Windows). I added this and committed it in r77890. Will port to py3k. |
|||
msg98635 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-01-31 23:24 | |
Committed in py3k (r77895, r77896) and 3.1 (r77897). Thank you Pascal! |
|||
msg98673 - (view) | Author: Pascal Chambon (pakal) * | Date: 2010-02-01 18:24 | |
Argh, I had indeed missed some IO-related tests, hadn't noticed the new test_io docstring: # Tests of io are scattered over the test suite: # * test_bufio - tests file buffering # * test_memoryio - tests BytesIO and StringIO # * test_fileio - tests FileIO # * test_file - tests the file interface # * test_io - tests everything else in the io module # * test_univnewlines - tests universal newline support # * test_largefile - tests operations on a file greater than 2**32 bytes # (only enabled with -ulargefile) Thanks for the commits B-) |
|||
msg106707 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2010-05-29 03:46 | |
To avoid any confusion in the future, it should be noted that Antoine did not unilaterally make the decision to commit this change to a maintenance branch. This change was discussed on python-dev, with the ultimate decision to update the 3.1 semantics being made by Guido: http://mail.python.org/pipermail/python-dev/2009-September/092247.html |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:53 | admin | set | nosy:
+ georg.brandl, benjamin.peterson github: 51188 |
2010-05-29 03:46:15 | ncoghlan | set | nosy:
+ ncoghlan messages: + msg106707 |
2010-02-01 18:24:19 | pakal | set | messages: + msg98673 |
2010-01-31 23:24:00 | pitrou | set | status: open -> closed resolution: accepted -> fixed messages: + msg98635 stage: needs patch -> resolved |
2010-01-31 22:32:49 | pitrou | set | resolution: accepted messages: + msg98632 versions: - Python 2.7 |
2010-01-28 22:26:41 | pakal | set | files:
+ PY27_truncate.patch messages: + msg98479 |
2010-01-27 22:06:48 | pakal | set | messages: + msg98451 |
2010-01-27 21:50:27 | pitrou | set | files: - patch26_largefile_tested.patch |
2010-01-27 21:50:22 | pitrou | set | priority: release blocker stage: needs patch messages: + msg98450 versions: - Python 2.6 |
2010-01-25 21:38:47 | pakal | set | messages: + msg98301 |
2010-01-17 19:13:27 | pakal | set | files: - patch26_truncate_pos_refcounts.patch |
2010-01-17 19:13:10 | pakal | set | files:
+ patch26_largefile_tested.patch messages: + msg97966 |
2010-01-16 21:48:53 | pitrou | set | messages: + msg97911 |
2010-01-16 12:09:57 | pakal | set | files:
+ patch26_truncate_pos_refcounts.patch messages: + msg97873 |
2010-01-16 12:03:46 | pakal | set | files: - broken patch |
2010-01-16 12:03:42 | pakal | set | files: - buggy patch to be forgotten.patch |
2010-01-15 22:19:15 | pitrou | set | messages: + msg97855 |
2010-01-15 22:17:38 | pitrou | set | messages: + msg97854 |
2010-01-15 22:15:19 | pitrou | set | messages: + msg97853 |
2010-01-15 21:21:28 | pakal | set | messages: + msg97849 |
2010-01-15 19:17:09 | pitrou | set | messages: + msg97836 |
2010-01-15 19:05:20 | pitrou | set | files: - Buggy patch - to be forgotten |
2010-01-15 19:01:49 | pakal | set | messages: + msg97834 |
2010-01-06 21:38:51 | pakal | set | files:
+ broken patch messages: + msg97329 |
2010-01-06 21:36:37 | pakal | set | files:
+ buggy patch to be forgotten.patch messages: + msg97328 |
2010-01-06 21:02:15 | pakal | set | messages: + msg97326 |
2010-01-06 20:18:05 | pakal | set | files:
+ Buggy patch - to be forgotten keywords: + patch messages: + msg97321 |
2010-01-05 19:54:34 | pakal | set | messages: + msg97277 |
2010-01-05 19:36:01 | pitrou | set | messages: + msg97274 |
2010-01-05 19:27:10 | pakal | set | messages: + msg97273 |
2009-09-18 17:15:51 | pakal | set | messages: + msg92833 |
2009-09-18 14:58:45 | pitrou | set | versions:
- Python 3.0 nosy: + pitrou messages: + msg92827 assignee: pitrou components: + Documentation |
2009-09-18 13:21:52 | pakal | create |