classification
Title: shadows around the io truncate() semantics
Type: behavior Stage: resolved
Components: Documentation, IO Versions: Python 3.2, Python 3.1
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pitrou Nosy List: ncoghlan, pakal, pitrou
Priority: release blocker Keywords: patch

Created on 2009-09-18 13:21 by pakal, last changed 2010-05-29 03:46 by ncoghlan. This issue is now closed.

Files
File name Uploaded Description Edit
PY27_truncate.patch pakal, 2010-01-28 22:26
Messages (27)
msg92822 - (view) Author: Pascal Chambon (pakal) Date: 2009-09-18 13:21
Hello

I'm having trouble around the behaviour of the io module's truncate
methods, in py3k. If I remember well, under unix and older versions of
Python (with other file types), truncate never move the fiel pointer
(and had to fake that behaviour under windows as well, by
saving/restoring the file pointer).

However, I see nothing in the documentation of the io module about that,
and studying the _fileio.c code, it seems that now the file pointer is
(under all OS) moved to the truncation point, and left there.

Maybe it requires discusions on the mailing list, but personally I think
that the common unix behaviour (letting the file pointer untouched)
would be better for everyone, and that the doc should claim it.

Also, additional notes about the behaviour of that truncation (reminding
people that on some OS, the padding is zero-filled, on others it's
not...) would be great.

Current doc of io in py3k :
http://docs.python.org/py3k/library/io.html#io.IOBase.truncate
truncate(size=None)
    Truncate the file to at most size bytes. size defaults to the
current file position, as returned by tell().

Cheers
Pascal
msg92827 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-09-18 14:58
I can't tell you why it was decided to behave like this. Perhaps it was
felt that, since FileIO is low-level (lower than the file object in
2.x), it shouldn't have to remember and restore the old file position.
That argument wouldn't apply to BufferedWriter, though.

I'm afraid I don't know anything about the padding differences myself.
If you give us more precisions on the subject, we will be able to
include them in the documentation.
msg92833 - (view) Author: Pascal Chambon (pakal) Date: 2009-09-18 17:15
Well, I guess it deserve discussion on the pydev mailing lits, that's
imo a rather important point, to be documented precisely.

Concerning the padding, I guess the semantic doesn't change between the
io module and the old file type, i.e :

file.truncate([size])¶
    Truncate the file’s size. If the optional size argument is present,
the file is truncated to (at most) that size. The size defaults to the
current position. The current file position is not changed. Note that if
a specified size exceeds the file’s current size, the result is
platform-dependent: possibilities include that the file may remain
unchanged, increase to the specified size as if zero-filled, or increase
to the specified size with undefined new content. Availability: Windows,
many Unix variants.

Having platform-dependent semantics for a so important type is anyway
dangerous, I feel. The io module should be rock-solid, with very deeply
documented semantics, and implementations that force platforms to
conform to these specifications. Eg. I've noticed that all io.FileIO
methods raised IOErrors in case of problem, except in one case (wrapping
a bad file decsriptor) where it's OSError - that little thing might make
big programs crash in an ugly way, if they only caught IOError.
I'm currently working on a cross-platform (linux, mac, windows at least)
FileIO type, with very strict semantics and extended abilities (file
range shared and exclusive locking, with timeouts) ; even though it'll
initially be slower than _fileio (most of it is in pure python), maybe
it might solve most of these io module problems. I'll keep people informed.
msg97273 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-05 19:27
Hello

I'm currently finalizing the API of my raw io file implementation, but I still have trouble with the trunk implementation of IOBase.truncate().

If I remember well, in the mailing list topic on this subject, GvR noted that this change of behaviour compared to python 2.x was not intended, and that it would be better to get back to the expected behaviour - not touching the file pointer - and to document the method in this way.

Are there new elements, advocating a status quo on this matter ?
Or shouldn't we add the portable_lseek() call in fileio.c to fix that ?

On a separate note, I'm confused about the "at most" phrase in the current documentation :
---
truncate(size=None)
    Truncate the file to at most size bytes. size defaults to the current file position, as returned by tell()
---
According to what I've read so far, a succesful truncate() call will always extend/reduce the file until teh desired size, isn't that so on all platforms ?

Regards 
Pascal
msg97274 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-05 19:36
> Are there new elements, advocating a status quo on this matter ?

Nothing, it's just lacking a patch from someone interested in the
matter.

> On a separate note, I'm confused about the "at most" phrase in the
> current documentation :
> ---
> truncate(size=None)
>     Truncate the file to at most size bytes. size defaults to the
> current file position, as returned by tell()
> ---
> According to what I've read so far, a succesful truncate() call will
> always extend/reduce the file until teh desired size, isn't that so on
> all platforms ?

I suppose it was worded that way because some platforms may not support
extending a file when a parameter larger than the current size is given.
I don't know if such systems are widespread.
msg97277 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-05 19:54
Allright, I'll try to work on it as soon as I manage to gather a decent compilation environment on windows...
msg97321 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-06 20:18
Hi

Here is a patch for the python2.6 _fileio.c module, as well as the corresponding testcase.
I'll check the _pyio and C _io trunk modules asap.
msg97326 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-06 21:02
Whoups - I forgot to bugfix as well the _bytesio classes... let's just forget about the previous patch.
msg97328 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-06 21:36
Here is the new patch for py2.6, fixing (hopefully) all the truncate() methods.
Note that the python _BytesIO implementation isn't covered by the test suite, as it's shadowed by the C implementation ; but imo we shouldn't care : I've manually tested it once, and anyway the trunk sources don't have this problem.
msg97329 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-06 21:38
And here is the python trunk patch, covering _Pyio and _io modules (+ corresponding tests).
msg97834 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-15 19:01
Is there anything I can do to help this patch making its way to the trunk ?
I guess it'd be better if Python2.7 benefited from it, so that users don't run anymore the risk of relying of this undocumented and non-canonical "truncate" behaviour.

Regards, 
Pascal
msg97836 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-15 19:17
Looking at the patches:
- if you want to remove code, please remove it; don't comment it out
- please don't use C++-style comments (it will break on some compilers, produce warnings on others)
- code lines should generally not be wider than 80 chars (although this is not really a hard rule)
- I don't understand the purpose of "self.seek(0, os.SEEK_CUR)"; shouldn't it be a no-op? If not, please add a comment to explain why.
- you are not decrefinf objects properly when you are done with them (posobj, oldposobj), which will certainly produce memory leaks
- some tests seem to be missing from the 2.7 patch, compared to the 2.6 one
- if you want to test BytesIO and StringIO, tests can be added to Lib/test/test_memoryio.py
msg97849 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-15 21:21
Allright, I shall fix all this asap.

But it seems the C code for truncate is actually buggy in the current 2.6 _fileio.c, around line 680.
CF code below :

posobj = portable_lseek(fd, posobj, 0); -> don't we lose the reference to the old "posobj" there, doing a memory leak ?

if (PyErr_Occurred()) return NULL; -> same thing, we return Null without caring about the posobj reference which should be non-Null there ??

If I've understood a little reference counting, "portable_lseek" returns a reference that we own and must Py_DECREF, isn't that so ?



--------
	if (posobj == Py_None || posobj == NULL) {
		/* Get the current position. */
                posobj = portable_lseek(fd, NULL, 1);
                if (posobj == NULL)
			return NULL;
        }
        else {
		/* Move to the position to be truncated. */
                posobj = portable_lseek(fd, posobj, 0);
        }

#if defined(HAVE_LARGEFILE_SUPPORT)
	pos = PyLong_AsLongLong(posobj);
#else
	pos = PyLong_AsLong(posobj);
#endif
	if (PyErr_Occurred())
		return NULL;

------
msg97853 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-15 22:15
> posobj = portable_lseek(fd, posobj, 0); -> don't we lose the reference
> to the old "posobj" there, doing a memory leak ?

It's a bit more subtle.
Here, the first reference to "posobj" that you get is through the
function arguments. You don't own that reference, so must not decref it
when you are done with it. Moreover, if you return that very same
reference, you should incref it first (which your patch doesn't do when
posobj is non-None, and therefore loses a reference, see below).

However, when you get a new "posobj" from portable_lseek() (or most
other C API functions), you own this new reference and therefore must
decref it when you are done with it.

To sum it up and if I'm not mistaken, you must:
- add a Py_INCREF(posobj) when posobj is non-NULL and non-None.
- Py_DECREF the first posobj, as well as oldposobj, in the
Windows-specific path.

If you wanna witness reference counting behaviour, you should build
Python in debug mode (--with-pydebug if under Linux). Right now under
Linux your patch produces the following behaviour:

[39542 refs]
>>> f.truncate()
0L
[39547 refs]
>>> f.truncate()
0L
[39547 refs]
>>> f.truncate(2)
2
[39545 refs]
>>> f.truncate(2)
2
[39544 refs]

As you see, when posobj is non-None, we actually lose a reference
because a Py_INCREF is missing.
If you do this too often the small integer object "2" gets wrongly
destroyed and the interpreter crashes:

... 
[snip]
Erreur de segmentation
msg97854 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-15 22:17
He, roundup ate part of the code I pasted. Here it is again:

>>> import io
[39516 refs]
>>> f = io.open("foo", "wb", buffering=0)
[39542 refs]
>>> f.truncate()
0L
[39544 refs]
>>> f.truncate()
0L
[39544 refs]
>>> f.truncate(2)
2
[39543 refs]
>>> f.truncate(2)
2
[39542 refs]
>>> while True: f.truncate(2)
... 
[snip]
Erreur de segmentation
msg97855 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-15 22:19
By the way, you witness the issue (less clearly though) by running the tests in debug mode, too:

$ ./python -m test.regrtest -v test_io
test_io
test_BufferedIOBase_destructor (test.test_io.CIOTest) ... ok
test_IOBase_destructor (test.test_io.CIOTest) ... ok
test_RawIOBase_destructor (test.test_io.CIOTest) ... ok
test_TextIOBase_destructor (test.test_io.CIOTest) ... ok
test_append_mode_tell (test.test_io.CIOTest) ... ok
test_array_writes (test.test_io.CIOTest) ... ok
test_buffered_file_io (test.test_io.CIOTest) ... ok
test_close_flushes (test.test_io.CIOTest) ... ok
test_closefd (test.test_io.CIOTest) ... ok
test_closefd_attr (test.test_io.CIOTest) ... ok
test_destructor (test.test_io.CIOTest) ... ok
test_garbage_collection (test.test_io.CIOTest) ... ok
test_invalid_operations (test.test_io.CIOTest) ... ok
test_large_file_ops (test.test_io.CIOTest) ... Fatal Python error: Python/ceval.c:4058 object at 0x2572448 has negative ref count -1
Abandon
msg97873 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-16 12:09
Thanks for the detailed feedback.

According to what you said (and to details found in python docs), the current _fileio.truncate is actually quite buggy -> no reference counting ops were done on posobj, even though it's sometimes retrieved from _portable_seek, and returned or not depending on rare error cases (SetEndOfFile failing etc.).

Here is a patch fixing all that for python2.6, I've tested it manually in debug mode, and against memoryio/io/fileio tests ; hope it will be OK.

As soon as this one is certified, I'll prepare a patch for python2.7
msg97911 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-16 21:48
First, I get the following compilation warnings under Linux:
/home/antoine/cpython/26/Modules/_fileio.c: In function ‘fileio_truncate’:
/home/antoine/cpython/26/Modules/_fileio.c:651: attention : unused variable ‘tempposobj’
/home/antoine/cpython/26/Modules/_fileio.c:650: attention : unused variable ‘oldposobj’

The two variables should be declared at the beginning of the Windows-specific block instead.

Second, there's a test failure in test_io:

======================================================================
FAIL: test_large_file_ops (test.test_io.IOTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/antoine/cpython/26/Lib/test/test_io.py", line 221, in test_large_file_ops
    self.large_file_ops(f)
  File "/home/antoine/cpython/26/Lib/test/test_io.py", line 154, in large_file_ops
    self.assertEqual(f.tell(), self.LARGE + 1)
AssertionError: 2147483650L != 2147483649

----------------------------------------------------------------------
msg97966 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-17 19:13
Allright - sorry for the failure - I've cleaned my hdd enough to launch large_file tests too.

The thing is - are there platforms available to test a patch against the whole test suite of python, and against several OSes ? I've found no such thing around build bots, and since I'm kind of stuck on win32 without a full compilation environment (sqlite, zlib etc. lack some setup), I miss vision depth concerning testing...

Anyway, here is a corrected patch, passing the full io-related test suites on win32 (and visibly ready for unix too).
msg98301 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-25 21:38
Hello

Just to notify that I've just tested this patch on a fresh python2.6 SVN checkout, on Ubuntu this time (previously, it was only win32), and it passes all IO-related tests I know.
msg98450 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-27 21:50
Thank you. The patch was committed mostly unchanged in r77802 (a few cosmetic fixes). Now trunk and py3k must absolutely be patched in order to conform to the new behaviour. Can you produce a patch for trunk?
msg98451 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-27 22:06
Thanks a lot B-)
I'll make a patch for trunk from now to this W.E.
msg98479 - (view) Author: Pascal Chambon (pakal) Date: 2010-01-28 22:26
Hello

Here is the patch for the python trunk, regarding truncate() behaviour.
I've tested it on windows and linux, against IO test suites (with -uall), in debug and normal mode.

I've also updated some docstrings, and added tests for untested potential errors around buffered read+write streams, when truncating.
msg98632 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-31 22:32
Your patch lacked a fix for test_largefile (I think it is skipped under Windows). I added this and committed it in r77890. Will port to py3k.
msg98635 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-31 23:24
Committed in py3k (r77895, r77896) and 3.1 (r77897). Thank you Pascal!
msg98673 - (view) Author: Pascal Chambon (pakal) Date: 2010-02-01 18:24
Argh, I had indeed missed some IO-related tests, hadn't noticed the new test_io docstring:
# Tests of io are scattered over the test suite:
# * test_bufio - tests file buffering
# * test_memoryio - tests BytesIO and StringIO
# * test_fileio - tests FileIO
# * test_file - tests the file interface
# * test_io - tests everything else in the io module
# * test_univnewlines - tests universal newline support
# * test_largefile - tests operations on a file greater than 2**32 bytes
#     (only enabled with -ulargefile)

Thanks for the commits B-)
msg106707 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-05-29 03:46
To avoid any confusion in the future, it should be noted that Antoine did not unilaterally make the decision to commit this change to a maintenance branch.

This change was discussed on python-dev, with the ultimate decision to update the 3.1 semantics being made by Guido: http://mail.python.org/pipermail/python-dev/2009-September/092247.html
History
Date User Action Args
2010-05-29 03:46:15ncoghlansetnosy: + ncoghlan
messages: + msg106707
2010-02-01 18:24:19pakalsetmessages: + msg98673
2010-01-31 23:24:00pitrousetstatus: open -> closed
resolution: accepted -> fixed
messages: + msg98635

stage: needs patch -> resolved
2010-01-31 22:32:49pitrousetresolution: accepted
messages: + msg98632
versions: - Python 2.7
2010-01-28 22:26:41pakalsetfiles: + PY27_truncate.patch

messages: + msg98479
2010-01-27 22:06:48pakalsetmessages: + msg98451
2010-01-27 21:50:27pitrousetfiles: - patch26_largefile_tested.patch
2010-01-27 21:50:22pitrousetpriority: release blocker

stage: needs patch
messages: + msg98450
versions: - Python 2.6
2010-01-25 21:38:47pakalsetmessages: + msg98301
2010-01-17 19:13:27pakalsetfiles: - patch26_truncate_pos_refcounts.patch
2010-01-17 19:13:10pakalsetfiles: + patch26_largefile_tested.patch

messages: + msg97966
2010-01-16 21:48:53pitrousetmessages: + msg97911
2010-01-16 12:09:57pakalsetfiles: + patch26_truncate_pos_refcounts.patch

messages: + msg97873
2010-01-16 12:03:46pakalsetfiles: - broken patch
2010-01-16 12:03:42pakalsetfiles: - buggy patch to be forgotten.patch
2010-01-15 22:19:15pitrousetmessages: + msg97855
2010-01-15 22:17:38pitrousetmessages: + msg97854
2010-01-15 22:15:19pitrousetmessages: + msg97853
2010-01-15 21:21:28pakalsetmessages: + msg97849
2010-01-15 19:17:09pitrousetmessages: + msg97836
2010-01-15 19:05:20pitrousetfiles: - Buggy patch - to be forgotten
2010-01-15 19:01:49pakalsetmessages: + msg97834
2010-01-06 21:38:51pakalsetfiles: + broken patch

messages: + msg97329
2010-01-06 21:36:37pakalsetfiles: + buggy patch to be forgotten.patch

messages: + msg97328
2010-01-06 21:02:15pakalsetmessages: + msg97326
2010-01-06 20:18:05pakalsetfiles: + Buggy patch - to be forgotten
keywords: + patch
messages: + msg97321
2010-01-05 19:54:34pakalsetmessages: + msg97277
2010-01-05 19:36:01pitrousetmessages: + msg97274
2010-01-05 19:27:10pakalsetmessages: + msg97273
2009-09-18 17:15:51pakalsetmessages: + msg92833
2009-09-18 14:58:45pitrousetversions: - Python 3.0
nosy: + pitrou

messages: + msg92827

assignee: pitrou
components: + Documentation
2009-09-18 13:21:52pakalcreate