Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shadows around the io truncate() semantics #51188

Closed
pakal mannequin opened this issue Sep 18, 2009 · 27 comments
Closed

shadows around the io truncate() semantics #51188

pakal mannequin opened this issue Sep 18, 2009 · 27 comments
Assignees
Labels
docs Documentation in the Doc dir release-blocker topic-IO type-bug An unexpected behavior, bug, or error

Comments

@pakal
Copy link
Mannequin

pakal mannequin commented Sep 18, 2009

BPO 6939
Nosy @ncoghlan, @pitrou, @pakal
Files
  • PY27_truncate.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/pitrou'
    closed_at = <Date 2010-01-31.23:24:00.295>
    created_at = <Date 2009-09-18.13:21:52.144>
    labels = ['type-bug', 'release-blocker', 'expert-IO', 'docs']
    title = 'shadows around the io truncate() semantics'
    updated_at = <Date 2010-05-29.03:46:15.858>
    user = 'https://github.com/pakal'

    bugs.python.org fields:

    activity = <Date 2010-05-29.03:46:15.858>
    actor = 'ncoghlan'
    assignee = 'pitrou'
    closed = True
    closed_date = <Date 2010-01-31.23:24:00.295>
    closer = 'pitrou'
    components = ['Documentation', 'IO']
    creation = <Date 2009-09-18.13:21:52.144>
    creator = 'pakal'
    dependencies = []
    files = ['16030']
    hgrepos = []
    issue_num = 6939
    keywords = ['patch']
    message_count = 27.0
    messages = ['92822', '92827', '92833', '97273', '97274', '97277', '97321', '97326', '97328', '97329', '97834', '97836', '97849', '97853', '97854', '97855', '97873', '97911', '97966', '98301', '98450', '98451', '98479', '98632', '98635', '98673', '106707']
    nosy_count = 3.0
    nosy_names = ['ncoghlan', 'pitrou', 'pakal']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue6939'
    versions = ['Python 3.1', 'Python 3.2']

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Sep 18, 2009

    Hello

    I'm having trouble around the behaviour of the io module's truncate
    methods, in py3k. If I remember well, under unix and older versions of
    Python (with other file types), truncate never move the fiel pointer
    (and had to fake that behaviour under windows as well, by
    saving/restoring the file pointer).

    However, I see nothing in the documentation of the io module about that,
    and studying the _fileio.c code, it seems that now the file pointer is
    (under all OS) moved to the truncation point, and left there.

    Maybe it requires discusions on the mailing list, but personally I think
    that the common unix behaviour (letting the file pointer untouched)
    would be better for everyone, and that the doc should claim it.

    Also, additional notes about the behaviour of that truncation (reminding
    people that on some OS, the padding is zero-filled, on others it's
    not...) would be great.

    Current doc of io in py3k :
    http://docs.python.org/py3k/library/io.html#io.IOBase.truncate
    truncate(size=None)
    Truncate the file to at most size bytes. size defaults to the
    current file position, as returned by tell().

    Cheers
    Pascal

    @pakal pakal mannequin added topic-IO type-bug An unexpected behavior, bug, or error labels Sep 18, 2009
    @pitrou
    Copy link
    Member

    pitrou commented Sep 18, 2009

    I can't tell you why it was decided to behave like this. Perhaps it was
    felt that, since FileIO is low-level (lower than the file object in
    2.x), it shouldn't have to remember and restore the old file position.
    That argument wouldn't apply to BufferedWriter, though.

    I'm afraid I don't know anything about the padding differences myself.
    If you give us more precisions on the subject, we will be able to
    include them in the documentation.

    @pitrou pitrou added the docs Documentation in the Doc dir label Sep 18, 2009
    @pitrou pitrou self-assigned this Sep 18, 2009
    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Sep 18, 2009

    Well, I guess it deserve discussion on the pydev mailing lits, that's
    imo a rather important point, to be documented precisely.

    Concerning the padding, I guess the semantic doesn't change between the
    io module and the old file type, i.e :

    file.truncate([size])¶
    Truncate the file’s size. If the optional size argument is present,
    the file is truncated to (at most) that size. The size defaults to the
    current position. The current file position is not changed. Note that if
    a specified size exceeds the file’s current size, the result is
    platform-dependent: possibilities include that the file may remain
    unchanged, increase to the specified size as if zero-filled, or increase
    to the specified size with undefined new content. Availability: Windows,
    many Unix variants.

    Having platform-dependent semantics for a so important type is anyway
    dangerous, I feel. The io module should be rock-solid, with very deeply
    documented semantics, and implementations that force platforms to
    conform to these specifications. Eg. I've noticed that all io.FileIO
    methods raised IOErrors in case of problem, except in one case (wrapping
    a bad file decsriptor) where it's OSError - that little thing might make
    big programs crash in an ugly way, if they only caught IOError.
    I'm currently working on a cross-platform (linux, mac, windows at least)
    FileIO type, with very strict semantics and extended abilities (file
    range shared and exclusive locking, with timeouts) ; even though it'll
    initially be slower than _fileio (most of it is in pure python), maybe
    it might solve most of these io module problems. I'll keep people informed.

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 5, 2010

    Hello

    I'm currently finalizing the API of my raw io file implementation, but I still have trouble with the trunk implementation of IOBase.truncate().

    If I remember well, in the mailing list topic on this subject, GvR noted that this change of behaviour compared to python 2.x was not intended, and that it would be better to get back to the expected behaviour - not touching the file pointer - and to document the method in this way.

    Are there new elements, advocating a status quo on this matter ?
    Or shouldn't we add the portable_lseek() call in fileio.c to fix that ?

    On a separate note, I'm confused about the "at most" phrase in the current documentation :
    ---

    truncate(size=None)
        Truncate the file to at most size bytes. size defaults to the current file position, as returned by tell()

    According to what I've read so far, a succesful truncate() call will always extend/reduce the file until teh desired size, isn't that so on all platforms ?

    Regards
    Pascal

    @pitrou
    Copy link
    Member

    pitrou commented Jan 5, 2010

    Are there new elements, advocating a status quo on this matter ?

    Nothing, it's just lacking a patch from someone interested in the
    matter.

    On a separate note, I'm confused about the "at most" phrase in the
    current documentation :
    ---
    truncate(size=None)
    Truncate the file to at most size bytes. size defaults to the
    current file position, as returned by tell()
    ---
    According to what I've read so far, a succesful truncate() call will
    always extend/reduce the file until teh desired size, isn't that so on
    all platforms ?

    I suppose it was worded that way because some platforms may not support
    extending a file when a parameter larger than the current size is given.
    I don't know if such systems are widespread.

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 5, 2010

    Allright, I'll try to work on it as soon as I manage to gather a decent compilation environment on windows...

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 6, 2010

    Hi

    Here is a patch for the python2.6 _fileio.c module, as well as the corresponding testcase.
    I'll check the _pyio and C _io trunk modules asap.

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 6, 2010

    Whoups - I forgot to bugfix as well the _bytesio classes... let's just forget about the previous patch.

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 6, 2010

    Here is the new patch for py2.6, fixing (hopefully) all the truncate() methods.
    Note that the python _BytesIO implementation isn't covered by the test suite, as it's shadowed by the C implementation ; but imo we shouldn't care : I've manually tested it once, and anyway the trunk sources don't have this problem.

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 6, 2010

    And here is the python trunk patch, covering _Pyio and _io modules (+ corresponding tests).

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 15, 2010

    Is there anything I can do to help this patch making its way to the trunk ?
    I guess it'd be better if Python2.7 benefited from it, so that users don't run anymore the risk of relying of this undocumented and non-canonical "truncate" behaviour.

    Regards,
    Pascal

    @pitrou
    Copy link
    Member

    pitrou commented Jan 15, 2010

    Looking at the patches:

    • if you want to remove code, please remove it; don't comment it out
    • please don't use C++-style comments (it will break on some compilers, produce warnings on others)
    • code lines should generally not be wider than 80 chars (although this is not really a hard rule)
    • I don't understand the purpose of "self.seek(0, os.SEEK_CUR)"; shouldn't it be a no-op? If not, please add a comment to explain why.
    • you are not decrefinf objects properly when you are done with them (posobj, oldposobj), which will certainly produce memory leaks
    • some tests seem to be missing from the 2.7 patch, compared to the 2.6 one
    • if you want to test BytesIO and StringIO, tests can be added to Lib/test/test_memoryio.py

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 15, 2010

    Allright, I shall fix all this asap.

    But it seems the C code for truncate is actually buggy in the current 2.6 _fileio.c, around line 680.
    CF code below :

    posobj = portable_lseek(fd, posobj, 0); -> don't we lose the reference to the old "posobj" there, doing a memory leak ?

    if (PyErr_Occurred()) return NULL; -> same thing, we return Null without caring about the posobj reference which should be non-Null there ??

    If I've understood a little reference counting, "portable_lseek" returns a reference that we own and must Py_DECREF, isn't that so ?

    --------

    	if (posobj == Py_None || posobj == NULL) {
    		/* Get the current position. */
                    posobj = portable_lseek(fd, NULL, 1);
                    if (posobj == NULL)
    			return NULL;
            }
            else {
    		/* Move to the position to be truncated. */
                    posobj = portable_lseek(fd, posobj, 0);
            }
    
    #if defined(HAVE_LARGEFILE_SUPPORT)
    	pos = PyLong_AsLongLong(posobj);
    #else
    	pos = PyLong_AsLong(posobj);
    #endif
    	if (PyErr_Occurred())
    		return NULL;

    @pitrou
    Copy link
    Member

    pitrou commented Jan 15, 2010

    posobj = portable_lseek(fd, posobj, 0); -> don't we lose the reference
    to the old "posobj" there, doing a memory leak ?

    It's a bit more subtle.
    Here, the first reference to "posobj" that you get is through the
    function arguments. You don't own that reference, so must not decref it
    when you are done with it. Moreover, if you return that very same
    reference, you should incref it first (which your patch doesn't do when
    posobj is non-None, and therefore loses a reference, see below).

    However, when you get a new "posobj" from portable_lseek() (or most
    other C API functions), you own this new reference and therefore must
    decref it when you are done with it.

    To sum it up and if I'm not mistaken, you must:

    • add a Py_INCREF(posobj) when posobj is non-NULL and non-None.
    • Py_DECREF the first posobj, as well as oldposobj, in the
      Windows-specific path.

    If you wanna witness reference counting behaviour, you should build
    Python in debug mode (--with-pydebug if under Linux). Right now under
    Linux your patch produces the following behaviour:

    [39542 refs]
    >>> f.truncate()
    0L
    [39547 refs]
    >>> f.truncate()
    0L
    [39547 refs]
    >>> f.truncate(2)
    2
    [39545 refs]
    >>> f.truncate(2)
    2
    [39544 refs]

    As you see, when posobj is non-None, we actually lose a reference
    because a Py_INCREF is missing.
    If you do this too often the small integer object "2" gets wrongly
    destroyed and the interpreter crashes:

    ...
    [snip]
    Erreur de segmentation

    @pitrou
    Copy link
    Member

    pitrou commented Jan 15, 2010

    He, roundup ate part of the code I pasted. Here it is again:

    >>> import io
    [39516 refs]
    >>> f = io.open("foo", "wb", buffering=0)
    [39542 refs]
    >>> f.truncate()
    0L
    [39544 refs]
    >>> f.truncate()
    0L
    [39544 refs]
    >>> f.truncate(2)
    2
    [39543 refs]
    >>> f.truncate(2)
    2
    [39542 refs]
    >>> while True: f.truncate(2)
    ... 
    [snip]
    Erreur de segmentation

    @pitrou
    Copy link
    Member

    pitrou commented Jan 15, 2010

    By the way, you witness the issue (less clearly though) by running the tests in debug mode, too:

    $ ./python -m test.regrtest -v test_io
    test_io
    test_BufferedIOBase_destructor (test.test_io.CIOTest) ... ok
    test_IOBase_destructor (test.test_io.CIOTest) ... ok
    test_RawIOBase_destructor (test.test_io.CIOTest) ... ok
    test_TextIOBase_destructor (test.test_io.CIOTest) ... ok
    test_append_mode_tell (test.test_io.CIOTest) ... ok
    test_array_writes (test.test_io.CIOTest) ... ok
    test_buffered_file_io (test.test_io.CIOTest) ... ok
    test_close_flushes (test.test_io.CIOTest) ... ok
    test_closefd (test.test_io.CIOTest) ... ok
    test_closefd_attr (test.test_io.CIOTest) ... ok
    test_destructor (test.test_io.CIOTest) ... ok
    test_garbage_collection (test.test_io.CIOTest) ... ok
    test_invalid_operations (test.test_io.CIOTest) ... ok
    test_large_file_ops (test.test_io.CIOTest) ... Fatal Python error: Python/ceval.c:4058 object at 0x2572448 has negative ref count -1
    Abandon

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 16, 2010

    Thanks for the detailed feedback.

    According to what you said (and to details found in python docs), the current _fileio.truncate is actually quite buggy -> no reference counting ops were done on posobj, even though it's sometimes retrieved from _portable_seek, and returned or not depending on rare error cases (SetEndOfFile failing etc.).

    Here is a patch fixing all that for python2.6, I've tested it manually in debug mode, and against memoryio/io/fileio tests ; hope it will be OK.

    As soon as this one is certified, I'll prepare a patch for python2.7

    @pitrou
    Copy link
    Member

    pitrou commented Jan 16, 2010

    First, I get the following compilation warnings under Linux:
    /home/antoine/cpython/26/Modules/_fileio.c: In function ‘fileio_truncate’:
    /home/antoine/cpython/26/Modules/_fileio.c:651: attention : unused variable ‘tempposobj’
    /home/antoine/cpython/26/Modules/_fileio.c:650: attention : unused variable ‘oldposobj’

    The two variables should be declared at the beginning of the Windows-specific block instead.

    Second, there's a test failure in test_io:

    ======================================================================
    FAIL: test_large_file_ops (test.test_io.IOTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/antoine/cpython/26/Lib/test/test_io.py", line 221, in test_large_file_ops
        self.large_file_ops(f)
      File "/home/antoine/cpython/26/Lib/test/test_io.py", line 154, in large_file_ops
        self.assertEqual(f.tell(), self.LARGE + 1)
    AssertionError: 2147483650L != 2147483649

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 17, 2010

    Allright - sorry for the failure - I've cleaned my hdd enough to launch large_file tests too.

    The thing is - are there platforms available to test a patch against the whole test suite of python, and against several OSes ? I've found no such thing around build bots, and since I'm kind of stuck on win32 without a full compilation environment (sqlite, zlib etc. lack some setup), I miss vision depth concerning testing...

    Anyway, here is a corrected patch, passing the full io-related test suites on win32 (and visibly ready for unix too).

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 25, 2010

    Hello

    Just to notify that I've just tested this patch on a fresh python2.6 SVN checkout, on Ubuntu this time (previously, it was only win32), and it passes all IO-related tests I know.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 27, 2010

    Thank you. The patch was committed mostly unchanged in r77802 (a few cosmetic fixes). Now trunk and py3k must absolutely be patched in order to conform to the new behaviour. Can you produce a patch for trunk?

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 27, 2010

    Thanks a lot B-)
    I'll make a patch for trunk from now to this W.E.

    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Jan 28, 2010

    Hello

    Here is the patch for the python trunk, regarding truncate() behaviour.
    I've tested it on windows and linux, against IO test suites (with -uall), in debug and normal mode.

    I've also updated some docstrings, and added tests for untested potential errors around buffered read+write streams, when truncating.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 31, 2010

    Your patch lacked a fix for test_largefile (I think it is skipped under Windows). I added this and committed it in r77890. Will port to py3k.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 31, 2010

    Committed in py3k (r77895, r77896) and 3.1 (r77897). Thank you Pascal!

    @pitrou pitrou closed this as completed Jan 31, 2010
    @pakal
    Copy link
    Mannequin Author

    pakal mannequin commented Feb 1, 2010

    Argh, I had indeed missed some IO-related tests, hadn't noticed the new test_io docstring:
    # Tests of io are scattered over the test suite:
    # * test_bufio - tests file buffering
    # * test_memoryio - tests BytesIO and StringIO
    # * test_fileio - tests FileIO
    # * test_file - tests the file interface
    # * test_io - tests everything else in the io module
    # * test_univnewlines - tests universal newline support
    # * test_largefile - tests operations on a file greater than 2**32 bytes
    # (only enabled with -ulargefile)

    Thanks for the commits B-)

    @ncoghlan
    Copy link
    Contributor

    To avoid any confusion in the future, it should be noted that Antoine did not unilaterally make the decision to commit this change to a maintenance branch.

    This change was discussed on python-dev, with the ultimate decision to update the 3.1 semantics being made by Guido: http://mail.python.org/pipermail/python-dev/2009-September/092247.html

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    docs Documentation in the Doc dir release-blocker topic-IO type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants