classification
Title: Fast BytesIO implementation + misc changes
Type: performance Stage:
Components: Interpreter Core, Library (Lib), Tests Versions: Python 3.0
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: alexandre.vassalotti Nosy List: alexandre.vassalotti, brett.cannon, christian.heimes, georg.brandl, gvanrossum, pitrou
Priority: critical Keywords: patch

Created on 2008-01-07 07:20 by alexandre.vassalotti, last changed 2008-05-06 20:03 by alexandre.vassalotti. This issue is now closed.

Files
File name Uploaded Description Edit
bytesio+misc-fixes-2.patch alexandre.vassalotti, 2008-01-07 22:22
bytesio+misc-fixes-3.patch alexandre.vassalotti, 2008-03-19 00:08
bytesio+misc-fixes-4.patch alexandre.vassalotti, 2008-03-30 05:02
bytesio+misc-fixes-5.patch alexandre.vassalotti, 2008-03-30 18:43
bytesio+misc-fixes-6.patch alexandre.vassalotti, 2008-03-30 18:52
bytesio+misc-fixes-8.patch alexandre.vassalotti, 2008-04-13 20:58
Messages (26)
msg59436 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-01-07 07:20
Finally, here is my C implementation of BytesIO. The code is well tested
and include the proper unit tests. The only remaining issues are:
  - The behavior of the close() method.
  - The failure of test_profile and test_cProfile.

Currently, I have no idea how to fix the tests for the profilers. The
weird thing is both pass when run with:

   % ./python Lib/test/regrtest.py -R: test_profile
msg59447 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-07 15:06
Any chance of uploading a single patch that contains all these changes?

The profile tests often fail when io.py changes because they happen to
depend on "golden output" which includes line numbers of code in io.py
that happens to be traced during the test.  I don't know why -R: makes
it pass but it could be that it doesn't compare the golden output.
msg59460 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-01-07 16:19
So, here's one big patch. I have updated the behavior of close(), so that 

> The profile tests often fail when io.py changes because they happen to
> depend on "golden output" which includes line numbers of code in io.py
> that happens to be traced during the test. 

Yes, I knew that. But, how can I fix the test so that it passes even if
_bytesio is not available?

Oh, one more thing. In the misc fixes for io.py, I added a checkClosed
in IOBase.readline(). As a side-effect, this make __next__ raises a
ValueError, instead of StopIteration. Is that correct?  

>>> f = open("load.py")
[45681 refs]
>>> next(f)
'import sys\n'
[45700 refs]
>>> f.close()
[45703 refs]
>>> next(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alex/src/python.org/py3k/Lib/io.py", line 1440, in __next__
    line = self.readline()
  File "/home/alex/src/python.org/py3k/Lib/io.py", line 1449, in readline
    raise ValueError("read from closed file")
ValueError: read from closed file
[45703 refs]
msg59461 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-01-07 16:21
[grrr, I eat my words]

> So, here's one big patch. I have updated the behavior of close(), so that 

... it matches the behavior of 2.x.

> As a side-effect, this make __next__ raises a
ValueError, instead of StopIteration.

... when the file is closed.
msg59503 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-01-07 22:22
I got a patch that also fixes the profiler tests. That was easy after
all. :-)
msg64020 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-03-19 00:08
Here is a patch against the latest trunk (r61578) that includes the
accelerator module of io.BytesIO with its test suite. The patch also
changes the behavior of the truncate method to imply a seek(). Please
review!
msg64664 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-03-29 01:24
Bumping priority so that this gets reviewed before next release.
msg64739 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-30 01:15
From a quick glance: I may be wrong, but it looks like in
bytesio_writelines(), you'll have some reference leaks if the `while`
loop exits early because bytesio_write() returns NULL; `item` and `it`
should be decred'ed before returning.
msg64741 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-03-30 05:02
Yes, that was a leak. It should be fixed now.

Merci Antoine!
msg64755 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-30 18:07
Alexandre, is it normal that the unit tests look much less complete in
your latest patch than they were in the previous one?
Also, they don't test the Python fallback implementation anymore.
msg64757 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-03-30 18:43
Oops, I forgot to include the unit tests, with "svn add", when I made
the diff.
msg64758 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-03-30 18:52
There is a small bug in the last patch I posted. The unit tests assumed
(wrongly) that accelerator module for io.StringIO (i.e., _stringio) was
present.
msg64762 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-30 20:29
Ok I took a detailed look at _bytesio.c.

In write_bytes() there is the following resizing logic:

    if (self->pos + len > self->string_size) {
        if (resize_buffer(self, self->pos + len) < 0)
            return -1;
    }

Replacing `self->string_size` with `self->buf_size` should avoid some
spurious reallocations.

For some reason, using the help() function on a BytesIO instance does
not display the class help.

Overall, the new module looks fine. I can't say anything about the io.py
or _fileio.c fixes.
msg64763 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-30 20:48
One last thing: you probably noticed it, but there is one test which
needs correcting in test_StringIO. It's due to the fact that calling
next() on a closed BytesIO object raises ValueError now rather than
StopIteration.
msg64766 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-03-30 22:05
I think that check in write_bytes() is a guard to avoid resize_buffer()
from truncating the string stored in the buffer. However, I am not sure
if it is still necessary.

I don't know why help() doesn't work on BytesIO instances. But, the
problem doesn't seems to be in my code, since `help(sys.stdin)` doesn't
work either.

Finally regarding test_StringIO, see msg59460. Anyway, test_StringIO is
superseded by test_memoryio included my patch and should be removed from
the stdlib.
msg64769 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-30 22:52
Well, by construction self->buf_size should always be greater than
self->string_size, so I don't think there's any risk of truncating
anything here.
I tried the change, and the tests still ran fine.
msg65444 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-04-13 13:30
I'm going to review the patch later. How are we going to back port the
stuff to 2.6?
msg65446 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-04-13 14:12
Hey Alexandre!

The latest patch doesn't apply cleanly. Please provide a new patch.
msg65452 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-04-13 20:54
Here's an updated patch.
msg65453 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-04-13 20:58
Oops, I forgot again to "svn add" the new files.
msg65474 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-04-14 18:33
Do I need to look at this, or is the review going well without my
interference?
msg65476 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-04-14 18:53
Hi Guido,

The patch changes a minor things to io.py to allow io.BytesIO to pass my
test suite, so you may want to check it out. Other than that, I think
the review is going fine.
msg66308 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-05-06 09:00
Since nothing happened for almost one month here, perhaps it would be
better to merge Alexandre's changes as they are? We'll see quickly if
unexpected things happen :-)

It would also allow to progress on other issues, e.g. #2523.

Also, I stand by the suggestion I made about the resizing logic, but it
shouldn't be a showstopper either. We can improve that later.
msg66325 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2008-05-06 18:06
Alexandre, you have until the end of the week to commit it or someone else 
will. =) I don't want this going in so close to the alphas, but it should 
go in almost immediately afterwards.
msg66327 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-05-06 18:15
By the way, is someone already working on a C-accelerated TextIO?
msg66328 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2008-05-06 20:03
Patch committed in r62778.

Antoine wrote:
> Also, I stand by the suggestion I made about the resizing logic, but it
> shouldn't be a showstopper either. We can improve that later.

I made your suggested change to the resizing logic. Thanks again for the
review! 

> By the way, is someone already working on a C-accelerated TextIO?

Yes. I was waiting to commit the C optimization for io.BytesIO, before
continuing the one for io.StringIO. The implementation is mostly
done---it just needs to be updated with respect to the recent changes to
TextIO.
History
Date User Action Args
2008-05-06 20:03:15alexandre.vassalottisetstatus: open -> closed
assignee: alexandre.vassalotti
messages: + msg66328
resolution: accepted
2008-05-06 18:15:05pitrousetmessages: + msg66327
2008-05-06 18:06:55brett.cannonsetmessages: + msg66325
2008-05-06 09:00:57pitrousetmessages: + msg66308
2008-04-14 18:53:38alexandre.vassalottisetmessages: + msg65476
2008-04-14 18:33:33gvanrossumsetmessages: + msg65474
2008-04-13 20:58:51alexandre.vassalottisetfiles: - bytesio+misc-fixes-7.patch
2008-04-13 20:58:45alexandre.vassalottisetfiles: + bytesio+misc-fixes-8.patch
messages: + msg65453
2008-04-13 20:54:56alexandre.vassalottisetfiles: + bytesio+misc-fixes-7.patch
messages: + msg65452
2008-04-13 14:12:15christian.heimessetmessages: + msg65446
2008-04-13 13:30:07christian.heimessetnosy: + christian.heimes
messages: + msg65444
2008-04-09 11:23:02pitrousettype: performance
2008-03-30 22:52:11pitrousetmessages: + msg64769
2008-03-30 22:05:04alexandre.vassalottisetmessages: + msg64766
2008-03-30 20:48:24pitrousetmessages: + msg64763
2008-03-30 20:29:51pitrousetmessages: + msg64762
2008-03-30 18:52:22alexandre.vassalottisetfiles: + bytesio+misc-fixes-6.patch
messages: + msg64758
2008-03-30 18:43:22alexandre.vassalottisetfiles: + bytesio+misc-fixes-5.patch
messages: + msg64757
2008-03-30 18:07:52pitrousetmessages: + msg64755
2008-03-30 05:02:34alexandre.vassalottisetfiles: + bytesio+misc-fixes-4.patch
messages: + msg64741
2008-03-30 04:59:29alexandre.vassalottisetfiles: - io-misc-fixes.patch
2008-03-30 04:59:24alexandre.vassalottisetfiles: - truncate-semantic-change.patch
2008-03-30 04:59:20alexandre.vassalottisetfiles: - remove-old-stringio-test.patch
2008-03-30 04:59:12alexandre.vassalottisetfiles: - test_memoryio.py
2008-03-30 04:59:07alexandre.vassalottisetfiles: - swap-initstdio-initsite.patch
2008-03-30 04:59:01alexandre.vassalottisetfiles: - add-bytesio-setup.patch
2008-03-30 04:58:56alexandre.vassalottisetfiles: - _bytesio.c
2008-03-30 01:15:14pitrousetnosy: + pitrou
messages: + msg64739
2008-03-29 01:24:36georg.brandlsetpriority: normal -> critical
nosy: + georg.brandl
messages: + msg64664
2008-03-19 00:08:23alexandre.vassalottisetfiles: + bytesio+misc-fixes-3.patch
messages: + msg64020
2008-01-07 22:22:53alexandre.vassalottisetfiles: - bytesio+misc-fixes.patch
2008-01-07 22:22:30alexandre.vassalottisetfiles: + bytesio+misc-fixes-2.patch
messages: + msg59503
2008-01-07 16:21:43alexandre.vassalottisetmessages: + msg59461
2008-01-07 16:19:13alexandre.vassalottisetfiles: + bytesio+misc-fixes.patch
messages: + msg59460
2008-01-07 15:06:55gvanrossumsetmessages: + msg59447
2008-01-07 07:28:09alexandre.vassalottisetnosy: + brett.cannon
2008-01-07 07:25:09alexandre.vassalottisetfiles: + io-misc-fixes.patch
2008-01-07 07:24:08alexandre.vassalottisetfiles: + truncate-semantic-change.patch
2008-01-07 07:23:30alexandre.vassalottisetfiles: + remove-old-stringio-test.patch
2008-01-07 07:23:03alexandre.vassalottisetfiles: + test_memoryio.py
2008-01-07 07:22:38alexandre.vassalottisetfiles: + swap-initstdio-initsite.patch
2008-01-07 07:21:11alexandre.vassalottisetfiles: + add-bytesio-setup.patch
2008-01-07 07:20:30alexandre.vassalotticreate