classification
Title: io-c: TextIOWrapper is faster than BufferedReader but not protected by a lock
Type: performance Stage: needs patch
Components: Library (Lib) Versions: Python 3.1
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pitrou Nosy List: pitrou, vstinner
Priority: normal Keywords: patch

Created on 2009-03-18 01:16 by vstinner, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
crash_textiowrapper.py vstinner, 2009-03-18 01:59
speedup-bufio.patch pitrou, 2009-04-06 22:16
Messages (5)
msg83724 Author: STINNER Victor (vstinner) (Python committer) Date: 2009-03-18 01:16
TextIOWrapper.readline() is much faster (e.g. 72 ms vs 95 ms) than 
BufferedReader.readline(). This is because BufferedReader always acquires 
the file lock, whereas TextIOWrapper only acquires the file lock when 
the buffer is empty.

I would like a BufferedReader.readline() as fast as 
TextIOWrapper.readline(), or faster!

Why are BufferedReader's attributes protected by a lock whereas 
TextIOWrapper's attributes are not?

Does it mean that TextIOWrapper may crash if two threads call 
readline() (or different methods) at the "same time"?

How do Python 2.x and 3.0 handle this issue?
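
The timing comparison above could be reproduced with something like the following sketch. The in-memory test data, the helper names (`count_lines`, `make_binary`, `make_text`, `bench`) and the line count are assumptions for illustration and are not taken from the report; absolute numbers will differ from the 72 ms / 95 ms quoted above.

```python
import io
import timeit

# Assumed test data: 20,000 short ASCII lines held in memory.
DATA = b"".join(b"line %d\n" % i for i in range(20000))


def count_lines(f):
    """Call readline() until EOF and return the number of lines read."""
    n = 0
    while f.readline():
        n += 1
    return n


def make_binary():
    # Binary layer only: BufferedReader over an in-memory byte stream.
    return io.BufferedReader(io.BytesIO(DATA))


def make_text():
    # Text layer stacked on top of the same kind of BufferedReader.
    return io.TextIOWrapper(make_binary(), encoding="ascii", newline="\n")


def bench(factory, repeat=3):
    """Best-of-`repeat` wall time for reading every line once."""
    return min(timeit.repeat(lambda: count_lines(factory()),
                             number=1, repeat=repeat))


if __name__ == "__main__":
    print("BufferedReader.readline(): %.1f ms" % (bench(make_binary) * 1e3))
    print("TextIOWrapper.readline():  %.1f ms" % (bench(make_text) * 1e3))
```

Taking the minimum over a few repeats reduces noise from other processes; the ratio between the two figures matters more than either value on its own.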
msg83728 Author: STINNER Victor (vstinner) (Python committer) Date: 2009-03-18 01:59
I wrote a short script to test TextIOWrapper.readline() with 32 
threads. After 5 seconds, I found this issue in Python trunk (2.7):

Exception in thread Thread-26:
Traceback (most recent call last):
  File "/home/SHARE/SVN/python-trunk/Lib/threading.py", line 522, in __bootstrap_inner
    self.run()
  File "/home/haypo/crash_textiowrapper.py", line 15, in run
    line = self.file.readline()
  File "/home/SHARE/SVN/python-trunk/Lib/io.py", line 1835, in readline
    self._rewind_decoded_chars(len(line) - endpos)
  File "/home/SHARE/SVN/python-trunk/Lib/io.py", line 1541, in _rewind_decoded_chars
    raise AssertionError("rewind decoded_chars out of bounds")
AssertionError: rewind decoded_chars out of bounds

But it looks like py3k is more robust because it doesn't crash. Is it the 
power of the GIL?
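
The attached crash_textiowrapper.py is not reproduced in this thread, so the following is only a guess at the general shape of such a test: several threads call readline() on one shared TextIOWrapper and any exception is recorded. The function name `stress_readline`, the in-memory data and the read-until-EOF loop (instead of a 5-second run) are assumptions. On a current CPython 3 this is not expected to raise; the traceback above came from the pure-Python io.py in the 2.7 trunk.

```python
import io
import threading


def stress_readline(num_threads=32, num_lines=50000):
    """Read one shared text stream from several threads.

    Returns (total_lines_read, list_of_exceptions).
    """
    data = b"".join(b"line %d\n" % i for i in range(num_lines))
    shared = io.TextIOWrapper(io.BytesIO(data), encoding="ascii")
    counts = [0] * num_threads   # one slot per thread, so no shared counter
    errors = []                  # list.append is safe to call from threads

    def worker(index):
        try:
            while shared.readline():
                counts[index] += 1
        except Exception as exc:  # record the failure instead of printing it
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts), errors


if __name__ == "__main__":
    total, errors = stress_readline()
    print("lines read:", total, "errors:", errors)
```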
msg83739 Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-03-18 10:35
> But it looks like py3k is more robust because it doesn't crash. Is it the 
> power of the GIL?

Yes, it is.
In theory, we needn't take the lock in all of BufferedReader.readline(),
only when calling external code which might itself release the GIL. In
practice, we didn't bother optimizing the lock-taking, for the sake of
simplicity. If the lock really accounts for a significant part of the
runtime cost, we can try to do better.
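
To make the cost argument concrete, here is a toy, single-threaded sketch that counts lock acquisitions under the two strategies: locking around every readline() call versus locking only around the refill from the underlying stream (the "external code"). `CountingLock`, `ToyLineReader` and `count_acquisitions` are hypothetical names; this is not the C implementation and not the patch attached to this issue. In pure Python the second strategy would not be thread-safe, because the buffer manipulation is not atomic there; the C code can rely on the GIL for that part.

```python
import io
import threading


class CountingLock:
    """Context manager around threading.Lock that counts acquisitions."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc_info):
        self._lock.release()


class ToyLineReader:
    """Minimal line reader with a configurable locking strategy."""

    def __init__(self, raw, lock_every_call, chunk_size=4096):
        self.raw = raw
        self.lock = CountingLock()
        self.lock_every_call = lock_every_call
        self.chunk_size = chunk_size
        self.buf = b""

    def _refill(self):
        # The only place that calls into the underlying stream.
        chunk = self.raw.read(self.chunk_size)
        self.buf += chunk
        return bool(chunk)

    def _readline_impl(self):
        while True:
            end = self.buf.find(b"\n")
            if end >= 0:
                line, self.buf = self.buf[:end + 1], self.buf[end + 1:]
                return line
            if self.lock_every_call:
                more = self._refill()      # lock is already held by readline()
            else:
                with self.lock:            # lock only around the external call
                    more = self._refill()
            if not more:                   # EOF: return whatever is left
                line, self.buf = self.buf, b""
                return line

    def readline(self):
        if self.lock_every_call:
            with self.lock:
                return self._readline_impl()
        return self._readline_impl()


def count_acquisitions(lock_every_call, num_lines=1000):
    """Return (lines_read, lock_acquisitions) for one pass over the data."""
    reader = ToyLineReader(io.BytesIO(b"x\n" * num_lines), lock_every_call)
    lines = 0
    while reader.readline():
        lines += 1
    return lines, reader.lock.acquisitions
```

With the default 1,000 two-byte lines and a 4096-byte chunk, locking on every call acquires the lock 1,001 times (once per readline(), including the final call that hits EOF), while locking only on refill acquires it twice.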
msg85674 Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-04-06 22:16
Here is a patch which provides a significant speedup (up to 30%) on
small operations (small reads, iteration) on binary files. Please test.
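
One way to test the kind of operations mentioned here is a micro-benchmark along these lines, run once before and once after applying the patch. The payload and the helper names (`small_reads`, `iterate`, `time_op`) are assumptions for illustration; the sketch does not reproduce the measurements behind the 30% figure.

```python
import io
import timeit

# Assumed payload: 10,000 lines of 17 bytes each.
DATA = b"0123456789abcdef\n" * 10000


def small_reads(f, size=1):
    """Read the stream `size` bytes at a time; return total bytes read."""
    total = 0
    while True:
        chunk = f.read(size)
        if not chunk:
            return total
        total += len(chunk)


def iterate(f):
    """Iterate over the stream line by line; return the line count."""
    return sum(1 for _ in f)


def time_op(op, repeat=3):
    """Best-of-`repeat` wall time of `op` on a fresh BufferedReader."""
    return min(timeit.repeat(
        lambda: op(io.BufferedReader(io.BytesIO(DATA))),
        number=1, repeat=repeat))


if __name__ == "__main__":
    print("read(1) loop: %.1f ms" % (time_op(small_reads) * 1e3))
    print("iteration:    %.1f ms" % (time_op(iterate) * 1e3))
```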
msg85865 Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-04-11 15:39
Committed in r71483.
History
Date User Action Args
2022-04-11 14:56:46 admin set github: 49752
2009-04-11 15:39:45 pitrou set status: open -> closed; resolution: fixed; messages: + msg85865
2009-04-06 22:16:56 pitrou set files: + speedup-bufio.patch; keywords: + patch; messages: + msg85674
2009-03-22 13:46:09 pitrou set priority: normal; assignee: pitrou; type: performance; stage: needs patch
2009-03-18 10:35:38 pitrou set messages: + msg83739
2009-03-18 01:59:31 vstinner set files: + crash_textiowrapper.py; messages: + msg83728
2009-03-18 01:16:11 vstinner create