Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io-c: TextIOWrapper is faster than BufferedReader but not protected by a lock #49752

Closed
vstinner opened this issue Mar 18, 2009 · 5 comments
Closed
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@vstinner
Copy link
Member

BPO 5502
Nosy @pitrou, @vstinner
Files
  • crash_textiowrapper.py
  • speedup-bufio.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/pitrou'
    closed_at = <Date 2009-04-11.15:39:45.854>
    created_at = <Date 2009-03-18.01:16:11.259>
    labels = ['library', 'performance']
    title = 'io-c: TextIOWrapper is faster than BufferedReader but not protected by a lock'
    updated_at = <Date 2009-04-11.15:39:45.826>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2009-04-11.15:39:45.826>
    actor = 'pitrou'
    assignee = 'pitrou'
    closed = True
    closed_date = <Date 2009-04-11.15:39:45.854>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2009-03-18.01:16:11.259>
    creator = 'vstinner'
    dependencies = []
    files = ['13361', '13642']
    hgrepos = []
    issue_num = 5502
    keywords = ['patch']
    message_count = 5.0
    messages = ['83724', '83728', '83739', '85674', '85865']
    nosy_count = 2.0
    nosy_names = ['pitrou', 'vstinner']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue5502'
    versions = ['Python 3.1']

    @vstinner
    Copy link
    Member Author

    TextIOWrapper.readline() is much faster (eg. 72 ms vs 95 ms) than
    BufferedReader.readline(). It's because BufferedReader always acquires
    the file lock, whereas TextIOWrapper only acquires the file lock when
    the buffer is empty.

    I would like a BufferedReader.readline() as fast as
    TextIOWrapper.readline(), or faster!

    Why BufferedReader's attributes are protected by a lock whereas
    TextIOWrapper's attributes are not?

    Does it mean that TextIOWrapper may crash if two threads calls
    readline() (or different methods) at the "same time"?

    How does Python 2.x and 3.0 fix this issue?

    @vstinner vstinner added the stdlib Python modules in the Lib dir label Mar 18, 2009
    @vstinner
    Copy link
    Member Author

    I wrote a short script to test TextIOWrapper.readline() with 32
    threads. After 5 seconds, I found this issue in Python trunk (2.7):

    Exception in thread Thread-26:
    Traceback (most recent call last):
      File "/home/SHARE/SVN/python-trunk/Lib/threading.py", line 522, in 
    __bootstrap_inner
        self.run()
      File "/home/haypo/crash_textiowrapper.py", line 15, in run
        line = self.file.readline()
      File "/home/SHARE/SVN/python-trunk/Lib/io.py", line 1835, in 
    readline
        self._rewind_decoded_chars(len(line) - endpos)
      File "/home/SHARE/SVN/python-trunk/Lib/io.py", line 1541, in 
    _rewind_decoded_chars
        raise AssertionError("rewind decoded_chars out of bounds")
    AssertionError: rewind decoded_chars out of bounds

    But it looks that py3k is stronger because it doesn't crash. Is it the
    power of the GIL?

    @pitrou
    Copy link
    Member

    pitrou commented Mar 18, 2009

    But it looks that py3k is stronger because it doesn't crash. Is it the
    power of the GIL?

    Yes, it is.
    In theory, we needn't take the lock in all of BufferedReader.readline(),
    only when calling external code which might itself release the GIL. In
    practice, we didn't bother optimizing the lock-taking, for the sake of
    simplicity. If the lock really accounts for a significant part of the
    runtime cost, we can try to do better.

    @pitrou pitrou self-assigned this Mar 22, 2009
    @pitrou pitrou added the performance Performance or resource usage label Mar 22, 2009
    @pitrou
    Copy link
    Member

    pitrou commented Apr 6, 2009

    Here is a patch which provides a significant speedup (up to 30%) on
    small operations (small reads, iteration) on binary files. Please test.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 11, 2009

    Committed in r71483.

    @pitrou pitrou closed this as completed Apr 11, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants