Author vstinner
Recipients benjamin.peterson, pitrou, serhiy.storchaka, vstinner
Date 2017-09-20.13:53:06
Message-id <1505915586.9.0.761941352727.issue31530@psf.upfronthosting.co.za>
Content
In Python 3, reading ahead is implemented by _io.BufferedReader. This object uses a lock to prevent race conditions: the lock is there not only to prevent crashes, but also to provide guarantees on how the file is read.

If thread A calls read() first, it gets the next bytes. If thread B calls read() while thread A is filling the internal file buffer ("readahead buffer"?), the second read is queued until the first one completes. Only one thread controls the file position at any given time.
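To illustrate the guarantee, here is a minimal sketch, assuming Python 3 and a regular file made of fixed-size records. The helper names (make_record_file, read_records_in_parallel) and the record layout are hypothetical and only serve the demonstration. Several threads share one BufferedReader, and each read() call should return one whole record, never a mix of two:

```python
import io
import os
import tempfile
import threading

RECORD_SIZE = 16  # 15 digits plus a newline (hypothetical record layout)


def make_record_file(count):
    """Write `count` fixed-size records to a temporary file, return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(count):
            f.write(b"%015d\n" % i)
    return path


def read_records_in_parallel(path, thread_count=4):
    """Read every record through one shared BufferedReader from several threads."""
    records = []
    records_lock = threading.Lock()

    def worker(reader):
        local = []
        while True:
            # Each call is serialized by the BufferedReader's internal lock.
            chunk = reader.read(RECORD_SIZE)
            if not chunk:
                break
            local.append(chunk)
        with records_lock:
            records.extend(local)

    with io.open(path, "rb") as reader:  # a BufferedReader in Python 3
        threads = [threading.Thread(target=worker, args=(reader,))
                   for _ in range(thread_count)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return records
```

The order in which threads receive records is not deterministic, but every record should be seen exactly once and in one piece.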

_PyOS_URandom() uses a strategy similar to Benjamin's proposed patch for the cached file descriptor of /dev/urandom:

    fd = _Py_open("/dev/urandom", O_RDONLY);
    if (fd < 0) {
        ...
        return -1;
    }
    if (urandom_cache.fd >= 0) {
        /* urandom_fd was initialized by another thread while we were
           not holding the GIL, keep it. */
        close(fd);
        fd = urandom_cache.fd;
    }
    else {
        ...
        urandom_cache.fd = fd;
    }

The difference is that opening /dev/urandom multiple times in parallel is safe, whereas reading from the same file descriptor in parallel, using the buffered fread(), is not safe. readahead() can require multiple fread() calls, and therefore multiple read() syscalls. Interlaced reads in parallel are likely to return scrambled data.
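As a rough sketch of why a lock helps here (this is not the proposed patch; LockedReader, read_exactly and read_all_records are hypothetical names, and the small `step` only mimics a logical read that needs several underlying reads): holding one lock across all the underlying reads keeps each logical read contiguous.

```python
import io
import threading


class LockedReader:
    """Hypothetical wrapper: one logical read may need several raw reads."""

    def __init__(self, raw):
        self._raw = raw
        self._lock = threading.Lock()

    def read_exactly(self, size, step=4):
        # The lock covers *all* raw reads of one logical read, so reads
        # issued by other threads cannot be interlaced with this one.
        with self._lock:
            parts = []
            remaining = size
            while remaining > 0:
                part = self._raw.read(min(step, remaining))
                if not part:  # end of file
                    break
                parts.append(part)
                remaining -= len(part)
            return b"".join(parts)


def read_all_records(reader, record_size, thread_count=4):
    """Drain `reader` from several threads, one record per logical read."""
    records = []
    records_lock = threading.Lock()

    def worker():
        while True:
            record = reader.read_exactly(record_size)
            if not record:
                break
            with records_lock:
                records.append(record)

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return records
```

Without the lock in read_exactly(), two threads could alternate their 4-byte raw reads, and each would end up with pieces of two different records.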

Adding a lock in Python 2.7.15 could hurt performance, even in single-threaded applications.

I'm not sure what matters more here: performance or correctness?

Note: even the awesome Python 3 io module has the same kind of flaws! https://bugs.python.org/issue12215 "TextIOWrapper: issues with interlaced read-write"

The real question is: *who* reads from the same file object in parallel? Does it make sense? :-) Do you expect file.read(n) to be "atomic" in terms of parallelism?

Note 2: the io module is also available in Python 2.7, just not used by default by the builtin open() function ;-) io.open() must be used explicitly.
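For example, a small sketch (describe_io_open is a hypothetical helper; the temporary file only exists for the demonstration) showing that io.open() in binary read mode hands back the io module's buffered reader:

```python
import io
import os
import tempfile


def describe_io_open():
    """Open a scratch file with io.open() and report the object's type."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with io.open(path, "rb") as f:
            return type(f).__name__, isinstance(f, io.BufferedReader)
    finally:
        os.remove(path)
```

In Python 3 the builtin open() is the same function as io.open(), so the explicit call only matters on 2.7.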