Author vstinner
Recipients pitrou, vstinner
Date 2011-05-19.16:04:45
Message-id <1305821086.44.0.699484480177.issue12116@psf.upfronthosting.co.za>
In-reply-to
Content
Example:

with open("setup.py", "rb") as f:
    # read fewer bytes than the file size so that the readahead buffer is filled
    f.read(1)
    # seek(0) only moves the buffered position; the raw file is not repositioned
    f.seek(0)
    print("f pos=", f.tell())
    print("f.raw pos=", f.raw.tell())

Output:

f pos= 0
f.raw pos= 4096

I would expect f.raw.tell() to be 0 as well, since f.seek(0) was called.

Excerpt from Modules/_io/buffered.c (the printf() calls are debug output I added):

    if (whence != 2 && self->readable) {
        Py_off_t current, avail;
        /* Check if seeking leaves us inside the current buffer,
           so as to return quickly if possible. Also, we needn't take the
           lock in this fast path.
           Don't know how to do that when whence == 2, though. */
        /* NOTE: RAW_TELL() can release the GIL but the object is in a stable
           state at this point. */
        current = RAW_TELL(self);
        avail = READAHEAD(self);
        printf("current=%"  PY_PRIdOFF ", avail=%"  PY_PRIdOFF "\n", current, avail);
        if (avail > 0) {
            Py_off_t offset;
            if (whence == 0)
                offset = target - (current - RAW_OFFSET(self));
            else
                offset = target;
            printf("offset=%"  PY_PRIdOFF "\n", offset);
            if (offset >= -self->pos && offset <= avail) {
                printf("NO SEEK!\n");
                self->pos += offset;
                return PyLong_FromOff_t(current - avail + offset);
            }
        }
    }
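
The arithmetic of that fast path can be followed with a small Python model. This is only a sketch of my reading of the excerpt: the function name and its parameters are hypothetical, READAHEAD(self) is assumed to be read_end - pos, RAW_OFFSET(self) is assumed to be raw_pos - pos, and the self->readable check is left out. The sample state assumes a 4096-byte buffer that was filled from offset 0 by f.read(1).

```python
def fast_path_seek(target, whence, current, pos, raw_pos, read_end):
    """Model the in-buffer check from the excerpt (hypothetical helper).

    current  -- raw file position, as RAW_TELL(self) would report it
    pos      -- read position inside the buffer
    raw_pos  -- where the raw file pointer sits relative to the buffer start
    read_end -- end of the valid data inside the buffer

    Returns (new_pos, reported_position) when the seek is served from the
    buffer, or None when a real seek on the raw file would be needed.
    """
    if whence == 2:
        return None
    avail = read_end - pos        # assumed meaning of READAHEAD(self)
    raw_offset = raw_pos - pos    # assumed meaning of RAW_OFFSET(self)
    if avail <= 0:
        return None
    if whence == 0:
        offset = target - (current - raw_offset)
    else:
        offset = target
    if -pos <= offset <= avail:
        # Same result as "self->pos += offset" and the returned position.
        return pos + offset, current - avail + offset
    return None


# Assumed state after f.read(1): one byte consumed, raw file at 4096.
state = dict(current=4096, pos=1, raw_pos=4096, read_end=4096)
print(fast_path_seek(0, 0, **state))      # served from the buffer
print(fast_path_seek(10000, 0, **state))  # target outside the buffer
```

With that state, seek(0) gives offset = -1, which passes the range check, so only the buffered position changes and the raw file stays at 4096.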

I found this weird behaviour when trying to understand why:

        with open("setup.py", 'rb') as f:
            encoding, lines = tokenize.detect_encoding(f.readline)
        with open("setup.py", 'r', encoding=encoding) as f:
            imp.load_module("setup", f, "setup.py", (".py", "r", imp.PY_SOURCE))

behaves differently from:

        with tokenize.open("setup.py") as f:
            imp.load_module("setup", f, "setup.py", (".py", "r", imp.PY_SOURCE))

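imp.load_module() clones the file using something like fd = os.dup(f.fileno()); clone = os.fdopen(fd, "r"). The duplicated descriptor shares its file offset with f.raw, so the clone starts reading at the raw position rather than at the position reported by f.tell().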
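
The effect on a cloned descriptor can be reproduced without imp. The following is a minimal sketch, not what imp.load_module() actually does: clone_start() and make_sample_file() are hypothetical helpers, the sample file is a throwaway temporary file, and buffering=4096 is chosen only to keep the buffer smaller than the file.

```python
import os
import tempfile


def make_sample_file(size=10000):
    """Create a temporary file with distinguishable byte content."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(bytes(i % 251 for i in range(size)))
    return path


def clone_start(path, buffering=4096):
    """Return (buffered position, raw position, first bytes seen by a clone)."""
    with open(path, "rb", buffering=buffering) as f:
        f.read(1)   # fills the readahead buffer
        f.seek(0)   # may be satisfied from the buffer alone
        buffered_pos = f.tell()
        raw_pos = f.raw.tell()
        # The duplicate shares the file offset with f.raw.
        with os.fdopen(os.dup(f.fileno()), "rb") as clone:
            data = clone.read(4)
    return buffered_pos, raw_pos, data


path = make_sample_file()
try:
    buffered_pos, raw_pos, data = clone_start(path)
    print("buffered pos =", buffered_pos)
    print("raw pos      =", raw_pos)
    print("clone reads  =", data)
finally:
    os.remove(path)
```

Whatever the raw position turns out to be, the clone's first bytes come from that offset in the file, not from the buffered position.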

For tokenize.open(), a workaround is to replace:
   buffer.seek(0)
by
   buffer.seek(0); buffer.raw.seek(0)
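
The workaround can be checked in isolation with a sketch like the one below. The helper names rewind_for_clone() and first_bytes_of_clone() are hypothetical, and the temporary file and buffering=4096 are assumptions made only for the demonstration. It only verifies what a duplicated descriptor sees after the double seek; it does not cover further reads through the buffered object itself.

```python
import os
import tempfile


def rewind_for_clone(buffer):
    """Rewind the buffered reader and, explicitly, its raw file."""
    buffer.seek(0)
    buffer.raw.seek(0)


def first_bytes_of_clone(path, n=4):
    """Return (raw position after rewinding, first n bytes read by a clone)."""
    with open(path, "rb", buffering=4096) as f:
        f.read(1)             # fills the readahead buffer
        rewind_for_clone(f)
        raw_pos = f.raw.tell()
        # The duplicate shares the (now rewound) file offset with f.raw.
        with os.fdopen(os.dup(f.fileno()), "rb") as clone:
            return raw_pos, clone.read(n)


# Throwaway sample file, larger than the buffer.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(bytes(i % 251 for i in range(10000)))
try:
    print(first_bytes_of_clone(sample_path))
finally:
    os.remove(sample_path)
```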
History
Date User Action Args
2011-05-19 16:04:46  vstinner  set     recipients: + vstinner, pitrou
2011-05-19 16:04:46  vstinner  set     messageid: <1305821086.44.0.699484480177.issue12116@psf.upfronthosting.co.za>
2011-05-19 16:04:45  vstinner  link    issue12116 messages
2011-05-19 16:04:45  vstinner  create