Example:
with open("setup.py", "rb") as f:
    # read less than the file size, which fills the read-ahead buffer
    f.read(1)
    # seek doesn't seek: the raw file position is left unchanged
    f.seek(0)
    print("f pos=", f.tell())
    print("f.raw pos=", f.raw.tell())
Output:
f pos= 0
f.raw pos= 4096
I expect f.raw.tell() to be 0.
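A self-contained variant of the example above, so it can be run without setup.py. It assumes CPython's C implementation of io, writes a hypothetical 10000-byte temporary file, and passes buffering=4096 explicitly so that the read-ahead size does not depend on the file system's block size.

```python
import os
import tempfile


def buffered_and_raw_positions(path):
    # buffering=4096 is an explicit assumption; by default CPython picks the
    # buffer size from the file system's block size.
    with open(path, "rb", buffering=4096) as f:
        f.read(1)   # reading 1 byte fills the whole 4096-byte buffer
        f.seek(0)   # served by the fast path: the raw file is not moved
        return f.tell(), f.raw.tell()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")  # hypothetical test file
        with open(path, "wb") as out:
            out.write(b"x" * 10000)
        print(buffered_and_raw_positions(path))  # (0, 4096) with CPython's C io
```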
Excerpt from Modules/_io/buffered.c (the printf() calls are debugging additions, not part of CPython):
if (whence != 2 && self->readable) {
    Py_off_t current, avail;
    /* Check if seeking leaves us inside the current buffer,
       so as to return quickly if possible. Also, we needn't take the
       lock in this fast path.
       Don't know how to do that when whence == 2, though. */
    /* NOTE: RAW_TELL() can release the GIL but the object is in a stable
       state at this point. */
    current = RAW_TELL(self);
    avail = READAHEAD(self);
    printf("current=%" PY_PRIdOFF ", avail=%" PY_PRIdOFF "\n", current, avail);
    if (avail > 0) {
        Py_off_t offset;
        if (whence == 0)
            offset = target - (current - RAW_OFFSET(self));
        else
            offset = target;
        printf("offset=%" PY_PRIdOFF "\n", offset);
        if (offset >= -self->pos && offset <= avail) {
            printf("NO SEEK!\n");
            self->pos += offset;
            return PyLong_FromOff_t(current - avail + offset);
        }
    }
}
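To make the arithmetic concrete, here is a hypothetical Python model of the fast path above (seek_fast_path is not a real CPython function). It assumes READAHEAD is read_end - pos and RAW_OFFSET is raw_pos - pos, as the buffered.c macros compute for a readable object with a valid buffer. With the numbers from the example (raw position 4096, pos 1 after read(1)), seek(0) gives offset = -1, which passes the bounds check, so only self->pos changes and the raw file stays at 4096.

```python
def seek_fast_path(target, whence, raw_tell, pos, read_end, raw_pos):
    """Hypothetical model of the fast path in buffered.c's seek.

    Returns (new_pos, returned_offset) when the seek is served from the
    buffer, or None when the real code would fall through to a raw seek.
    """
    if whence == 2:
        return None
    avail = read_end - pos        # READAHEAD(self)
    raw_offset = raw_pos - pos    # RAW_OFFSET(self)
    current = raw_tell            # RAW_TELL(self)
    if avail <= 0:
        return None
    if whence == 0:
        offset = target - (current - raw_offset)
    else:
        offset = target
    if -pos <= offset <= avail:
        return pos + offset, current - avail + offset
    return None


if __name__ == "__main__":
    # State after f.read(1) with a 4096-byte buffer: the raw file is at 4096,
    # the buffer holds bytes 0..4095 and the read position inside it is 1.
    print(seek_fast_path(0, 0, raw_tell=4096, pos=1, read_end=4096, raw_pos=4096))
    # -> (0, 0): pos moves back to 0, seek() returns 0, raw_tell stays 4096
```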
I found this weird behaviour when trying to understand why:
import imp
import tokenize

with open("setup.py", 'rb') as f:
    encoding, lines = tokenize.detect_encoding(f.readline)
with open("setup.py", 'r', encoding=encoding) as f:
    imp.load_module("setup", f, "setup.py", (".py", "r", imp.PY_SOURCE))
behaves differently from:
with tokenize.open("setup.py") as f:
    imp.load_module("setup", f, "setup.py", (".py", "r", imp.PY_SOURCE))
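A sketch of the difference that avoids imp (which was removed in Python 3.12): it compares the OS-level offset of the descriptor, which is what os.dup() would inherit, right after each way of opening the file. The helper names and the sample file are illustrative assumptions; any nonzero value for tokenize.open() shows the problem.

```python
import os
import tempfile
import tokenize


def fd_offset(f):
    # Offset of the OS-level descriptor, i.e. what os.dup() would inherit.
    return os.lseek(f.fileno(), 0, os.SEEK_CUR)


def compare_offsets(path):
    # First variant: detect the encoding, then reopen the file from scratch.
    with open(path, "rb") as f:
        encoding, _ = tokenize.detect_encoding(f.readline)
    with open(path, "r", encoding=encoding) as f:
        reopened = fd_offset(f)
    # Second variant: tokenize.open() seeks its binary buffer back to 0
    # after detection and wraps that same buffer.
    with tokenize.open(path) as f:
        via_tokenize = fd_offset(f)
    return reopened, via_tokenize


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.py")  # hypothetical source file
        with open(path, "w", encoding="utf-8") as out:
            out.write("# -*- coding: utf-8 -*-\n" + "x = 1\n" * 50000)
        print(compare_offsets(path))  # first value 0, second value > 0
```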
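imp.load_module() clones the file using something like fd = os.dup(f.fileno()); clone = os.fdopen(fd, "r"). A descriptor created by os.dup() shares its file offset with the original, so the clone starts reading at the raw position (4096 in the first example), not at the position reported by f.tell().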
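A minimal sketch of that consequence, assuming CPython's C io and an explicit 4096-byte buffer: the first 4096 bytes of the (hypothetical) test file are b"x" and the rest b"y", so a clone reading b"yyyy" shows that it starts at the raw offset 4096 rather than at 0.

```python
import os
import tempfile


def first_bytes_of_clone(path, n=4):
    with open(path, "rb", buffering=4096) as f:
        f.read(1)   # fills the 4096-byte buffer; the descriptor is now at 4096
        f.seek(0)   # fast path: only the buffer position changes
        fd = os.dup(f.fileno())  # the duplicate shares the descriptor's offset
        with os.fdopen(fd, "rb") as clone:  # closing clone closes only fd
            return clone.read(n)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")  # hypothetical test file
        with open(path, "wb") as out:
            out.write(b"x" * 4096 + b"y" * 4096)
        print(first_bytes_of_clone(path))  # b'yyyy', not b'xxxx'
```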
For tokenize.open(), a workaround is to replace:
buffer.seek(0)
with
buffer.seek(0); buffer.raw.seek(0)
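The same sketch with the workaround applied, under the same assumptions (CPython's C io, explicit 4096-byte buffer, hypothetical test file): after the extra f.raw.seek(0), the clone starts at offset 0 and reads b"xxxx". It only checks where the clone starts reading; it does not check later reads through f itself, whose buffer state is not updated by a direct raw seek.

```python
import os
import tempfile


def first_bytes_of_clone_with_workaround(path, n=4):
    with open(path, "rb", buffering=4096) as f:
        f.read(1)       # fills the 4096-byte buffer; the descriptor is at 4096
        f.seek(0)       # fast path: only the buffer position changes
        f.raw.seek(0)   # the workaround: move the raw descriptor back as well
        fd = os.dup(f.fileno())
        with os.fdopen(fd, "rb") as clone:
            return clone.read(n)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")  # hypothetical test file
        with open(path, "wb") as out:
            out.write(b"x" * 4096 + b"y" * 4096)
        print(first_bytes_of_clone_with_workaround(path))  # b'xxxx'
```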