Message24864
Logged In: YES
user_id=89016
The current readline() is implemented so that even if the
complete line ending (i.e. '\r\n') can't be read within size
bytes, at least something that might work as a line end
(i.e. '\r') is available at the end of the string, so that
the user knows that the line is complete. The atcr member
makes sure that the '\n' that is read next isn't
misinterpreted as another line. Unfortunately the import
mechanism doesn't work that way: It demands a '\n' as line
terminator and will continue reading if it only finds the
'\r'. This means that the '\n' will be skipped and the
import mechanism treats those two lines as one.
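To make the failure concrete, here is a toy model of the interaction. It is only a sketch: ToyReader and read_logical_line are made-up names, the reader works on characters rather than bytes, and none of this is the actual codecs.StreamReader or import code.

```python
class ToyReader:
    # Hypothetical stand-in for a reader with an "atcr" flag.
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.atcr = False

    def readline(self, size):
        chunk = self.text[self.pos:self.pos + size]
        if self.atcr and chunk.startswith("\n"):
            # This "\n" completes a "\r\n" whose "\r" was already
            # returned by the previous call, so it is skipped.
            self.pos += 1
            chunk = self.text[self.pos:self.pos + size]
        self.atcr = False
        lines = chunk.splitlines(keepends=True)
        line = lines[0] if lines else ""
        self.pos += len(line)
        # A "\r" at the very end of the chunk may be half of a "\r\n".
        if line.endswith("\r") and len(line) == len(chunk):
            self.atcr = True
        return line


def read_logical_line(reader, size):
    # Assumed consumer behaviour: keep reading until a "\n" shows up.
    parts = []
    while True:
        piece = reader.readline(size)
        if not piece:
            break
        parts.append(piece)
        if piece.endswith("\n"):
            break
    return "".join(parts)


reader = ToyReader("a = 1\r\nb = 2\r\n")
merged = read_logical_line(reader, 6)
print(repr(merged))
```

With size=6 every chunk ends exactly on the '\r', the reader swallows each following '\n', and the consumer never sees a terminator it accepts, so both source lines come back glued together.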
IMHO the best solution would be to read the extra character
even if size is None, as glchapman suggested, so the above
situation would really only happen if the last character
from the stream is '\r'. I think the tokenizer should be
able to survive this. What it didn't survive in 2.4 was that
readline() tried to get its hands on a complete line even if
the length of the line was way bigger than the size passed
in. If we remove the "size is None" restriction from the
current code, then I think we should remove the atcr logic
too, because it could only happen under exceptional
circumstances and the old handling might be better for those
users that don't recognize '\r' as a line end.
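A rough sketch of the "always read the extra character" idea, without any atcr bookkeeping. LookaheadReader is a hypothetical name and the model is character based; it is not glchapman's patch, just an illustration of the approach under those assumptions.

```python
class LookaheadReader:
    # Hypothetical reader: peeks one character past a trailing "\r".
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def _read(self, n):
        chunk = self.text[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk

    def readline(self, size):
        data = self._read(size)
        if data.endswith("\r"):
            # Read one extra character, whether or not a size was given.
            # It may be the "\n" that completes the line ending.
            data += self._read(1)
        lines = data.splitlines(keepends=True)
        line = lines[0] if lines else ""
        # Push back everything after the first line.
        self.pos -= len(data) - len(line)
        return line


reader = LookaheadReader("a = 1\r\nb = 2\r\n")
first = reader.readline(6)
second = reader.readline(6)
print(repr(first), repr(second))
```

Here a bare '\r' is only ever returned when the character after it is not '\n' or when the stream ends right after the '\r'.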
But in any case the tokenizer should be fixed to be able to
handle any line length returned from readline(). I'd really
like to get a review by Martin v. Löwis of glchapman's patch
#1101726.
Of course the simplest solution would be: "If you want a
complete line, don't pass a size parameter to readline()". I
don't know if it would make sense to change the PEP263
machinery to support this.
The other problem is if readline() returns data from the
charbuffer. I don't really see how this can be fixed. We
might call read() with a size parameter even if there are
characters in the charbuffer, but what happens if the user
calls readline(size=100000) first and then
readline(size=10)? The first readline() call might put a
very long string into the charbuffer and then the second
call will return a unicode string whose length is in no way
correlated to the size parameter. (This happens even
with the current code of course, so the message should be:
Don't trust the size parameter, it only tells the underlying
stream how many bytes to read (approximately), it doesn't
tell you *anything* about the amount of data returned.)
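The charbuffer effect can be shown with another toy model. BufferedToyReader is a made-up class that counts characters instead of bytes and only touches the stream when its buffer is empty; the real reader is more involved, so treat this purely as an illustration.

```python
class BufferedToyReader:
    # Hypothetical reader that serves lines from a character buffer.
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.charbuffer = ""

    def readline(self, size):
        if not self.charbuffer:
            # size only controls how much is pulled from the stream.
            self.charbuffer = self.text[self.pos:self.pos + size]
            self.pos += len(self.charbuffer)
        lines = self.charbuffer.splitlines(keepends=True)
        line = lines[0] if lines else ""
        # Whatever follows the first line stays buffered for later calls.
        self.charbuffer = self.charbuffer[len(line):]
        return line


reader = BufferedToyReader("short\n" + "x" * 50 + "\n")
first = reader.readline(100000)   # fills the buffer with everything
second = reader.readline(10)      # served from the buffer, ignores size
print(len(first), len(second))
```

The second call asks for size=10 but gets the whole 51-character line, because that line was already sitting in the buffer from the first call.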
This was different in 2.3.x, because the readline() method
of the underlying stream was used, which handled this
properly, but only worked if an encoded linefeed was
recognizable as a normal 8bit '\n' linefeed by the
underlying readline() (i.e. the UTF-16 encodings didn't work).
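A small illustration of why a byte-level readline() can't be used for UTF-16: the stream below is an io.BytesIO standing in for "the underlying stream", which is an assumption made for the example.

```python
import io

# UTF-8: the encoded linefeed is the single byte b"\n", so a
# byte-level readline() cuts at the right place.
utf8_line = io.BytesIO("ab\ncd\n".encode("utf-8")).readline()
utf8_text = utf8_line.decode("utf-8")

# UTF-16-LE: the linefeed is encoded as b"\n\x00", so a byte-level
# readline() stops in the middle of a code unit.
utf16_line = io.BytesIO("ab\ncd\n".encode("utf-16-le")).readline()
try:
    utf16_line.decode("utf-16-le")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(repr(utf8_text), repr(utf16_line), decode_failed)
```

The UTF-16 chunk ends after the first byte of the two-byte linefeed, so it has an odd length and can't be decoded on its own.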
Date | User | Action | Args
2007-08-23 14:30:41 | admin | link | issue1175396 messages
2007-08-23 14:30:41 | admin | create |