
Author glchapman
Date 2005-04-23.16:46:40
Content

> 1) How do we handle the problem of a truncated line, if the
> data comes from the charbuffer instead of being read from
> the stream?

My suggestion is to make the top of the loop look like:

    while True:
        havecharbuffer = bool(self.charbuffer)

And then the break condition (when no line break found)
should be:

    # we didn't get anything or this was our only try
    if not data or (size is not None and not havecharbuffer):

(There are too many negatives in that.)  Anyway, the idea is that, if
size is specified, there will be at most one read of the
underlying stream (using the size).  So if you enter the
loop with a charbuffer, and that charbuffer does not contain a
line break, then you redo the loop once (the second time it
will break, because havecharbuffer will be False).
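
To make the control flow concrete, here is a minimal sketch of the loop described above. It is not the real codecs.StreamReader: SketchReader, its simplified read(), and the stream_reads counter are hypothetical names made up for illustration, and the stream is assumed to yield already-decoded text. Doubling readsize only when no size was given is also an assumption, made so that a sized call reads the stream with the requested size.

```python
import io


class SketchReader:
    """Hypothetical, simplified stand-in for a codecs-style stream reader."""

    def __init__(self, stream):
        self.stream = stream    # assumption: yields already-decoded text
        self.charbuffer = ""    # characters read earlier but not yet returned
        self.stream_reads = 0   # illustration only: counts real stream reads

    def read(self, size):
        # Serve buffered characters first; otherwise read the stream once.
        if self.charbuffer:
            data, self.charbuffer = self.charbuffer, ""
            return data
        self.stream_reads += 1
        return self.stream.read(size)

    def readline(self, size=None):
        readsize = size or 72
        line = ""
        while True:
            # Remember whether this pass will be served from the charbuffer.
            havecharbuffer = bool(self.charbuffer)
            data = self.read(readsize)
            line += data
            lines = line.splitlines(True)
            if lines and lines[0] != lines[0].splitlines(False)[0]:
                # We really have a line end: put the rest back for later.
                self.charbuffer = "".join(lines[1:]) + self.charbuffer
                return lines[0]
            # We didn't get anything, or this was our only try at the stream.
            if not data or (size is not None and not havecharbuffer):
                return line
            if size is None and readsize < 8000:
                readsize *= 2


reader = SketchReader(io.StringIO("cd\nef\n"))
reader.charbuffer = "ab"            # leftover text with no line break
first = reader.readline(size=4)     # one pass on the buffer, one on the stream
```

With a leftover "ab" in the charbuffer, the first pass finds no line break and loops again; the second pass performs the single sized read of the stream and returns the completed line.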

Also, not sure about this, but should the size parameter
default to -1 (to keep it in sync with read)?

As to issue 2, it looks like it should be possible to get
the line number right, because the UnicodeDecodeError
exception object has all the necessary information in it
(including the original string).  I think this should be
done by fp_readl (in tokenizer.c).  
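
As a rough illustration of what information the exception carries (in Python rather than in the C code of tokenizer.c), the sketch below derives a line number from the object and start attributes of a UnicodeDecodeError. The helper name line_of_decode_error and the first_lineno parameter are hypothetical; this only shows that the data is available, not how fp_readl would use it.

```python
def line_of_decode_error(exc, first_lineno=1):
    """Return the line number on which the undecodable byte sits.

    exc.object is the original byte string and exc.start is the index of
    the first offending byte, so counting the newlines before that index
    gives the line offset. first_lineno is the (assumed) line number of
    the first line contained in exc.object.
    """
    return first_lineno + exc.object.count(b"\n", 0, exc.start)


data = b"ok\nfine\nbad \xff here\n"
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    bad_line = line_of_decode_error(exc)
```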

By the way, using a findlinebreak function (using sre) turns
out to be slower than splitting/joining when no size is
specified (which I assume will be the overwhelmingly most
common case), so that turned out to be a bad idea on my part.
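
For reference, the two approaches being compared look roughly like the sketch below. Both function names are hypothetical, and the regex only recognizes "\r\n", "\r" and "\n", which is an assumption (splitlines on text also treats several other characters as line breaks). No timings are claimed here.

```python
import re

# Assumption: only these three line-break sequences are considered.
_LINEBREAK = re.compile(r"\r\n|\r|\n")


def first_line_regex(text):
    """Return (line, rest) by searching for the first line break."""
    match = _LINEBREAK.search(text)
    if match is None:
        return text, ""
    return text[:match.end()], text[match.end():]


def first_line_split(text):
    """Return (line, rest) by splitting into lines and re-joining the tail."""
    lines = text.splitlines(True)
    if not lines:
        return "", ""
    return lines[0], "".join(lines[1:])
```

Either function can be passed to timeit.timeit with a representative chunk of text to repeat the comparison on a given interpreter.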
History
Date User Action Args
2007-08-23 14:30:41  admin  link  issue1175396 messages
2007-08-23 14:30:41  admin  create