
Author doerwalter
Date 2004-12-03.20:01:00
Content

> I checked the decoding_fgets function (and the enclosed call
> to fp_readl). The patch is more problematic than i thought
> since decoding_fgets not only takes a pointer to the token
> state but also a pointer to a destination string buffer.
> Reallocating the buffer within fp_readl would mean a very
> very bad hack since you'd have to reallocate "foreign"
> memory based on a pointer comparison (comparing the passed
> string buffers pointer against tok->inp || tok->buf...).

Maybe all pointers pointing into the buffer should be moved
into a struct?

> As it stands now, patching the tokenizer would mean changing
> the function signatures or otherwise change the structure
> (more error prone).

All the affected functions seem to be static, so at least in
this regard there shouldn't be any problem.

> Another possible solution would be to
> provide a specialized readline() function for which the
> assumption that at most size bytes are returned is correct.

All the codecs would have to provide such a readline().
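
For illustration only, a rough Python sketch of what a readline() that
really honours a size bound might look like. The name
readline_utf8_bounded and the choice to count UTF-8 bytes (rather than
characters) are assumptions made for this example; nothing like it
exists in the codecs module, and it assumes a seekable text stream.

```python
import io


def readline_utf8_bounded(stream, max_bytes):
    """Read at most one line whose UTF-8 encoding fits in max_bytes.

    Hypothetical helper: `stream` is assumed to be a seekable text
    stream (for example io.StringIO). Characters that would exceed the
    byte budget are left in the stream for the next call.
    """
    chars = []
    used = 0
    while True:
        pos = stream.tell()
        ch = stream.read(1)
        if not ch:
            break  # end of input
        size = len(ch.encode("utf-8"))
        if used + size > max_bytes:
            stream.seek(pos)  # push the character back
            break
        chars.append(ch)
        used += size
        if ch == "\n":
            break  # a complete line was read
    return "".join(chars)


source = io.StringIO("äbc\nxyz")
first = readline_utf8_bounded(source, 3)   # "ä" takes 2 bytes, "b" takes 1
rest = readline_utf8_bounded(source, 10)   # remainder of the first line
```

Every codec's StreamReader would need equivalent logic, which is the
cost mentioned above.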

BTW, the more I look at your patch the more I think
that it gets us as close to the old behaviour as we
can get.

> Oh and about that UTF-8 decoding. readline()'s size
> restriction works on the already decoded string (at least it
> should), so that shouldn't be an issue.

I don't understand that. fp_readl() does the following
two calls:

buf = PyObject_Call(tok->decoding_readline, args, NULL);
utf8 = PyUnicode_AsUTF8String(buf);

and puts the resulting byte string into the char * passed
in, so even if we fix the readline call the UTF-8 encoded
string might still overflow the available space. How can
tokenizer.c be sure how much the foo->utf8 transcoding
shrinks or expands the string?
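
To make the concern concrete, here is a small Python sketch (not
tokenizer code; encoded_sizes is a hypothetical helper) that compares
the byte length of the same text in its source encoding and in UTF-8.
Depending on the source encoding, the UTF-8 form can be either longer
or shorter:

```python
def encoded_sizes(text, source_encoding):
    """Return (bytes in source_encoding, bytes in UTF-8) for text."""
    raw = text.encode(source_encoding)
    utf8 = text.encode("utf-8")
    return len(raw), len(utf8)


# Latin-1 uses one byte per character here, UTF-8 needs two: expands.
latin1_case = encoded_sizes("äöü", "latin-1")      # (3, 6)

# UTF-16 uses two bytes per ASCII character, UTF-8 one: shrinks.
utf16_case = encoded_sizes("abc", "utf-16-le")     # (6, 3)
```

So a size limit that is correct for the undecoded input says little
about the length of the UTF-8 string copied into the buffer.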

> Maybe another
> optional parameter should be added to readline() called
> limit=None which doesn't limit the returned string by
> default, but does so if the parameter is a positive number.

But limit to what?
History
Date User Action Args
2007-08-23 14:28:04 admin link issue1076985 messages
2007-08-23 14:28:04 admin create