
Author: belopolsky
Recipients: belopolsky, ezio.melotti, lemburg, loewis, vstinner
Date: 2010-11-11 23:05:52
Message-id: <1289516754.09.0.284658362081.issue10382@psf.upfronthosting.co.za>
Content:
haypo> See also #2382: I wrote patches two years ago for this issue.

Yes, this is the same issue.  I don't want to close this as a duplicate because #2382 contains a much more ambitious set of patches.  What I am trying to achieve here is similar to the adjust_offset.patch there.

I am attaching a patch that takes an alternative approach and computes the number of characters in the parser.  I strongly believe that the buffer in the tokenizer always contains UTF-8-encoded text.  If that is not already the case, I would consider making it so by replacing the call to _PyUnicode_AsDefaultEncodedString() with a call to PyUnicode_AsUTF8String(), if that matters.
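
To make the intended computation concrete, here is a minimal sketch of counting characters in a UTF-8 buffer; the helper name utf8_char_count is hypothetical and is not taken from the attached patch.  It relies only on the assumption above that the buffer holds valid UTF-8, in which every code point contributes exactly one byte that is not a continuation byte:

    #include <stddef.h>

    /* Count Unicode code points in a UTF-8 buffer of `len` bytes.
       In valid UTF-8, continuation bytes match the bit pattern
       10xxxxxx, so counting the bytes that do NOT match it gives
       the number of characters. */
    static size_t
    utf8_char_count(const char *buf, size_t len)
    {
        size_t count = 0;
        for (size_t i = 0; i < len; i++) {
            if (((unsigned char)buf[i] & 0xC0) != 0x80)
                count++;
        }
        return count;
    }

Applied to the bytes up to the error position, a count like this turns the byte offset the tokenizer tracks into the character offset that should be reported in the SyntaxError.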

The patch still needs unit tests and may have some off-by-one issues, but first I would like to reach agreement that this is the right level at which to fix the problem.