Message 197686 - Python tracker


Author	scoder
Recipients	amaury.forgeotdarc, doerwalter, lemburg, scoder, serhiy.storchaka
Date	2013-09-14.05:26:44
Message-id	<1379136405.27.0.0528394382778.issue18059@psf.upfronthosting.co.za>

Content
I don't think I have my head deep enough in the encodings implementation to say that this is the correct/best way to do it, but the patch looks mostly reasonable to me and would be a helpful addition.

I have two comments on the pyexpat_encoding_convert() function:

1) I can't see a safeguard against reading beyond the end of the data buffer. What if s already points to the last byte and we are trying to read two or three bytes to decode them? I wouldn't be surprised if this kind of input could be crafted.

2) Creating a throw-away Unicode object through a named decoder looks like a huge overhead for decoding two bytes. It might be considered an optimisation to change that, but if you are really trying to parse a longer XML document with lots of Japanese text in it (i.e. if you actually *need* this feature), it will most likely end up being way too slow to make any real use of it.
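
To illustrate point 2), here is a minimal sketch in Python rather than in the C of the patch. It assumes that the codec is consulted once, up front, to fill a lookup table, so that each later conversion is a plain table lookup instead of a decoder call that creates a temporary string. The names build_two_byte_table() and convert_two_bytes() are hypothetical and do not appear in the patch, and returning -1 for an unmapped sequence is a choice made for this sketch only.

```python
def build_two_byte_table(encoding):
    """Map every valid two-byte sequence of `encoding` to its code point.

    Hypothetical helper: the named codec is used only here, once.
    """
    table = {}
    for lead in range(0x80, 0x100):
        for trail in range(0x100):
            try:
                text = bytes((lead, trail)).decode(encoding)
            except UnicodeDecodeError:
                continue
            # Keep only pairs that form a single character; two
            # single-byte characters in a row are not one sequence.
            if len(text) == 1:
                table[(lead << 8) | trail] = ord(text)
    return table


def convert_two_bytes(table, data, pos):
    """Return the code point for data[pos:pos + 2], or -1 if unmapped."""
    if pos < 0 or pos + 2 > len(data):
        return -1  # not enough bytes left for a two-byte sequence
    key = (data[pos] << 8) | data[pos + 1]
    return table.get(key, -1)


# Usage: build the table once, then convert without touching the codec.
table = build_two_byte_table("shift_jis")
data = "日本".encode("shift_jis")
code_points = [convert_two_bytes(table, data, i) for i in (0, 2)]
```

A C version would more likely use a static or lazily filled array than a dictionary, but the trade-off is the same: a one-time setup cost in exchange for cheap per-character conversion.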

I think that both points should be addressed before this gets added.
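
For point 1), the kind of check being asked for can be sketched as follows, again in Python and under stated assumptions. The function convert_checked() is hypothetical, and it assumes the caller knows the buffer length and the expected sequence length, which may not match what the callback in the patch actually receives. Python slicing never reads past the end of a bytes object, so the explicit length test stands in for the check that C code would need before touching s[1] or s[2].

```python
def convert_checked(data, pos, seq_len, encoding):
    """Decode one multibyte sequence, refusing to read past the buffer.

    Returns the code point, or -1 if the sequence is truncated or invalid.
    """
    # The guard: make sure seq_len bytes really are available at pos.
    if pos < 0 or seq_len < 1 or len(data) - pos < seq_len:
        return -1
    try:
        text = data[pos:pos + seq_len].decode(encoding)
    except UnicodeDecodeError:
        return -1
    # A valid sequence decodes to exactly one character.
    return ord(text) if len(text) == 1 else -1


# Usage: the last byte alone cannot start a two-byte sequence.
data = "日本".encode("shift_jis")
ok = convert_checked(data, 2, 2, "shift_jis")
truncated = convert_checked(data, 3, 2, "shift_jis")
```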
