Message 197686
I don't think I have my head deep enough in the encodings implementation to say that this is the correct/best way to do it, but the patch looks mostly reasonable to me and would be a helpful addition.
I have two comments on the pyexpat_encoding_convert() function:
1) I can't see a safeguard against reading beyond the data buffer. What if s already points to the last byte and we try to read two or three bytes to decode them? I wouldn't be surprised if this kind of input could be crafted.
2) Creating a throw-away Unicode object through a named decoder looks like huge overhead for decoding two bytes. Changing that might be dismissed as a mere optimisation, but if you are actually trying to parse a longer XML document with lots of Japanese text in it (i.e. if you actually *need* this feature), it will most likely end up being far too slow for any real use.
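To illustrate the overhead concern in point 2, here is a plain-C sketch of the usual fix: resolve the decoder once by name and cache the result, instead of repeating the named lookup (and allocating a temporary object) for every two- or three-byte sequence. `decoder_t`, `lookup_decoder`, `dummy_decode` and the `"shift_jis"` name are all hypothetical stand-ins for the real codec machinery:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A decoder turns an n-byte sequence into a code point (stand-in type). */
typedef int (*decoder_t)(const unsigned char *s, size_t n);

/* Stand-in for a real multi-byte decode: packs two bytes into one value. */
static int
dummy_decode(const unsigned char *s, size_t n)
{
    return n == 2 ? ((s[0] << 8) | s[1]) : s[0];
}

/* Stand-in for the expensive name-based codec lookup. */
static decoder_t
lookup_decoder(const char *name)
{
    return strcmp(name, "shift_jis") == 0 ? dummy_decode : NULL;
}

static decoder_t cached_decoder = NULL;

/* Convert one sequence: the lookup happens only on the first call;
 * every later call reuses the cached function pointer. */
static int
convert(const char *name, const unsigned char *s, size_t n)
{
    if (cached_decoder == NULL)
        cached_decoder = lookup_decoder(name);
    return cached_decoder(s, n);
}
```

The same idea applies whether the cache holds a function pointer, a codec object, or a precomputed per-encoding mapping table; the point is that the per-character cost must not include a name lookup and a heap allocation.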
I think that both points should be addressed before this gets added.
Date                | User   | Action | Args
2013-09-14 05:26:45 | scoder | set    | recipients: + scoder, lemburg, doerwalter, amaury.forgeotdarc, serhiy.storchaka
2013-09-14 05:26:45 | scoder | set    | messageid: <1379136405.27.0.0528394382778.issue18059@psf.upfronthosting.co.za>
2013-09-14 05:26:45 | scoder | link   | issue18059 messages
2013-09-14 05:26:44 | scoder | create |