Message 281134 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	ezio.melotti, serhiy.storchaka, vstinner, xiang.zhang
Date	2016-11-18.15:56:24
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1479484584.63.0.41418328566.issue28531@psf.upfronthosting.co.za>
In-reply-to

Content

Serhiy Storchaka: "The performance of the UTF-7 codec is not important."

Right.


"Actually I'm going to propose replacing it with a Python implementation."

Oh. Sadly, PyUnicode_DecodeUTF7() is part of the stable ABI. Do you want to call the Python codec from the C function for backward compatibility?

I dislike UTF-7 because it's complex, but its codec is not as optimized as the UTF-8 codec, so the code remains fairly small and therefore not too expensive to maintain.


"This encoder was omitted from _PyBytesWriter-using optimizations on purpose."

Ah? I don't recall that. When I wrote _PyBytesWriter, I skipped UTF-7 because I don't know this codec well and I preferred to keep the code unchanged to avoid bugs :-)


"The patch complicates the implementation."

Hmm, I have to disagree. To me, the patched code is no more complex than the current code. The main change is that it adds code checking the kind to better estimate the output length. It's not hard to understand the link between the Unicode kind and max_char_size.
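The link between the kind and the per-character worst case can be illustrated with a small Python sketch. It is not the patch under discussion: the names `unicode_kind`, `MAX_CHAR_SIZE` and `utf7_size_estimate` are hypothetical, and the factors are a worst-case bound assumed from how UTF-7 shifts into modified base64 over UTF-16 code units, with PEP 393 kinds of 1, 2 or 4 bytes per code point.

```python
# Hypothetical sketch (not the issue28531 patch): pick a worst-case
# bytes-per-code-point factor for UTF-7 output from the PEP 393 kind of a str.

def unicode_kind(s: str) -> int:
    """Return 1, 2 or 4: bytes per code point CPython would use to store s."""
    max_cp = max(map(ord, s), default=0)
    if max_cp <= 0xFF:
        return 1
    if max_cp <= 0xFFFF:
        return 2
    return 4

# Assumed worst-case UTF-7 bytes per code point, by kind:
# - kinds 1 and 2: a BMP code point is one 16-bit UTF-16 unit, so an isolated
#   one costs '+' + 3 base64 digits (18 bits) + '-' = 5 bytes;
# - kind 4: a non-BMP code point is a surrogate pair (32 bits), so it costs
#   '+' + 6 base64 digits (36 bits) + '-' = 8 bytes.
MAX_CHAR_SIZE = {1: 5, 2: 5, 4: 8}

def utf7_size_estimate(s: str) -> int:
    """Upper bound on len(s.encode('utf-7')), chosen from the kind of s."""
    return len(s) * MAX_CHAR_SIZE[unicode_kind(s)]

if __name__ == "__main__":
    for text in ["hello", "caf\u00e9", "\u65e5\u672c\u8a9e", "x\U0001F600y"]:
        actual = len(text.encode("utf-7"))
        estimate = utf7_size_estimate(text)
        print(f"{text!r}: kind={unicode_kind(text)} "
              f"estimate={estimate} actual={actual}")
        # The estimate must never be smaller than the real output.
        assert actual <= estimate
```

For instance, "é" alone encodes to b'+AOk-', so the bound of 5 is reached exactly, while strings of the 1-byte or 2-byte kind never need the 8-byte factor that a non-BMP character can require.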


I vote +1 on this patch because I consider that it makes the code simpler, not because it makes the codec faster (I don't really care about UTF-7 codec performance).

But again (as in issue #28398), it's up to you, Serhiy: I'm also OK with leaving the code unchanged if you are against the patch.

History
Date	User	Action	Args
2016-11-18 15:56:24	vstinner	set	recipients: + vstinner, ezio.melotti, serhiy.storchaka, xiang.zhang
2016-11-18 15:56:24	vstinner	set	messageid: <1479484584.63.0.41418328566.issue28531@psf.upfronthosting.co.za>
2016-11-18 15:56:24	vstinner	link	issue28531 messages
2016-11-18 15:56:24	vstinner	create