Message 281872 - Python tracker

Author	serhiy.storchaka
Recipients	docs@python, ezio.melotti, lemburg, serhiy.storchaka, vstinner, xiang.zhang
Date	2016-11-28.12:51:41
Message-id	<2044899.0GaGWQQ6m7@xarax>
In-reply-to	<583C1AD7.7020206@egenix.com>

Content

> The only part that is not correct is "single string characters".
> This should read "single bytes" or "bytes strings of length 1".

That is not correct. Decoding mappings map integers, not bytes strings.
And this is not the only incorrect part. Decoding mappings can map to
multi-character Unicode strings, not only to single Unicode characters. Not
just None, but also the integer 0xfffe and the Unicode string '\ufffe' mean
"undefined mapping".
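
A minimal sketch of the points above, using the Python-level
codecs.charmap_decode() function (which, to the best of my knowledge, is
implemented on top of PyUnicode_DecodeCharmap() in CPython). The mapping
contents are hypothetical and chosen only for illustration.

```python
import codecs

# Hypothetical mapping, invented for this illustration.
decoding_map = {
    0x41: "a",      # integer key -> single character
    0x42: "ss",     # integer key -> multi-character string
    0x43: 0x263A,   # integer key -> integer code point (U+263A)
}

text, consumed = codecs.charmap_decode(b"ABC", "strict", decoding_map)

# A mapping keyed by bytes objects is never hit, because the lookup key
# is the integer value of each input byte.
bytes_keyed_map = {b"A": "a"}
try:
    codecs.charmap_decode(b"A", "strict", bytes_keyed_map)
    bytes_keys_work = True
except UnicodeDecodeError:
    bytes_keys_work = False
```

Here `text` holds the characters produced by all three kinds of values, and
`bytes_keys_work` ends up false, since the bytes key is never matched.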

There are similar inaccuracies in the description of encoding mappings.
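
For the encoding direction, a comparable sketch with
codecs.charmap_encode(), assuming it follows the same rules as the C-level
charmap encoder: keys are integer code points, and values may be integers in
the range 0-255, bytes objects of any length, or None. The mapping itself is
hypothetical.

```python
import codecs

# Hypothetical mapping, invented for this illustration.
encoding_map = {
    ord("a"): 0x41,     # code point -> integer in range(256)
    0xDF: b"SS",        # code point -> bytes object of length 2
    ord("x"): None,     # explicitly undefined
}

data, consumed = codecs.charmap_encode("a\xdf", "strict", encoding_map)

# Characters mapped to None, and characters missing from the mapping,
# are both treated as undefined and raise in strict mode.
undefined = {}
for ch in ("x", "z"):
    try:
        codecs.charmap_encode(ch, "strict", encoding_map)
        undefined[ch] = False
    except UnicodeEncodeError:
        undefined[ch] = True

# With the "ignore" handler the undefined characters are simply skipped.
ignored, _ = codecs.charmap_encode("axz", "ignore", encoding_map)
```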

> I also don't see where you copied the description. Without some
> description of what "mappings" are in the context of the charmap
> codec, it's not easy to understand what the purpose of these
> APIs is. Please just fix the bytes wording instead of removing the
> whole intro.

Decoding mappings were described in the introduction and in the description
of PyUnicode_DecodeCharmap() (both are outdated and incomplete). I merged and
corrected the descriptions and left them in only one place, since
PyUnicode_DecodeCharmap() is the only function that needs them. The same goes
for encoding mappings. Neither decoding nor encoding mappings have any
relation to PyUnicode_Translate(). The paragraph about a LookupError in the
introduction was totally wrong. I left only the common part in the
introduction. The other details differ too much between decoding, encoding
and translation mappings.
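
To illustrate how translation mappings differ, here is a small sketch using
str.translate() as a Python-level stand-in for PyUnicode_Translate(). The
table is hypothetical. Unlike the charmap codec, an unmapped character is
left unchanged and None deletes the character.

```python
# Hypothetical translation table, invented for this illustration.
table = {
    ord("a"): "A",        # code point -> string
    ord("b"): None,       # None deletes the character
    ord("c"): ord("C"),   # code point -> code point
}

# "d" is not in the table, so it passes through unchanged
# instead of being reported as an undefined mapping.
translated = "abcd".translate(table)
```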

> >> Also, this wording needs to be corrected: "bytes (integers in the range
> >> from 0 to 255)". Bytes are not integers. I'd suggest to use the more
> >> correct wording "bytes strings of length 1".
> > The word "bytes" means here not Python bytes object, but is used in more
> > common meaning: an integer in the range from 0 to 255.
> That's confusing, since we use the term "bytes" as referring
> to the bytes object in Python. Please use "integers in the range
> 0-255".

Okay, I'll remove the word "bytes" here.  But how would you formulate the 
following sentence: "Unmapped bytes (ones which cause a :exc:`LookupError`) as 
well as mapped to ``None``, ``0xFFFE`` or ``'\ufffe'`` are treated as "undefined 
mapping" and cause an error."?
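
The four "undefined mapping" cases named in that sentence can be checked with
a short sketch based on codecs.charmap_decode(). The mapping and the helper
name is_undefined are hypothetical.

```python
import codecs

# Hypothetical mapping, invented for this illustration.
undefined_map = {
    0x01: None,       # mapped to None
    0x02: 0xFFFE,     # mapped to the integer 0xFFFE
    0x03: "\ufffe",   # mapped to the string '\ufffe'
    # 0x04 is absent, so the lookup raises KeyError (a LookupError)
}

def is_undefined(byte_value):
    """Return True if decoding this single byte fails in strict mode."""
    try:
        codecs.charmap_decode(bytes([byte_value]), "strict", undefined_map)
    except UnicodeDecodeError:
        return True
    return False

results = {value: is_undefined(value) for value in (1, 2, 3, 4)}

# With the "replace" handler each undefined byte becomes U+FFFD.
replaced, _ = codecs.charmap_decode(b"\x01\x02\x03\x04", "replace", undefined_map)
```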

> Aside: The deprecation of PyUnicode_EncodeCharmap() also seems misplaced
> in this context, since only the Py_UNICODE version of the API is
> deprecated. The functionality still exists and is useful. An API
> similar to the _PyUnicode_EncodeCharmap() API should be made publicly
> available to accommodate for the deprecation, since the mentioned
> PyUnicode_AsCharmapString() and PyUnicode_AsEncodedString()
> APIs are not suitable as replacement. PyUnicode_AsCharmapString()
> doesn't support error handling (strange, BTW) and
> PyUnicode_AsEncodedString() has a completely unrelated meaning (no
> idea why it's mentioned here at all).

Only PyUnicode_EncodeCharmap() is deprecated; PyUnicode_AsCharmapString() is
not. I placed the deprecated function just after its non-deprecated
counterpart, following the pattern used for other deprecated functions. If
you prefer, I'll move both deprecated functions (PyUnicode_EncodeCharmap and
PyUnicode_TranslateCharmap) together to the end of this section.

I don't know why PyUnicode_AsCharmapString() doesn't support the errors
argument. I added PyUnicode_AsEncodedString() as a replacement (issue19569)
because it is the only public non-deprecated way to do a charmap encoding
with error handling. There is no exact equivalent, but
PyUnicode_AsCharmapString() and PyUnicode_AsEncodedString() cover different
use cases of PyUnicode_EncodeCharmap().
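
A rough Python-level analogy of the two use cases, not the C API itself:
str.encode() with a registered charmap-based codec (cp1252 here, chosen as an
example) stands in for PyUnicode_AsEncodedString(), and
codecs.charmap_encode() with a custom mapping in strict mode stands in for
PyUnicode_AsCharmapString(). Note that the Python-level function accepts an
errors argument, which the C function does not. The custom mapping is
hypothetical.

```python
import codecs

text = "caf\xe9 \u2192"

# Registered codec plus error handler: U+2192 has no cp1252 equivalent,
# so the "replace" handler substitutes "?".
via_registered_codec = text.encode("cp1252", "replace")

# Custom mapping, strict error handling only (hypothetical mapping).
custom_map = {ord("c"): 0x01, ord("a"): 0x02, ord("f"): 0x03}
via_custom_mapping, _ = codecs.charmap_encode("caf", "strict", custom_map)
```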

History
Date	User	Action	Args
2016-11-28 12:51:42	serhiy.storchaka	set	recipients: + serhiy.storchaka, lemburg, vstinner, ezio.melotti, docs@python, xiang.zhang
2016-11-28 12:51:42	serhiy.storchaka	link	issue28749 messages
2016-11-28 12:51:41	serhiy.storchaka	create