Message 402007 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	lemburg
Recipients	eryksun, ezio.melotti, lemburg, paul.moore, python-dev, rafaelblsilva, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Date	2021-09-17.07:34:38
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1631864078.28.0.775823855831.issue45120@roundup.psfhosted.org>
In-reply-to

Content
Just to be clear: The Python code page encodings are (mostly) taken from the unicode.org set of mappings (ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/). This is our standards body for such mappings, where possible. In some cases, the Unicode consortium does not provide such mappings and we resort to other standards (ISO, commonly used mapping files in OSes, Wikipedia, etc). Changes to the existing mapping codecs should only be done in case corrections are applied to the mappings under those names by the standard bodies. If you want to add variants such as the best fit ones from MS, we'd have to add them under a different name, e.g. bestfit1252 (see ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/). Otherwise, interop with other systems would no longer. From Eryk's description it sounds like we should always add WC_NO_BEST_FIT_CHARS as an option to MultiByteToWideChar() in order to make sure it doesn't use best fit variants unless explicitly requested.

Just to be clear: The Python code page encodings are (mostly) taken from the unicode.org set of mappings (ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/). This is our standards body for such mappings, where possible. In some cases, the Unicode consortium does not provide such mappings and we resort to other standards (ISO, commonly used mapping files in OSes, Wikipedia, etc).

Changes to the existing mapping codecs should only be done in case corrections are applied to the mappings under those names by the standard bodies.

If you want to add variants such as the best fit ones from MS, we'd have to add them under a different name, e.g. bestfit1252 (see ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/).

Otherwise, interop with other systems would no longer.

From Eryk's description it sounds like we should always add WC_NO_BEST_FIT_CHARS as an option to MultiByteToWideChar() in order to make sure it doesn't use best fit variants unless explicitly requested.

History
Date	User	Action	Args
2021-09-17 07:34:38	lemburg	set	recipients: + lemburg, paul.moore, vstinner, tim.golden, ezio.melotti, python-dev, zach.ware, serhiy.storchaka, eryksun, steve.dower, rafaelblsilva
2021-09-17 07:34:38	lemburg	set	messageid: <1631864078.28.0.775823855831.issue45120@roundup.psfhosted.org>
2021-09-17 07:34:38	lemburg	link	issue45120 messages
2021-09-17 07:34:38	lemburg	create