Author rafaelblsilva
Recipients ezio.melotti, lemburg, paul.moore, rafaelblsilva, steve.dower, tim.golden, vstinner, zach.ware
Date 2021-09-06.20:30:12
Message-id <1630960212.43.0.654084937143.issue45120@roundup.psfhosted.org>
In-reply-to
Content
There is a mismatch between specification and behavior in some Windows encodings.

The specifications of some older Windows code pages map certain bytes to "UNDEFINED", whereas in reality Windows handles those bytes differently, following an updated mapping published as "bestfit".

For example, CP1252 has a corresponding bestfit1252:
CP1252: https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
bestfit1252: https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt


In CP1252, the bytes \x81 \x8d \x8f \x90 \x9d map to "UNDEFINED", whereas in bestfit1252 they map to \u0081 \u008d \u008f \u0090 \u009d, respectively.
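
One way to see which bytes a given Python codec currently treats as undefined is to try decoding every single-byte value. This is only a sketch: the helper name undefined_bytes is hypothetical, it assumes a single-byte code page, and the result for "cp1252" depends on the Python version in use, so no output is shown.

```python
def undefined_bytes(encoding: str) -> list[int]:
    """Return the byte values that the codec refuses to decode.

    Assumes a single-byte code page; for multi-byte encodings a lone
    byte may fail simply because it is an incomplete sequence.
    """
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


# Print the undefined bytes of the codec discussed in this issue.
print([hex(b) for b in undefined_bytes("cp1252")])
```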

In the Windows API, the function 'MultiByteToWideChar' exhibits the bestfit1252 behavior.
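
The bestfit1252 mapping for these five bytes can be approximated in pure Python with a custom decode error handler. This is a minimal sketch, not the change proposed in the PR: the handler name "bestfit_c1" and the helper decode_cp1252_bestfit are hypothetical, and it assumes that mapping every undecodable byte to the code point of the same value is enough (true for the bytes listed above, not for the bestfit tables in general).

```python
import codecs


def _c1_fallback(exc):
    """Map each undecodable byte to the code point with the same value."""
    # Only decoding errors are handled; anything else is re-raised.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Return the replacement text and the position at which to resume.
    return "".join(chr(b) for b in bad), exc.end


# "bestfit_c1" is a made-up handler name used only for this sketch.
codecs.register_error("bestfit_c1", _c1_fallback)


def decode_cp1252_bestfit(data: bytes) -> str:
    # Bytes the cp1252 codec rejects fall through to the handler above.
    return data.decode("cp1252", errors="bestfit_c1")


text = decode_cp1252_bestfit(b"\x81\x8d\x8f\x90\x9d")
print([f"U+{ord(ch):04X}" for ch in text])
```

Because the handler only runs for bytes the codec rejects, all other bytes keep their normal CP1252 meaning.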


This issue and the accompanying PR propose a correction for this behavior: update the Windows code pages in which some code points were defined as "UNDEFINED" to use the corresponding bestfit mapping.


Related issue: https://bugs.python.org/issue28712
History
Date User Action Args
2021-09-06 20:30:12 rafaelblsilva set recipients: + rafaelblsilva, lemburg, paul.moore, vstinner, tim.golden, ezio.melotti, zach.ware, steve.dower
2021-09-06 20:30:12 rafaelblsilva set messageid: <1630960212.43.0.654084937143.issue45120@roundup.psfhosted.org>
2021-09-06 20:30:12 rafaelblsilva link issue45120 messages
2021-09-06 20:30:12 rafaelblsilva create