
Author serhiy.storchaka
Recipients Artoria2e5, benjamin.peterson, eryksun, ezio.melotti, larry, ned.deily, paul.moore, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Date 2016-11-17.08:07:48
Message-id <1479370069.34.0.795859720719.issue28712@psf.upfronthosting.co.za>
Content
Thank you, Eryk. That is what I want. I just missed that code_page_decode() returns a tuple.

It seems that Windows maps undefined bytes in the range 0x80-0x9f to the Unicode characters with the same code points (U+0080-U+009F) and reports an error for undefined bytes outside this range. But if the byte starts a multibyte sequence (code pages 932, 949, 950), a lone occurrence of it is an error even if it is in the range 0x80-0x9f.

This could be emulated either by decoding with errors='surrogateescape' and postprocessing the result (replacing '\udc80'-'\udc9f' with '\x80'-'\x9f' and treating '\udca0'-'\udcff' as errors) or by writing a custom error handler that does the job (though several error handlers might be needed, corresponding to 'strict', 'replace', 'ignore', etc.). Adding a new codec is of course an option too.
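A minimal sketch of both approaches, limited to single-byte code pages (the lead-byte rule for 932, 949 and 950 is not handled). It assumes that Python's charmap codecs produce exactly one character per input byte, so a character index equals a byte offset. The function name decode_like_windows and the error handler name 'c1strict' are hypothetical, and only the 'strict' flavour of the handler is shown.

```python
import codecs


def decode_like_windows(data: bytes, encoding: str) -> str:
    """surrogateescape + postprocessing (single-byte code pages only).

    Assumption: each undecodable byte becomes one lone surrogate, so the
    index of a character in the decoded text is also its byte offset.
    """
    text = data.decode(encoding, 'surrogateescape')
    out = []
    for pos, ch in enumerate(text):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDC9F:
            # Undefined byte 0x80-0x9F: map it to U+0080-U+009F.
            out.append(chr(cp - 0xDC00))
        elif 0xDCA0 <= cp <= 0xDCFF:
            # Undefined byte 0xA0-0xFF: treat it as an error.
            raise UnicodeDecodeError(encoding, data, pos, pos + 1,
                                     'character maps to <undefined>')
        else:
            out.append(ch)
    return ''.join(out)


def _c1_strict(exc):
    """Hypothetical 'strict' flavour: map undefined 0x80-0x9F, fail otherwise."""
    if isinstance(exc, UnicodeDecodeError) and exc.end - exc.start == 1:
        byte = exc.object[exc.start]
        if 0x80 <= byte <= 0x9F:
            # Replacement string plus the position to resume decoding from.
            return chr(byte), exc.end
    raise exc


# 'c1strict' is a name chosen for this sketch, not an existing handler.
codecs.register_error('c1strict', _c1_strict)
```

For example, `b'a\x81b'.decode('cp1252', 'c1strict')` gives `'a\x81b'`, where the plain 'strict' handler would fail on 0x81. The error-handler variant gets correct error positions from the codec machinery for free; handlers matching 'replace' or 'ignore' would have to be written separately.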

There are a few other minor differences between Python and Windows:

cp864: On Windows, 0x25 is mapped to '%' (U+0025) instead of '٪' (U+066A).
cp932: On Windows, 0xA0, 0xFD, 0xFE and 0xFF are errors instead of being mapped to U+F8F0-U+F8F3.
cp1255: On Windows, 0xCA is mapped to U+05BA instead of being undefined.

The first two differences can be handled by postprocessing; the last one requires changing the codec.
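A sketch of such postprocessing for cp864 and cp932, assuming the Windows behavior listed above; the function names are hypothetical. The cp864 version relies on the codec being single-byte, so each decoded character lines up with one input byte. The cp932 version cannot simply scan the input for 0xA0, because 0xA0 is also a valid trail byte (e.g. b'\x82\xa0' is 'あ'), so it checks the decoded output instead. It feeds an incremental decoder one byte at a time to know the exact offset of an offending byte, which is simple but slow.

```python
import codecs

# Python's cp932 codec decodes the lone bytes 0xA0, 0xFD, 0xFE and 0xFF to
# U+F8F0-U+F8F3. Assumption: no double-byte sequence decodes to these.
_CP932_EXTRAS = frozenset('\uf8f0\uf8f1\uf8f2\uf8f3')


def decode_cp864_like_windows(data: bytes) -> str:
    """Decode cp864 but map byte 0x25 to '%' (U+0025) instead of U+066A."""
    # Single-byte code page with strict decoding: text[i] comes from data[i].
    text = data.decode('cp864')
    return ''.join('%' if byte == 0x25 else ch for byte, ch in zip(data, text))


def decode_cp932_like_windows(data: bytes) -> str:
    """Decode cp932 but reject lone 0xA0, 0xFD, 0xFE and 0xFF as errors."""
    decoder = codecs.getincrementaldecoder('cp932')()
    pieces = []
    # Feeding one byte at a time makes the offset of an offending byte exact.
    for pos in range(len(data)):
        piece = decoder.decode(data[pos:pos + 1])
        if not _CP932_EXTRAS.isdisjoint(piece):
            raise UnicodeDecodeError('cp932', data, pos, pos + 1,
                                     'byte is undefined in Windows code page 932')
        pieces.append(piece)
    # final=True raises if the input ends inside a double-byte character.
    pieces.append(decoder.decode(b'', final=True))
    return ''.join(pieces)
```

The cp1255 difference cannot be fixed this way, because Python's codec rejects 0xCA before any postprocessing could map it to U+05BA.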
History
Date                 User              Action  Args
2016-11-17 08:07:49  serhiy.storchaka  set     recipients: + serhiy.storchaka, paul.moore, vstinner, larry, tim.golden, benjamin.peterson, ned.deily, ezio.melotti, zach.ware, eryksun, steve.dower, Artoria2e5
2016-11-17 08:07:49  serhiy.storchaka  set     messageid: <1479370069.34.0.795859720719.issue28712@psf.upfronthosting.co.za>
2016-11-17 08:07:49  serhiy.storchaka  link    issue28712 messages
2016-11-17 08:07:48  serhiy.storchaka  create