
Author eryksun
Recipients Artoria2e5, benjamin.peterson, eryksun, ezio.melotti, larry, ned.deily, paul.moore, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Date 2016-11-17.02:25:02
Message-id <1479349503.02.0.826181136142.issue28712@psf.upfronthosting.co.za>
Content
On Windows, the ANSI and OEM codepages are conveniently available as the encodings 'mbcs' and 'oem' (the latter is new in 3.6). The 'replace' error handler uses the best-fit mapping (see the encode_code_page_flags function in Objects/unicodeobject.c). For other Windows codepages you can use codecs.code_page_encode, though it's less convenient. For example:

    >>> import codecs
    >>> codecs.code_page_encode(1252, 'α', 'replace')
    (b'a', 1)
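
To see the effect of the error handler on the best-fit mapping, it can help to run the same text through several handlers side by side. The following is only a sketch: encode_with_handlers is a hypothetical helper name, and it assumes that codecs.code_page_encode raises UnicodeEncodeError when a handler refuses a character.

```python
import codecs
import sys


def encode_with_handlers(code_page, text, handlers=('strict', 'replace')):
    """Hypothetical helper: encode `text` once per error handler.

    Returns a dict mapping each handler name to either the encoded bytes
    or the UnicodeEncodeError that the handler raised.
    """
    if sys.platform != 'win32':
        # codecs.code_page_encode only exists on Windows builds of CPython.
        raise OSError('codecs.code_page_encode is only available on Windows')
    results = {}
    for errors in handlers:
        try:
            # code_page_encode returns (bytes, number of characters consumed).
            data, _consumed = codecs.code_page_encode(code_page, text, errors)
            results[errors] = data
        except UnicodeEncodeError as exc:
            results[errors] = exc
    return results
```

On Windows, encode_with_handlers(1252, 'α')['replace'] should match the b'a' shown above. Since only 'replace' selects the best-fit mapping, the 'strict' entry would be expected to hold an exception instead.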

When decoding, MB_ERR_INVALID_CHARS has no effect on single-byte codepages because they map every byte. It only affects byte sequences that are invalid in multibyte codepages such as 932 and 65001. Without this flag, invalid sequences are silently decoded as the codepage's Unicode default character. This is usually "?", but for 932 it's the Katakana middle dot (U+30FB), and for UTF-8 it's U+FFFD. codecs.code_page_decode almost always passes MB_ERR_INVALID_CHARS; the exception is UTF-7 (see the decode_code_page_flags function). Its 'replace' error handling is therefore entirely Python's own implementation. For example:

MultiByteToWideChar without MB_ERR_INVALID_CHARS (decode here is a helper that calls it directly):

    >>> decode(932, b'\xe05', strict=False)
    '\u30fb'

versus code_page_decode:

    >>> codecs.code_page_decode(932, b'\xe05', 'replace', True)
    ('\ufffd5', 2)
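
The decode helper used in the MultiByteToWideChar example is not defined in this message. Below is a minimal sketch of what such a helper could look like, assuming it wraps the Win32 call through ctypes. The name, the signature and the strict parameter are taken from the call above, but the body is an assumption, not the original code.

```python
import ctypes
import sys

MB_ERR_INVALID_CHARS = 0x00000008  # Win32 flag: fail on invalid input


def decode(code_page, data, strict=True):
    """Hypothetical helper: decode bytes by calling MultiByteToWideChar."""
    if sys.platform != 'win32':
        raise OSError('MultiByteToWideChar is only available on Windows')
    if not data:
        return ''  # the API rejects a zero-length input buffer
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    mb2wc = kernel32.MultiByteToWideChar
    mb2wc.argtypes = (ctypes.c_uint, ctypes.c_ulong, ctypes.c_char_p,
                      ctypes.c_int, ctypes.c_wchar_p, ctypes.c_int)
    mb2wc.restype = ctypes.c_int
    flags = MB_ERR_INVALID_CHARS if strict else 0
    # First call: ask for the required length in UTF-16 code units.
    size = mb2wc(code_page, flags, data, len(data), None, 0)
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    buf = ctypes.create_unicode_buffer(size)
    # Second call: perform the conversion into the buffer.
    size = mb2wc(code_page, flags, data, len(data), buf, size)
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf[:size]
```

With strict=False no flag is passed, which is the case shown in the first example. With strict=True the call is expected to fail on the same input, and that failure is raised as an OSError.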