
Author sjmachin
Recipients sjmachin
Date 2010-04-03.23:40:15
Message-id <1270338018.63.0.0227454811381.issue8308@psf.upfronthosting.co.za>
Content
According to the following references, the bytes 80, A0, FD, FE, and FF are not defined in cp932:

http://msdn.microsoft.com/en-au/goglobal/cc305152.aspx
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=ALL

However, CPython 3.1.2 does this:

 >>> print(ascii(b'\x80\xa0\xfd\xfe\xff'.decode('cp932')))
 '\x80\uf8f0\uf8f1\uf8f2\uf8f3'

(as do 2.5, 2.6, and 2.7 with the appropriate syntax)

This maps 80 to U+0080 (not very useful) and maps the other four bytes into the Private Use Area ("PUA"), at U+F8F0 through U+F8F3. Each of these bytes should be treated as undefined/unexpected/error/...
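
Until the codec rejects these bytes, a caller can guard against the behaviour described above by inspecting the decoded text. The following is only a minimal sketch: `SUSPECT_CHARS`, `find_suspect_chars` and `decode_cp932_checked` are hypothetical names, not part of the standard library, and the sketch assumes that no valid cp932 input legitimately decodes to U+0080 or to U+F8F0 through U+F8F3. It checks the decoded output rather than the input bytes because 80 and A0 are legal as the second byte of a double-byte character.

```python
# Characters that, per the report above, cp932 decoding produces for the
# undefined bytes: U+0080 for 80, and U+F8F0..U+F8F3 for A0, FD, FE, FF.
SUSPECT_CHARS = frozenset("\x80\uf8f0\uf8f1\uf8f2\uf8f3")


def find_suspect_chars(text):
    """Return the indexes in text that hold one of the suspect characters."""
    return [i for i, ch in enumerate(text) if ch in SUSPECT_CHARS]


def decode_cp932_checked(data):
    """Decode data as cp932; raise ValueError if a suspect character appears.

    Hypothetical helper. Assumes valid cp932 input never decodes to any
    character in SUSPECT_CHARS.
    """
    text = data.decode("cp932")
    positions = find_suspect_chars(text)
    if positions:
        raise ValueError(
            "cp932 decoding produced suspect characters at %r" % (positions,)
        )
    return text
```

Because `UnicodeDecodeError` is a subclass of `ValueError`, code that catches `ValueError` around `decode_cp932_checked` behaves the same way whether the codec itself rejects the bytes or the check above does.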
History
Date                 User      Action  Args
2010-04-03 23:40:18  sjmachin  set     recipients: + sjmachin
2010-04-03 23:40:18  sjmachin  set     messageid: <1270338018.63.0.0227454811381.issue8308@psf.upfronthosting.co.za>
2010-04-03 23:40:16  sjmachin  link    issue8308 messages
2010-04-03 23:40:15  sjmachin  create