Message 175408 - Python tracker


Author	vstinner
Recipients	ezio.melotti, vstinner
Date	2012-11-11.23:34:24
Message-id	<1352676864.76.0.63024120757.issue16455@psf.upfronthosting.co.za>

Content
Attached patch works around the CODESET issue on OpenIndiana and FreeBSD. If the LC_CTYPE locale is "C" and nl_langinfo(CODESET) returns ASCII (or an alias of this encoding), b"\xE9" is decoded from the locale encoding: if the result is U+00E9, the patched Python uses ISO-8859-1. (If decoding fails, the locale encoding really is ASCII and the workaround is not used.)

If the result is different (b'\xe9' is decoded from the locale encoding to something other than U+00E9), a ValueError is raised. I wrote this test to detect bugs. I hope that our buildbots will validate the code. We may choose a different behaviour (e.g. keep ASCII).
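The decision described above can be sketched in pure Python. This is only an illustration of the logic, not the code from the attached patch: the helper name pick_locale_encoding and its decode_byte parameter (a callable standing in for the C library's locale decoder) are assumptions made for this example.

```python
import codecs


def pick_locale_encoding(lc_ctype, codeset, decode_byte):
    """Hypothetical helper illustrating the workaround's decision logic.

    lc_ctype    -- name of the LC_CTYPE locale, e.g. "C"
    codeset     -- value reported by nl_langinfo(CODESET)
    decode_byte -- callable decoding bytes with the real locale encoding
                   (assumed stand-in for the C library decoder)
    """
    # The workaround only applies to the "C" locale.
    if lc_ctype != "C":
        return codeset

    # Normalize aliases such as "US-ASCII" or "646" to "ascii".
    try:
        is_ascii = codecs.lookup(codeset).name == "ascii"
    except LookupError:
        is_ascii = False
    if not is_ascii:
        return codeset

    # Probe the locale decoder with a single non-ASCII byte.
    try:
        char = decode_byte(b"\xe9")
    except UnicodeDecodeError:
        # The locale encoding really is ASCII: workaround not used.
        return codeset

    if char == "\xe9":
        # The byte was decoded as U+00E9: the locale behaves like ISO-8859-1.
        return "iso8859-1"

    # Unexpected result: fail loudly so the problem is noticed.
    raise ValueError("unexpected decoding of b'\\xe9': %a" % char)


# A decoder that behaves like ISO-8859-1 triggers the workaround.
print(pick_locale_encoding("C", "US-ASCII", lambda b: b.decode("latin-1")))
# A strict ASCII decoder leaves the reported codeset unchanged.
print(pick_locale_encoding("C", "US-ASCII", lambda b: b.decode("ascii")))
```

Passing the decoder as a parameter keeps the sketch testable without changing the process locale. How the real patch probes the locale is not shown here.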

Example on FreeBSD 8.2, original Python 3.4:

$ ./python
>>> import sys, locale
>>> sys.getfilesystemencoding()
'ascii'
>>> locale.getpreferredencoding()
'US-ASCII'

Example on FreeBSD 8.2, patched Python 3.4:

$ ./python 
>>> import sys, locale
>>> sys.getfilesystemencoding()
'iso8859-1'
>>> locale.getpreferredencoding()
'iso8859-1'

History
Date	User	Action	Args
2012-11-11 23:34:24	vstinner	set	recipients: + vstinner, ezio.melotti
2012-11-11 23:34:24	vstinner	set	messageid: <1352676864.76.0.63024120757.issue16455@psf.upfronthosting.co.za>
2012-11-11 23:34:24	vstinner	link	issue16455 messages
2012-11-11 23:34:24	vstinner	create