
Author vstinner
Recipients eryksun, ezio.melotti, ionelmc, vstinner
Date 2014-06-19.13:15:07
Message-id <1403183708.05.0.572167060507.issue21808@psf.upfronthosting.co.za>
In-reply-to
Content
Support for code page 65001 (CP_UTF8, "cp65001") was added in Python 3.3. In practice it is usually used as the OEM (console) code page. The chcp command changes the Windows console code page, which is used for sys.{stdin,stdout,stderr}.encoding, whereas locale.getpreferredencoding() returns the ANSI code page.
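To see which code page ends up where, a minimal sketch can print the stdio encodings next to the locale's preferred encoding. The helper name `report_encodings` is hypothetical. The comment about chcp describes the Python 3.3 era behaviour discussed here; newer Python versions may report different stdio values for an interactive console.

```python
import locale
import sys


def report_encodings():
    """Hypothetical helper: collect stdio encodings and the preferred encoding.

    In the Python 3.3 era on Windows, the stdio values follow the console
    code page (changed with chcp), while locale.getpreferredencoding()
    reflects the ANSI code page.
    """
    encodings = {}
    for name in ("stdin", "stdout", "stderr"):
        stream = getattr(sys, name)
        # Streams can be None (e.g. under pythonw) or replaced by objects
        # without an encoding attribute, so fall back to None.
        encodings[name] = getattr(stream, "encoding", None)
    encodings["preferred"] = locale.getpreferredencoding(False)
    return encodings


if __name__ == "__main__":
    for key, value in report_encodings().items():
        print(f"{key:>9}: {value}")
```

Running it once before and once after `chcp 65001` in the same console is a quick way to check which values actually change on a given Python version.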

Read also:
http://unicodebook.readthedocs.org/operating_systems.html#code-pages
http://unicodebook.readthedocs.org/programming_languages.html#windows

> cp65001 is purported to be an alias for utf8.

No, cp65001 is not an alias for utf8: it handles surrogate characters differently. The behaviour of CP_UTF8 depends on the flags passed to the Windows conversion functions and on the Windows version.
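For comparison, this sketch shows only the Python side: how Python 3's own UTF-8 codec treats a lone surrogate under different error handlers. The cp65001 side is not reproduced because, as noted above, it depends on Windows flags and the Windows version. The function name `encode_lone_surrogate` is hypothetical.

```python
def encode_lone_surrogate(errors):
    """Hypothetical helper: encode a string containing a lone surrogate
    with Python's UTF-8 codec, returning the bytes or the exception."""
    text = "a\udc80b"  # U+DC80 is an unpaired (lone) low surrogate
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError as exc:
        return exc


# "strict" refuses lone surrogates outright.
print(type(encode_lone_surrogate("strict")).__name__)  # UnicodeEncodeError
# "surrogatepass" writes the surrogate as a 3-byte sequence.
print(encode_lone_surrogate("surrogatepass"))  # b'a\xed\xb2\x80b'
# "surrogateescape" maps U+DC80 back to the raw byte 0x80.
print(encode_lone_surrogate("surrogateescape"))  # b'a\x80b'
```

A codec that silently accepted or replaced the surrogate in "strict" mode would therefore not be interchangeable with utf8, even if it produced the same bytes for ordinary text.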

If you really want to use the UTF-8 codec, force the stdio encoding with the PYTHONIOENCODING environment variable:
https://docs.python.org/dev/using/cmdline.html#envvar-PYTHONIOENCODING
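A minimal sketch of the PYTHONIOENCODING approach, assuming Python 3.7+ (for `capture_output`): it starts a child interpreter with the variable set and asks it which encoding `sys.stdout` uses. The helper name `child_stdout_encoding` is hypothetical; in a real shell you would simply set the variable before starting Python.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Hypothetical helper: run a child interpreter with PYTHONIOENCODING
    set and return the encoding it reports for sys.stdout."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    reported = child_stdout_encoding("utf-8")
    # Normalise through the codec registry so "utf-8" and "UTF-8" compare equal.
    print(codecs.lookup(reported).name)  # utf-8
```

The variable only affects the stdio streams; it does not change the console code page set with chcp.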

Setting the Windows console encoding to cp65001 with the chcp command doesn't make the Windows console fully Unicode compliant. It is a little better with TrueType fonts, but that is not enough. See the old issue #1602, opened 7 years ago and still not fixed.

Backporting the cp65001 codec would require too many changes in the codec code. I made these changes between Python 3.1 and 3.3, and I don't want to redo them in Python 2.7 because they may break backward compatibility. For example, in Python 3.3 the "strict" mode really means strict, whereas in Python 2.7 the code page codecs use the default flags, which are not strict. See:
http://unicodebook.readthedocs.org/operating_systems.html#encode-and-decode-functions
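To illustrate what "strict really means strict" looks like in Python 3, here is a small sketch: decoding invalid input with the "strict" handler raises, while "replace" substitutes U+FFFD. The portable part uses the UTF-8 codec; the Windows-only part, guarded by `sys.platform`, assumes the Python 3.3+ function `codecs.code_page_decode(code_page, data, errors, final)` and is not exercised on other platforms. The names `INVALID` and `decode` are hypothetical.

```python
import codecs
import sys

INVALID = b"caf\xe9"  # Latin-1 "café"; the trailing 0xE9 byte is not valid UTF-8


def decode(errors):
    """Hypothetical helper: decode INVALID as UTF-8 with the given handler."""
    try:
        return INVALID.decode("utf-8", errors)
    except UnicodeDecodeError as exc:
        return exc


print(type(decode("strict")).__name__)  # UnicodeDecodeError
print(ascii(decode("replace")))         # 'caf\ufffd'

# Windows only (Python 3.3+): the code page codecs apply the same strict
# semantics through codecs.code_page_decode().
if sys.platform == "win32":
    try:
        codecs.code_page_decode(65001, INVALID, "strict", True)
    except UnicodeDecodeError as exc:
        print("cp65001 strict:", exc.reason)
```

A codec running with non-strict default flags would instead return some replacement text without raising, which is exactly the kind of silent behaviour change a backport could introduce.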

So I'm in favor of closing the issue as "wont fix". The fix is to upgrade to Python 3!
History
Date                 User      Action  Args
2014-06-19 13:15:08  vstinner  set     recipients: + vstinner, ezio.melotti, ionelmc, eryksun
2014-06-19 13:15:08  vstinner  set     messageid: <1403183708.05.0.572167060507.issue21808@psf.upfronthosting.co.za>
2014-06-19 13:15:08  vstinner  link    issue21808 messages
2014-06-19 13:15:07  vstinner  create