
Author vstinner
Recipients Jan Niklas Hasse, Sworddragon, abarry, akira, barry, ezio.melotti, lemburg, methane, ncoghlan, r.david.murray, vstinner, yan12125
Date 2017-01-04.16:06:08
> The default encoding in the C/POSIX locale is ASCII (which is the entire source of the problem).

The reality is more complex than that :-) It depends on the OS.

Some OSes use Latin1 for the POSIX locale. Some OSes announce ASCII
for the POSIX locale, but use Latin1 in practice :-) On these
lying OSes, Python decodes bytes 0x80..0xff using mbstowcs() to check
whether we get ASCII or Latin1: see the check_force_ascii() function.

/* Workaround FreeBSD and OpenIndiana locale encoding issue with the C locale.
   On these operating systems, nl_langinfo(CODESET) announces an alias of the
   ASCII encoding, whereas mbstowcs() and wcstombs() functions use the
   ISO-8859-1 encoding. The problem is that os.fsencode() and os.fsdecode() use
   locale.getpreferredencoding() codec. For example, if command line arguments
   are decoded by mbstowcs() and encoded back by os.fsencode(), we get a
   UnicodeEncodeError instead of retrieving the original byte string.

   The workaround is enabled if setlocale(LC_CTYPE, NULL) returns "C",
   nl_langinfo(CODESET) announces "ascii" (or an alias to ASCII), and at least
   one byte in range 0x80-0xff can be decoded from the locale encoding. The
   workaround is also enabled on error, for example if getting the locale
   failed.

    (...) */
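
To make the failure described in the comment concrete, here is a minimal sketch in Python. It does not call mbstowcs() or nl_langinfo(); it only simulates the mismatch with two codecs. The names ANNOUNCED, ACTUAL, roundtrip() and roundtrip_fails() are hypothetical, and the assumption is that the C library really decodes as ISO-8859-1 while announcing ASCII.

```python
# Simulation only: the two codecs stand in for the C library behaviour.
ANNOUNCED = "ascii"    # assumed: what nl_langinfo(CODESET) claims
ACTUAL = "latin-1"     # assumed: what mbstowcs()/wcstombs() really use


def roundtrip(raw, decode_with, encode_with, errors="strict"):
    """Decode bytes with one codec and encode the result with another."""
    text = raw.decode(decode_with, errors)
    return text.encode(encode_with, errors)


def roundtrip_fails(raw):
    """Return True if decoding with ACTUAL and encoding with ANNOUNCED fails."""
    try:
        roundtrip(raw, ACTUAL, ANNOUNCED)
    except UnicodeEncodeError:
        return True
    return False


arg = b"caf\xe9"  # a command line argument containing a byte in 0x80-0xff

# Decoded as Latin1, encoded back as ASCII: the original bytes are lost.
print(roundtrip_fails(arg))

# Using ASCII with surrogateescape on both sides keeps the bytes intact.
print(roundtrip(arg, "ascii", "ascii", "surrogateescape") == arg)
```

The second call shows one way to get a lossless round trip: treat the encoding as ASCII consistently and let the surrogateescape error handler carry the non-ASCII bytes. Whether this matches exactly what the workaround does is an assumption here; the comment above only says when the workaround is enabled.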
Date                 User      Action  Args
2017-01-04 16:06:09  vstinner  set     recipients: + vstinner, lemburg, barry, ncoghlan, ezio.melotti, r.david.murray, methane, akira, Sworddragon, yan12125, abarry, Jan Niklas Hasse
2017-01-04 16:06:09  vstinner  link    issue28180 messages
2017-01-04 16:06:08  vstinner  create