Author ncoghlan
Recipients Sworddragon, a.badger, bkabrda, larry, lemburg, loewis, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, terry.reedy, vstinner
Date 2013-12-09.02:56:41
Message-id <CADiSq7dyf0F9wkM9LgNkACWfP23KVaa6zz2o0EXgBH9VW2rVmA@mail.gmail.com>
In-reply-to <CAMpsgwbwqwqsFNLKk4ym4pzc7+HWwXjv_NpsFKtc-mFgCXq8JQ@mail.gmail.com>
Content
On 9 December 2013 12:08, STINNER Victor <report@bugs.python.org> wrote:
>
> STINNER Victor added the comment:
>
>> End users tripping over this by setting LANG=C is one of the pain points of Python 3 relative to Python 2 for Fedora, so I've added a couple of Fedora folks to the nosy list.
>
> Sorry, I'm not aware of such issue. Do you have examples?

Armin's travails with remote shell access and Python 3 are just as
likely today as they were a couple of years ago:
http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/

(although technically that was a terminal ending up with the POSIX
locale, rather than specifically LANG=C)

>> - the main problem is on Linux (but potentially other *nix systems as well), where people set "LANG=C" for a variety of reasons, but this has the side effect of Python 3 choosing an inappropriate encoding (ASCII rather than UTF-8) when talking to the OS APIs.
>
> Why do you think that the issue is specific to Python 3? Try to open a
> terminal with LC_ALL=C and try to type non-ASCII characters with your
> keyboard. You can't because your terminal uses ASCII. Did you
> try applications written in another language handling Unicode, like Perl?
> (Perl with Unicode support correctly enabled, it's "use utf8;" if I
> remember correctly).

The problem is that this used to work transparently in Python 2 (since
all these interfaces were just bytes-based on the Python side as well).
That makes the new sensitivity to the locale
encoding a usability regression, and that's a concern for distros that
are considering switching their default Python version.
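
A minimal sketch of the difference described above, assuming a UTF-8
encoded filename and an ASCII locale encoding with strict error handling.
The names `raw_name` and `text_name` are illustrative and do not come from
the original report:

```python
# Illustrative only: raw_name stands in for a filename as the OS hands it over.
text_name = "caf\u00e9.txt"
raw_name = text_name.encode("utf-8")      # b'caf\xc3\xa9.txt'

# Python 2 style: the bytes pass through untouched, so the locale is never consulted.
passed_through = raw_name

# Text-based style under an ASCII locale encoding: decoding the bytes fails...
try:
    raw_name.decode("ascii")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# ...and so does encoding the text to hand it back to the OS.
try:
    text_name.encode("ascii")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
```

Both failures depend only on the chosen encoding being ASCII, which is why
a user who sets LANG=C can hit them without changing any Python code.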

> Can you explain the "various reasons" why users explicitly force the
> encoding to ASCII?

- testing applications for POSIX compliance
- default settings on servers where you don't control the environment
- because they never previously had to care, and it's only Python 3
deciding to pay attention to it which makes it relevant for them

> I use LANG=C to get manual pages and error messages in english. But
> "LANG=en_US man ls" would be more correct, or "LC_MESSAGES=en_US man
> ls" to be pedantic. (Env var priority: LC_ALL > LC_xxx > LANG).
>
> IMO if you use LANG=C, you must not complain that Unicode stopped
> working, but you should learn how to configure locales. Trivial
> examples like the one which can be found in the initial message
> (msg204849) are wrong: why would you force all locales to C and use
> non-ASCII characters?

And yet, in Python 2, people could do that, and Python didn't care.
*That's* the regression I'm worried about. If it hadn't round-tripped
cleanly in Python 2, I wouldn't care here either.
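
For comparison, Python 3 has a byte-preserving path of its own: the
`surrogateescape` error handler from PEP 383, which `os.fsdecode` and
`os.fsencode` use for filenames. A minimal sketch, assuming an ASCII
locale encoding and a UTF-8 encoded name (variable names are
illustrative):

```python
raw = b"caf\xc3\xa9.txt"

# Undecodable bytes are smuggled through as lone surrogates (U+DCC3, U+DCA9).
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("ascii", errors="surrogateescape")

# The intermediate string is not real text, though: a strict encode,
# such as the one needed to display it, is rejected.
try:
    text.encode("utf-8")
    displayable = True
except UnicodeEncodeError:
    displayable = False
```

So the bytes survive the round trip, but anything that treats the decoded
value as ordinary text can still fail under LANG=C.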

>> Given the initialisation problems, this may be something that PEP 432 (the initialisation process rewrite) can help with (since it changes the initialisation order to create a more complete Python runtime before it starts to configure the OS interfaces).
>
> I don't see how it would help to solve my point (b).

Having a Python runtime available makes things that are currently
tediously painful to deal with during startup easier to tweak. I'm not
sure it *will* help in this particular case, but it's now one I'm
going to keep an eye on.

> Technically, this issue cannot be fixed. Or to be more specific, I
> don't want to fix it, it's a waste of time. So I don't understand what
> do you expect from this open issue?

A way to get Python 3 to cope as well with a misconfigured OS
environment as Python 2 did.

> I would prefer to close it as invalid or wontfix to be clear.

It's a usability regression from Python 2, so I don't want to give up
on it. It may be that we just implement an "ignore what the OS claims,
it's misconfigured, just use UTF-8 for everything" flag. But OS
configuration errors shouldn't cripple the Python runtime.
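
A rough sketch of what such a flag could look like, purely as an
assumption for illustration: `choose_os_encoding` and its `force_utf8`
parameter are hypothetical names, not an existing CPython API or option.

```python
import codecs


def choose_os_encoding(locale_encoding, force_utf8=False):
    """Hypothetical helper: pick the encoding used for OS interfaces.

    With force_utf8 set, the locale's claim is ignored entirely.
    Otherwise the locale encoding is used, normalised to Python's
    canonical codec name.
    """
    if force_utf8:
        return "utf-8"
    return codecs.lookup(locale_encoding).name


# "ANSI_X3.4-1968" is the name glibc reports for ASCII under the C locale.
default_choice = choose_os_encoding("ANSI_X3.4-1968")
forced_choice = choose_os_encoding("ANSI_X3.4-1968", force_utf8=True)
```

Keeping the override explicit leaves correctly configured systems
untouched while giving users on a misconfigured one a single switch.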
History
Date                 User      Action  Args
2013-12-09 02:56:42  ncoghlan  set     recipients: + ncoghlan, lemburg, loewis, terry.reedy, pitrou, vstinner, larry, a.badger, r.david.murray, Sworddragon, serhiy.storchaka, bkabrda
2013-12-09 02:56:42  ncoghlan  link    issue19846 messages
2013-12-09 02:56:41  ncoghlan  create