Author poolie
Recipients benjamin.peterson, gz, pitrou, poolie, r.david.murray, vstinner
Date 2011-12-21.01:36:00
On 21 December 2011 12:16, Antoine Pitrou <> wrote:
> Antoine Pitrou <> added the comment:
> So, you're complaining about something which works, kind of:
> $ touch héhé
> $ LANG=C python3 -c "import os; print(os.listdir())"
> ['h\udcc3\udca9h\udcc3\udca9']
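The quoted listing is consistent with the UTF-8 bytes of "héhé" being decoded as ASCII using the 'surrogateescape' error handler (PEP 383), which is what a Python 3 of this era did under LANG=C. A small sketch of that mapping, reproduced without touching the filesystem:

```python
# Bytes that `touch héhé` stores on a UTF-8 system.
raw = "héhé".encode("utf-8")
assert raw == b"h\xc3\xa9h\xc3\xa9"

# Decoding as ASCII with 'surrogateescape' maps each undecodable byte 0xNN
# to the lone surrogate U+DCNN instead of raising UnicodeDecodeError.
mangled = raw.decode("ascii", "surrogateescape")
assert mangled == "h\udcc3\udca9h\udcc3\udca9"  # matches the listing above

# The mapping is lossless: the same codec and handler give back the exact bytes,
# which can then be decoded with the encoding the name really uses.
restored = mangled.encode("ascii", "surrogateescape")
assert restored == raw
fixed = restored.decode("utf-8")
assert fixed == "héhé"
```

So the name is not lost, but every program has to know to undo the escaping itself.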

It's possible to work around this in some cases, such as listdir, by
coping with the result including some byte strings, and then manually
decoding them.  But there are, iirc, other cases where the call just
fails and there is no easy workaround.
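A minimal sketch of the listdir workaround described above, assuming (as this message argues) that names on disk are UTF-8. Both helpers, `listdir_utf8` and `redecode_utf8`, are hypothetical names for illustration. The first asks `os.listdir` for bytes and decodes them itself; the second repairs a `str` name that was already decoded with the wrong locale encoding, relying on `os.fsencode` reversing that decoding on POSIX.

```python
import os


def listdir_utf8(path="."):
    """List `path`, decoding every entry as UTF-8 regardless of the locale.

    Passing bytes to os.listdir() makes it return raw bytes names, so no
    locale-dependent decoding happens. Names that are not valid UTF-8 are
    decoded with 'surrogateescape', so they still round-trip to the same bytes.
    """
    names = os.listdir(os.fsencode(path))
    return [name.decode("utf-8", "surrogateescape") for name in names]


def redecode_utf8(name):
    """Repair a str name that os.listdir(str) decoded with a wrong locale encoding."""
    # On POSIX, os.fsencode() applies the filesystem encoding with the
    # 'surrogateescape' handler, undoing what os.listdir() did to the name.
    raw = os.fsencode(name)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 either: leave the surrogate-escaped form unchanged.
        return name
```

For example, `redecode_utf8('h\udcc3\udca9h\udcc3\udca9')` returns `'héhé'`. This only covers calls that hand back a name; it does nothing for the calls mentioned above that fail outright.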

It wasn't impossible to get Unicode right in Python 2, but Python 3
still thinks it's worth changing things to make it work better.

>> This makes robustly working with non-ascii filenames on different
>> platforms needlessly annoying, given no modern *nix should have problems
>> just using UTF-8 in these cases.
> So why don't these supposedly "modern" systems at least set the appropriate environment variables for Python to infer the proper character encoding?
> (since these "modern" systems don't have a well-defined encoding...)

The standard encoding is UTF-8.  Python shouldn't need to have a
variable set to tell it this.  Python is making an assumption about
the default but it is a bad assumption.
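One way to see which default Python infers from the locale is to start a child interpreter with a given LANG and ask it. This is a sketch; `encodings_under` is a hypothetical helper, and the results depend on the Python version: since Python 3.7 (PEP 538 C locale coercion and PEP 540 UTF-8 Mode), a C or POSIX locale makes Python default to UTF-8, whereas the interpreters discussed in this thread fell back to ASCII under LANG=C.

```python
import os
import subprocess
import sys

# Code run in the child: report the filesystem and preferred locale encodings.
PROBE = (
    "import sys, locale; "
    "print(sys.getfilesystemencoding(), locale.getpreferredencoding(False))"
)


def encodings_under(lang):
    """Run a child interpreter with LANG=lang and return what it inferred."""
    env = dict(os.environ)
    # LC_ALL and LC_CTYPE take precedence over LANG, and PYTHONUTF8 forces
    # a mode explicitly, so drop them to let LANG alone decide.
    for var in ("LC_ALL", "LC_CTYPE", "PYTHONUTF8"):
        env.pop(var, None)
    env["LANG"] = lang
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for lang in ("C", "C.UTF-8"):
        print(lang, "->", encodings_under(lang))
```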

> The culprit is not Python, it's the Unix crap....

Programs need to work with the environments that are available to
them, even though those environments often have flaws.  Windows and
Mac have annoying bugs too, even bugs specifically about Unicode.
Date                 User    Action  Args
2011-12-21 01:36:02  poolie  set     recipients: + poolie, pitrou, vstinner, benjamin.peterson, r.david.murray, gz
2011-12-21 01:36:01  poolie  link    issue13643 messages
2011-12-21 01:36:00  poolie  create