
Author vstinner
Recipients amaury.forgeotdarc, eric.araujo, ezio.melotti, jkloth, larry, loewis, r.david.murray, serhiy.storchaka, techtonik, vstinner
Date 2012-12-14.21:21:47
Message-id <1355520107.98.0.638649160291.issue16656@psf.upfronthosting.co.za>
Content
On Windows with Python 2, characters that cannot be encoded to the ANSI code page are replaced with "?". This is the default behaviour of WideCharToMultiByte(), so all ANSI functions behave this way. Python doesn't try to behave differently; it just exposes system functions as Python functions.
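Python's own codecs can approximate this conversion. The following is a minimal sketch, assuming cp1252 (a common Western European ANSI code page) stands in for the ANSI code page; the helper to_ansi() is hypothetical. WideCharToMultiByte() may also apply "best fit" mappings (substituting a similar-looking character) before falling back to "?"; the sketch only models the "?" fallback.

```python
def to_ansi(name: str, codepage: str = "cp1252") -> bytes:
    """Hypothetical helper: approximate how an ANSI API converts a Unicode name."""
    # errors="replace" substitutes b"?" for every character the code page cannot represent
    return name.encode(codepage, errors="replace")

print(to_ansi("café.txt"))  # b'caf\xe9.txt'  (é exists in cp1252)
print(to_ansi("日本.txt"))   # b'??.txt'       (neither character exists in cp1252)
```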

So, for example, os.listdir(bytes) returns filenames containing "?" if some characters are not encodable to the ANSI code page. It's a choice in the design of Windows.
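The consequence is that information is lost: distinct names can collapse to the same bytes, and since "?" is not allowed in Windows filenames, a name such as b'??.txt' generally cannot be used to reopen the file. A minimal sketch, assuming cp1252 as the ANSI code page; simulated_bytes_listdir() is a hypothetical stand-in that takes a list of names instead of reading a real directory.

```python
def simulated_bytes_listdir(unicode_names, codepage="cp1252"):
    # Hypothetical stand-in for os.listdir(b".") on Windows with Python 2:
    # each real (Unicode) name is squeezed into the ANSI code page with "?" substitution
    return [name.encode(codepage, errors="replace") for name in unicode_names]

names = ["report.txt", "日本.txt", "中国.txt"]
listing = simulated_bytes_listdir(names)
print(listing)  # [b'report.txt', b'??.txt', b'??.txt']

# Two different files now share one bytes name
print(len(set(listing)), "distinct names for", len(names), "files")  # 2 distinct names for 3 files
```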

> This critical bug is one of the reasons that non-English speaking
> communities doesn't adopt Python as broadly as it happens in
> English world compared to other technologies (PHP etc.).

I don't understand this point.

PHP doesn't have a Unicode type, so I'm quite sure that PHP has exactly the same issue. And this issue is only solved in Python 3... except if you explicitly use a bytes filename (for os.listdir/os.walk), but the bytes filename API has been deprecated on Windows in Python 3.3.
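In Python 3 the type of the argument selects the API: a str path gives str names, a bytes path gives bytes names. A minimal sketch that runs in a temporary directory; the file name café.txt is an arbitrary choice, and os.fsdecode() maps the bytes names back to str using the filesystem encoding.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Create one file whose name contains a non-ASCII character
    open(os.path.join(d, "café.txt"), "w").close()
    text_names = os.listdir(d)               # str in, str out
    byte_names = os.listdir(os.fsencode(d))  # bytes in, bytes out

print(text_names)
print(byte_names)
# Decoding the bytes names with the filesystem encoding gives back the str names
print([os.fsdecode(n) for n in byte_names] == text_names)
```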

In Python 2, you can use Unicode filenames to work around this issue. But it doesn't work as well as in Python 3: on UNIX, you will get a similar issue with undecodable filenames (the opposite of unencodable filenames).
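An undecodable filename is a sequence of bytes that is not valid in the filesystem encoding. Python 3 handles it with the surrogateescape error handler (PEP 383), which os.fsdecode() uses on UNIX. The sketch below calls the codecs directly, assuming UTF-8 as the filesystem encoding, so it does not depend on the machine's locale.

```python
# A name written by a Latin-1 application on a system whose filesystem encoding is UTF-8
raw = b"caf\xe9.txt"

# Strict decoding fails: this is an "undecodable" filename
strict_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc
print("strict decode failed:", strict_error.reason)

# surrogateescape keeps the bad byte as a lone surrogate, so nothing is lost
name = raw.decode("utf-8", errors="surrogateescape")
print(repr(name))  # 'caf\udce9.txt'

# Encoding with the same handler restores the original bytes exactly
assert name.encode("utf-8", errors="surrogateescape") == raw
```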

Read my book for more information: https://github.com/haypo/unicode_book/wiki

--

About listdir_unicode-2.7.patch: Python chose to behave like Windows with unencodable characters. If you want to change this behaviour, you must change *all* calls to the Windows ANSI API (which is not trivial). Anyway, as I wrote, the bytes API for filenames is deprecated in Python 3.3. I prefer not to change anything in Python 2, because it may break existing applications. For example, os.listdir(bytes) doesn't fail in Python 2.7 with unencodable names, whereas it fails with your patch.
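The compatibility concern can be illustrated by contrasting a lenient conversion with a strict one. This does not reproduce the patch itself: both functions are hypothetical models, cp1252 is assumed as the ANSI code page, and the strict failure is assumed to surface as a UnicodeEncodeError. Code that currently tolerates "?" names would start raising under the strict variant.

```python
def listdir_replace(names, codepage="cp1252"):
    # Hypothetical model of the current Python 2.7 behaviour: unencodable characters become "?"
    return [name.encode(codepage, errors="replace") for name in names]

def listdir_strict(names, codepage="cp1252"):
    # Hypothetical model of a stricter variant: any unencodable name raises
    return [name.encode(codepage) for name in names]  # errors="strict" is the default

names = ["readme.txt", "日本.txt"]
print(listdir_replace(names))  # [b'readme.txt', b'??.txt']

strict_error = None
try:
    listdir_strict(names)
except UnicodeEncodeError as exc:
    strict_error = exc
print("strict variant raised:", type(strict_error).__name__)  # strict variant raised: UnicodeEncodeError
```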

Nothing interesting in this issue, so I'm closing it. If you consider the redirection issue important, please open a new issue.
History
Date                 User      Action  Args
2012-12-14 21:21:48  vstinner  set     recipients: + vstinner, loewis, amaury.forgeotdarc, larry, techtonik, jkloth, ezio.melotti, eric.araujo, r.david.murray, serhiy.storchaka
2012-12-14 21:21:47  vstinner  set     messageid: <1355520107.98.0.638649160291.issue16656@psf.upfronthosting.co.za>
2012-12-14 21:21:47  vstinner  link    issue16656 messages
2012-12-14 21:21:47  vstinner  create