
Author vstinner
Recipients Arfrever, ezio.melotti, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, vstinner
Date 2014-08-25.01:16:13
Message-id <1408929374.02.0.294225577905.issue18814@psf.upfronthosting.co.za>
Content
Your clean() function loses information. If a filename consists almost entirely of undecodable bytes, it will look like ����.txt, which is not very useful. I would prefer to escape each undecodable byte. Mac OS X (HFS+ filesystem), for example, uses the %HH format: "\udc80" would be replaced with "%80".
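A minimal sketch of that kind of escaping, assuming the filename was decoded with the surrogateescape error handler (so each undecodable byte 0x80..0xFF appears as a lone surrogate U+DC80..U+DCFF). The name escape_undecodable() is hypothetical and is not part of any patch on this issue:

```python
def escape_undecodable(name: str) -> str:
    """Replace each surrogate-escaped byte with %HH (uppercase hex).

    Hypothetical helper for display purposes only. A literal "%" in the
    name is left as-is, so the result cannot be reversed unambiguously.
    """
    parts = []
    for ch in name:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # surrogateescape maps byte 0xNN to U+DCNN, so subtracting
            # 0xDC00 recovers the original byte value.
            parts.append("%%%02X" % (code - 0xDC00))
        else:
            parts.append(ch)
    return "".join(parts)


# A latin-1 encoded name decoded as UTF-8 with surrogateescape.
name = b"a\xe9b.txt".decode("utf-8", "surrogateescape")
print(escape_undecodable(name))      # a%E9b.txt
print(escape_undecodable("\udc80"))  # %80
```

Unlike replacing each undecodable byte with U+FFFD, this keeps the byte values visible, so two different undecodable names do not collapse into the same displayed string.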

This format is also used in URLs. For example, "a\xe9b.txt" (latin-1, whereas my locale encoding is UTF-8) is displayed as "a�b.txt" in Firefox (when listing a local directory), but Firefox uses the URL "file://.../a%E9b.txt" (uppercase hexadecimal).
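For comparison, the standard library produces the same percent-encoded form when given the raw bytes. This is only an illustration of the URL convention, assuming the name was decoded as UTF-8 with surrogateescape; it says nothing about how Firefox builds its URLs internally:

```python
from urllib.parse import quote, unquote_to_bytes

raw = b"a\xe9b.txt"                            # latin-1 encoded filename
name = raw.decode("utf-8", "surrogateescape")  # str containing "\udce9"

# Re-encode with the same error handler to recover the original bytes,
# then percent-encode them; quote() accepts bytes and emits uppercase hex.
url_part = quote(name.encode("utf-8", "surrogateescape"))
print(url_part)  # a%E9b.txt

# The escaping round-trips back to the original bytes.
restored = unquote_to_bytes(url_part)
```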

In the GNOME file browser (Nautilus), "a\xe9b.txt" (latin-1, whereas my locale encoding is UTF-8) is displayed as "a�b.txt (invalid encoding)".
History
Date User Action Args
2014-08-25 01:16:14 vstinner set recipients: + vstinner, ncoghlan, pitrou, ezio.melotti, Arfrever, r.david.murray, serhiy.storchaka
2014-08-25 01:16:14 vstinner set messageid: <1408929374.02.0.294225577905.issue18814@psf.upfronthosting.co.za>
2014-08-25 01:16:14 vstinner link issue18814 messages
2014-08-25 01:16:13 vstinner create