Author vstinner
Recipients ocean-city, vstinner
Date 2010-09-10.10:27:52
Content
> WARNING: The filename '@test_464_tmp-共有される' CAN be encoded 
> by (...) cp932

We should find a character that is not encodable in any Windows code page but is still accepted in filenames.

> characters like "\u2661" or "\u2668" (...)

mbcs uses the "ANSI" code pages: cp1250..cp1258 and cp874 (and maybe others, since you wrote that your setup uses cp932):
http://en.wikipedia.org/wiki/Code_page#Windows_.28ANSI.29_code_pages

I wrote a short script to find an unencodable filename (attached to this issue). Output:

u'\u0301' is encodable to cp1258
u'\u0363' is not encodable to any code page
u'\u2661' is encodable to cp949
u'\u5171' is encodable to cp932, cp936, cp949, cp950

(The CODE_PAGES constant of the script might be incomplete.)
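
For readers without the attachment, the check can be sketched roughly as below. This is not the attached script: it is a minimal Python 3 approximation, and the CODE_PAGES tuple and the encodable_code_pages() helper are assumptions made for illustration (the list of code pages may well be incomplete).

```python
# Assumed list of Windows "ANSI" code pages; possibly incomplete.
CODE_PAGES = ("cp874", "cp932", "cp936", "cp949", "cp950") + tuple(
    "cp%d" % number for number in range(1250, 1259)
)


def encodable_code_pages(text, code_pages=CODE_PAGES):
    """Return the code pages (hypothetical helper) that can encode text."""
    found = []
    for code_page in code_pages:
        try:
            text.encode(code_page)
        except UnicodeEncodeError:
            continue
        found.append(code_page)
    return found


for candidate in ("\u0301", "\u0363", "\u2661", "\u5171"):
    pages = encodable_code_pages(candidate)
    if pages:
        print("%s is encodable to %s" % (ascii(candidate), ", ".join(pages)))
    else:
        print("%s is not encodable to any code page" % ascii(candidate))
```

The sketch uses strict encoding through Python's own codecs, which may differ in details from what the Windows API accepts with mbcs.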

u'\u2661' is not a good candidate. u'\u0363' looks better. But we can mix different characters to limit the probability that the whole string is encodable. Example:

u'\u2661\u5171' is encodable to cp949
u'\u0301\u0363\u2661\u5171' is not encodable to any code page
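
The idea of mixing characters can be sketched as follows: append characters one by one until no code page can encode the whole string. This is only an illustration in Python 3; the is_encodable() and shortest_unencodable_prefix() helpers and the CODE_PAGES tuple are hypothetical names, not part of the attached script or of test.support.

```python
# Assumed list of Windows "ANSI" code pages; possibly incomplete.
CODE_PAGES = ("cp874", "cp932", "cp936", "cp949", "cp950") + tuple(
    "cp%d" % number for number in range(1250, 1259)
)


def is_encodable(text, code_page):
    """Return True if text can be strictly encoded to code_page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


def shortest_unencodable_prefix(chars, code_pages=CODE_PAGES):
    """Grow a string from chars until no single code page encodes it.

    Returns the first such prefix, or None if every prefix remains
    encodable to at least one code page.
    """
    candidate = ""
    for char in chars:
        candidate += char
        if not any(is_encodable(candidate, cp) for cp in code_pages):
            return candidate
    return None


print(ascii(shortest_unencodable_prefix("\u2661\u5171\u0301\u0363")))
```

A string built from characters that belong to different code pages fails as soon as no single code page covers all of them, even when each character is encodable on its own.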

> TESTFN_UNICODE_UNDECODEABLE (2.x)

This is a typo fixed by r83987 in py3k.