msg174897 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-11-05 12:12 |
The attached patch adds support.NONASCII, a "portable" non-ASCII character that can be used to test non-ASCII strings. The patch also uses it in some existing functions.
I wrote the patch against the default branch, but we could start using it as far back as Python 3.2.
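To make the idea concrete, here is a hedged sketch of how a test might pick and then use such a character. The candidate list and the helper names pick_nonascii() and listdir_with_nonascii_file() are hypothetical and are not taken from the patch.

```python
import os
import tempfile

# Hypothetical candidate characters from a few scripts (an assumption of this
# sketch, not the list used by the patch).
CANDIDATES = ('\u00e6', '\u0141', '\u03c6', '\u041a', '\u20ac')


def pick_nonascii(candidates=CANDIDATES):
    """Return the first candidate that survives os.fsencode()/os.fsdecode(), or None."""
    for char in candidates:
        try:
            if os.fsdecode(os.fsencode(char)) == char:
                return char
        except UnicodeError:
            pass  # not encodable by the filesystem encoding
    return None


NONASCII = pick_nonascii()


def listdir_with_nonascii_file(char):
    """Create 'test<char>' in a temporary directory and return os.listdir() of it."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, 'test' + char), 'w'):
            pass
        return os.listdir(tmp)


if NONASCII is None:
    # A real unittest would call self.skipTest() here instead.
    print('skipped: no suitable non-ASCII character')
else:
    print(listdir_with_nonascii_file(NONASCII))
```

Returning None instead of raising lets each test decide whether to skip, which matches the "or None" contract quoted later in the thread.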
|
msg174900 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-11-05 12:26 |
I think you should ensure that os.fsdecode(os.fsencode(character)) == character.
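The round-trip property matters because an encoder can "succeed" without preserving the character. A small sketch using only the standard codecs: roundtrips() and fs_roundtrips() are hypothetical helper names, and errors='replace' stands in for any encoder that silently substitutes a placeholder.

```python
import os


def roundtrips(char, encoding, errors='strict'):
    """True if `char` survives an encode/decode cycle with the given codec."""
    try:
        return char.encode(encoding, errors).decode(encoding, errors) == char
    except UnicodeError:
        return False


def fs_roundtrips(char):
    """The same check against the real filesystem encoding."""
    try:
        return os.fsdecode(os.fsencode(char)) == char
    except UnicodeError:
        return False


# U+00E6 exists in Latin-1, so it is encodable and round-trips.
print(roundtrips('\u00e6', 'latin-1'))             # True
# U+20AC has no Latin-1 byte: strict encoding raises, so the check fails.
print(roundtrips('\u20ac', 'latin-1'))             # False
# With 'replace', encoding "works" (b'?'), but the round trip exposes the loss.
print(roundtrips('\u20ac', 'latin-1', 'replace'))  # False
print(fs_roundtrips('\u00e6'))
```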
|
msg174904 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-11-05 12:58 |
If NONASCII is None, I suggest the following fallback code:
import os

if NONASCII is None:
    for i in range(0x80, 0xFFFF):
        character = chr(i)
        if character.isprintable():
            try:
                if os.fsdecode(os.fsencode(character)) == character:
                    NONASCII = character
                    break
            except UnicodeError:
                pass
|
msg174922 - (view) |
Author: Chris Jerdonek (chris.jerdonek) * |
Date: 2012-11-05 17:23 |
+# NONASCII: non-ASCII character encodable by os.fsencode(),
+# or None if there is no such character.
+NONASCII = None
Can you use a name that reflects that this is a specific type of non-ASCII character with a special property (e.g. FS_NONASCII)? I think "NONASCII" should be reserved for a generic non-ASCII character. Moreover, we may want to add other types of non-ASCII characters in the future.
|
msg174946 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-11-05 23:10 |
> I think you should ensure that os.fsdecode(os.fsencode(character)) == character.
The chosen characters already have this property, but it doesn't hurt to add such a check.
> Can you use a name that reflects that this is a specific type
> of non-ASCII character having a special property (e.g. FS_NONASCII)?
Done.
|
msg174948 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-11-05 23:12 |
> If NONASCII is None, I suggest the following fallback code
I prefer not to "brute force" Unicode because it would slow down every test, even tests that don't use FS_NONASCII. Instead, I wrote the attached brute.py script to compute an exhaustive list of non-ASCII characters encodable by "any" locale encoding. My list of locale encodings is not complete, but it should be enough for our purpose, and it can be completed later.
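The following is not brute.py (which is attached to the issue and not reproduced here), only a rough sketch of the same kind of offline search: for a hypothetical handful of encodings, collect the printable non-ASCII characters each codec round-trips, then intersect the sets. The encoding list, the U+0080..U+2FFF range and the function names are assumptions.

```python
import codecs

# Hypothetical subset of locale encodings; brute.py's real list is longer.
ENCODINGS = ['utf-8', 'latin-1', 'iso8859-15', 'cp1252']


def encodable_chars(encoding, start=0x80, stop=0x3000, errors='surrogateescape'):
    """Printable characters in [start, stop) that round-trip through `encoding`."""
    codec = codecs.lookup(encoding)
    found = set()
    for code in range(start, stop):
        char = chr(code)
        if not char.isprintable():
            continue  # skips C1 controls, separators such as U+00A0, etc.
        try:
            data, _ = codec.encode(char, errors)
            text, _ = codec.decode(data, errors)
        except UnicodeError:
            continue
        if text == char:
            found.add(char)
    return found


def common_chars(encodings, **kwargs):
    """Characters that every encoding in the (non-empty) list can round-trip."""
    sets = [encodable_chars(enc, **kwargs) for enc in encodings]
    return set.intersection(*sets)


common = common_chars(ENCODINGS)
print(len(common), '\u00e6' in common)
```

Running the search once, outside the test suite, keeps its cost away from every test run. Adding 'ascii' to the list makes the intersection empty, which is why the constant still needs a None fallback.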
|
msg174949 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-11-05 23:17 |
I tested support_non_ascii-2.patch on Windows with the cp932 ANSI code page (the filesystem encoding), and on Linux with the ASCII, ISO-8859-1, ISO-8859-15 and UTF-8 locale encodings.
|
msg174959 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-11-06 09:48 |
I tested brute.py with all the encodings supported by Python:
No character for encoding cp1006:surrogateescape :-(
No character for encoding cp720:surrogateescape :-(
No character for encoding cp864:surrogateescape :-(
No character for encoding iso8859_3:surrogateescape :-(
No character for encoding iso8859_6:surrogateescape :-(
No character for encoding mac_arabic:surrogateescape :-(
No character for encoding mac_farsi:surrogateescape :-(
|
msg174961 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-11-06 10:20 |
> I tested brute.py with all the encodings supported by Python:
Thanks, interesting result. I completed the encoding list and the character list: see brute2.py. I added two "joker" characters, U+00A0 and U+20AC, which meet the requirements of most locale encodings.
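To illustrate why the jokers help, here is a sketch with a hypothetical candidate order (readable letters first, U+00A0 and U+20AC last). An encoding such as iso8859_6, which lacks the Latin letters, still finds a usable character. The candidate tuple and the function names are assumptions, not brute2.py.

```python
# Hypothetical order: readable letters first, the two "joker" characters last.
CANDIDATES = ('\u00e6', '\u0141', '\u00a0', '\u20ac')


def roundtrips(char, encoding):
    """True if `char` survives encode/decode with surrogateescape."""
    try:
        data = char.encode(encoding, 'surrogateescape')
        return data.decode(encoding, 'surrogateescape') == char
    except UnicodeError:
        return False


def pick(encoding, candidates=CANDIDATES):
    """First candidate that `encoding` can round-trip, or None."""
    for char in candidates:
        if roundtrips(char, encoding):
            return char
    return None


for enc in ('cp1252', 'cp1250', 'iso8859_6', 'ascii'):
    char = pick(enc)
    print(enc, 'U+%04X' % ord(char) if char else None)
```

Keeping the jokers at the end means they are only used when no more readable character fits, so most generated filenames stay legible.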
|
msg175016 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-11-06 22:23 |
New changeset de8cf1ece068 by Victor Stinner in branch 'default':
Issue #16414: Add support.FS_NONASCII and support.TESTFN_NONASCII
http://hg.python.org/cpython/rev/de8cf1ece068
|
msg175017 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-11-06 22:33 |
New changeset 0e9fbdda3c92 by Victor Stinner in branch 'default':
Issue #16414: Fix support.TESTFN_UNDECODABLE and test_genericpath.test_nonascii_abspath()
http://hg.python.org/cpython/rev/0e9fbdda3c92
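The commit message does not spell out what TESTFN_UNDECODABLE is; presumably it is a bytes filename that the filesystem encoding cannot decode. As a hypothetical sketch of that general idea (the candidate byte strings and find_undecodable() are assumptions, not the committed code), one can search for bytes that fail strict decoding:

```python
import sys

# Hypothetical candidate byte strings; none of them is valid UTF-8.
CANDIDATE_BYTES = (b'\xe7w\xf0', b'\xff', b'\xae\xd5', b'\x81\x98')


def find_undecodable(encoding, candidates=CANDIDATE_BYTES):
    """Return the first byte string `encoding` cannot decode strictly, or None."""
    for data in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            return data
    return None


print(find_undecodable('utf-8'))    # b'\xe7w\xf0'
print(find_undecodable('latin-1'))  # None: Latin-1 decodes every byte
print(find_undecodable(sys.getfilesystemencoding()))
```

A test would then append the result to an ASCII base name given as bytes; with an encoding like Latin-1 no such name exists, so a None result has to be handled.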
|
msg175018 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-11-06 22:34 |
Why did you add a '- ' suffix to TESTFN_NONASCII?
|
msg175019 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-11-06 22:39 |
I don't see U+00A0 and U+20AC in the changeset.
|
msg175020 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-11-06 22:40 |
New changeset 55710b8c6670 by Victor Stinner in branch 'default':
Issue #16414: Fix typo in support.TESTFN_NONASCII (useless space)
http://hg.python.org/cpython/rev/55710b8c6670
|
msg175021 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-11-06 22:43 |
New changeset 7f90305d9f23 by Victor Stinner in branch 'default':
Issue #16414: Test more characters for support.FS_NONASCII
http://hg.python.org/cpython/rev/7f90305d9f23
|
msg175025 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-11-06 23:10 |
New changeset fce9e892c65d by Victor Stinner in branch 'default':
Issue #16414: Fix test_os on Windows, don't test os.listdir() with undecodable
http://hg.python.org/cpython/rev/fce9e892c65d
|
msg175026 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-11-06 23:12 |
> Why did you add a '- ' suffix to TESTFN_NONASCII?
Oops, the space was a mistake. I added "-" just to make the generated filename more readable.
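A minimal sketch of how such a name could be assembled, with the "-" separator and without the stray space. The helper nonascii_testfn() and the base name '@test_1234_tmp' are placeholders for illustration only.

```python
def nonascii_testfn(base, char):
    """Join an ASCII base filename and a non-ASCII character with '-', or return None."""
    if not char:
        return None  # no suitable character on this platform
    return base + '-' + char


name = nonascii_testfn('@test_1234_tmp', '\u00e6')
print(name)  # @test_1234_tmp-æ
```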
> I don't see U+00A0 and U+20AC in the changeset.
Oh, I forgot to update the patch with the latest results of "brute2.py". It is now fixed.
Thanks for the review!
|
msg175033 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-11-06 23:40 |
Handling non-ASCII paths is always a pain. I don't plan to backport support.FS_NONASCII to Python 3.3 right now, but I may backport it later.
|
msg178870 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-01-03 00:59 |
New changeset 41658a4fb3cc by Victor Stinner in branch '3.2':
Issue #16218, #16414, #16444: Backport FS_NONASCII, TESTFN_UNDECODABLE,
http://hg.python.org/cpython/rev/41658a4fb3cc
New changeset 4d40c1ce8566 by Victor Stinner in branch '3.3':
(Merge 3.2) Issue #16218, #16414, #16444: Backport FS_NONASCII,
http://hg.python.org/cpython/rev/4d40c1ce8566
|
|
Date | User | Action | Args
2022-04-11 14:57:38 | admin | set | github: 60618
2013-01-03 01:07:33 | vstinner | set | versions: + Python 3.2, Python 3.3
2013-01-03 00:59:48 | python-dev | set | messages: + msg178870
2012-11-06 23:40:40 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg175033; versions: - Python 3.3
2012-11-06 23:12:22 | vstinner | set | messages: + msg175026
2012-11-06 23:10:13 | python-dev | set | messages: + msg175025
2012-11-06 22:43:05 | python-dev | set | messages: + msg175021
2012-11-06 22:41:18 | vstinner | set | files: + brute2.py
2012-11-06 22:40:15 | python-dev | set | messages: + msg175020
2012-11-06 22:39:58 | serhiy.storchaka | set | messages: + msg175019
2012-11-06 22:34:10 | serhiy.storchaka | set | messages: + msg175018
2012-11-06 22:33:32 | python-dev | set | messages: + msg175017
2012-11-06 22:23:28 | python-dev | set | nosy: + python-dev; messages: + msg175016
2012-11-06 10:20:30 | vstinner | set | messages: + msg174961
2012-11-06 09:48:23 | serhiy.storchaka | set | messages: + msg174959
2012-11-05 23:17:33 | vstinner | set | messages: + msg174949
2012-11-05 23:12:37 | vstinner | set | files: - support_non_ascii.patch
2012-11-05 23:12:29 | vstinner | set | files: + brute.py; messages: + msg174948
2012-11-05 23:10:47 | vstinner | set | files: + support_non_ascii-2.patch; messages: + msg174946
2012-11-05 17:23:31 | chris.jerdonek | set | nosy: + chris.jerdonek; messages: + msg174922
2012-11-05 12:58:22 | serhiy.storchaka | set | messages: + msg174904
2012-11-05 12:26:41 | serhiy.storchaka | set | messages: + msg174900
2012-11-05 12:12:14 | vstinner | create |