Message 115989 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	ocean-city
Recipients	ocean-city
Date	2010-09-10.09:39:49
SpamBayes Score	4.9071858e-14
Marked as misclassified	No
Message-id	<1284111593.27.0.664276526228.issue9819@psf.upfronthosting.co.za>
In-reply-to

Content

Hello. I noticed that the test suite reports a WARNING every time it runs.

///////////////////////////////////////////////////

E:\python-dev>py3k -m test.regrtest test_os
WARNING: The filename '@test_464_tmp-共有される' CAN be encoded by the filesystem encoding (mbcs). Unicode filename tests may not be effective
(snip)

///////////////////////////////////////////////////

This happens because TESTFN_UNENCODABLE in Lib/test/support.py
*is* encodable in a Japanese environment (cp932).

It is easy to make this name truly unencodable in Japanese.
By appending characters such as "\u2661" or "\u2668" (the former is a heart mark,
the latter is the "Onsen", or hot spring, mark), I could remove the warning:
    TESTFN_UNENCODABLE = TESTFN + "-\u5171\u6709\u3055\u308c\u308b\u2661\u2668"
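
The check behind the warning can be reproduced outside the test suite. This is a sketch under stated assumptions: `can_encode` is a hypothetical helper, `TESTFN` gets a placeholder value, and Python's `cp932` codec stands in for `mbcs`, which exists only on Windows and follows the active ANSI code page. It compares the suffix seen in the warning above with the proposed one.

```python
# Hypothetical placeholder for the test suite's TESTFN; the real value differs.
TESTFN = "@test_tmp"

# Suffix seen in the warning, and the proposed suffix with U+2661
# (white heart suit) and U+2668 (hot springs) appended.
CURRENT_SUFFIX = "-\u5171\u6709\u3055\u308c\u308b"
PROPOSED_SUFFIX = CURRENT_SUFFIX + "\u2661\u2668"


def can_encode(name, encoding):
    """Return True if `name` can be encoded with `encoding` in strict mode."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# cp932 approximates "mbcs" on a Japanese Windows system (an assumption).
for suffix in (CURRENT_SUFFIX, PROPOSED_SUFFIX):
    name = TESTFN + suffix
    # ascii() keeps the output printable on non-UTF-8 consoles.
    print(ascii(name), can_encode(name, "cp932"))
```

Under these assumptions the first line ends in True and the second in False: the kana and kanji are all in cp932, but the two symbols are not.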

///////////////////////////////////////////////////

There is another issue, which happens only in test_unicode_file:

///////////////////////////////////////////////////

E:\python-dev>py3k -m test.test_unicode_file
Traceback (most recent call last):
  File "e:\python-dev\py3k\lib\test\test_unicode_file.py", line 12, in <module>
    TESTFN_UNICODE.encode(TESTFN_ENCODING)
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "e:\python-dev\py3k\lib\runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "e:\python-dev\py3k\lib\runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "e:\python-dev\py3k\lib\test\test_unicode_file.py", line 16, in <module>
    raise unittest.SkipTest("No Unicode filesystem semantics on this platform.")
unittest.case.SkipTest: No Unicode filesystem semantics on this platform.

///////////////////////////////////////////////////

This happens because TESTFN_UNICODE cannot be encoded in the Japanese code page (cp932):

E:\python-dev>py3k
Python 3.2a2+ (py3k:84663M, Sep 10 2010, 13:24:41) [MSC v.1400 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("-\xe0\xf2")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'cp932' codec can't encode character '\xe0' in position 1: illegal multibyte sequence
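
The module-level guard visible in the test_unicode_file traceback can be approximated as follows. This is a sketch, not the actual contents of test_unicode_file.py: `require_encodable` is a hypothetical helper, and `cp932` and `cp1252` stand in for the filesystem encodings of Japanese and Western European Windows systems.

```python
import unittest

# Non-ASCII test filename as printed above: U+00E0 and U+00F2.
TESTFN_UNICODE = "-\xe0\xf2"


def require_encodable(name, encoding):
    """Raise unittest.SkipTest if `name` cannot be encoded with `encoding`.

    Hypothetical helper approximating the guard in test_unicode_file.py.
    """
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        # Raised inside the except block, so Python 3 chains it to the
        # UnicodeEncodeError, as in the traceback in the report.
        raise unittest.SkipTest(
            "No Unicode filesystem semantics on this platform.")


# A Western code page can represent both characters, so nothing happens.
require_encodable(TESTFN_UNICODE, "cp1252")

# The Japanese code page cannot, so the guard skips.
try:
    require_encodable(TESTFN_UNICODE, "cp932")
except unittest.SkipTest as exc:
    print("skipped:", exc)
```

Because the SkipTest is raised while the UnicodeEncodeError is being handled, Python 3 reports both exceptions, joined by "During handling of the above exception, another exception occurred".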

Interestingly, however, the byte sequence "\xe0\xf2" can be read as a
cp932 multibyte character.

E:\python-dev>python
Python 2.6.6 (r266:84297, Aug 24 2010, 18:46:32) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print "\xe0\xf2"
瑣
>>> "\xe0\xf2".decode("cp932")
u'\u7463'

E:\python-dev>py3k
Python 3.2a2+ (py3k:84663M, Sep 10 2010, 13:24:41) [MSC v.1400 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u7463')
瑣
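
The two sessions differ because the same literal means different things. In Python 2, "\xe0\xf2" is a two-byte str; in Python 3 it is a two-character str (U+00E0, U+00F2). The following Python 3 sketch shows both readings side by side; the variable names are illustrative only.

```python
# In Python 3, "\xe0\xf2" is text: two code points, U+00E0 and U+00F2.
text = "\xe0\xf2"
print([hex(ord(ch)) for ch in text])  # ['0xe0', '0xf2']

# The Python 2 literal was a byte string; b"..." is its Python 3 equivalent.
raw = b"\xe0\xf2"

# Read as cp932, the two bytes form a single double-byte character, U+7463.
decoded = raw.decode("cp932")
print(ascii(decoded))  # '\u7463'

# Read as Latin-1, the same bytes map back to the two Python 3 code points.
print(raw.decode("latin-1") == text)  # True
```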

I believe this value "\xe0\xf2" is a leftover from Python 2.x, where it
was a byte string; maybe "\u7463" should be used here instead? I'm not
sure, though, that it can be encoded everywhere using other encodings.
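
One possible way around that uncertainty, offered only as an assumption and not as what the test suite does, is to keep several candidate names and use the first one the filesystem encoding accepts (in practice the encoding would come from `sys.getfilesystemencoding()`). `pick_encodable` and `CANDIDATES` are hypothetical names used for illustration.

```python
def pick_encodable(candidates, encoding):
    """Return the first candidate that `encoding` can represent, else None."""
    for candidate in candidates:
        try:
            candidate.encode(encoding)
        except UnicodeEncodeError:
            continue
        return candidate
    return None


# Hypothetical candidate list: the current value first, then the
# cp932-friendly alternative discussed above.
CANDIDATES = ["-\xe0\xf2", "-\u7463"]

for enc in ("cp1252", "cp932", "ascii"):
    print(enc, ascii(pick_encodable(CANDIDATES, enc)))

# "\u7463" alone is not a universal choice: cp1252 has no CJK characters.
print(ascii(pick_encodable(["-\u7463"], "cp1252")))  # None
```

The sketch confirms the concern: "\u7463" works for cp932 but not for cp1252, while "\xe0\xf2" works for cp1252 but not for cp932, so a per-encoding choice (or skipping when nothing fits) is needed.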

History
Date	User	Action	Args
2010-09-10 09:39:53	ocean-city	set	recipients: + ocean-city
2010-09-10 09:39:53	ocean-city	set	messageid: <1284111593.27.0.664276526228.issue9819@psf.upfronthosting.co.za>
2010-09-10 09:39:51	ocean-city	link	issue9819 messages
2010-09-10 09:39:50	ocean-city	create