Message 137801 - Python tracker



Author	vstinner
Recipients	brian.curtin, eric.smith, georg.brandl, jaraco, loewis, nadeem.vawda, ocean-city, santoso.wijaya, tim.golden, vstinner
Date	2011-06-07.10:59:54
SpamBayes Score	0.0
Marked as misclassified	No
Message-id	<1307444395.99.0.918088582709.issue12084@psf.upfronthosting.co.za>
In-reply-to

Content

issue12084.diff:
 - test_os passes with the patched Python 3.3 on Windows 7 64-bit (and on Linux, Debian Sid)
 - in test_os: "finally: os.remove(file1)" fails when file1 doesn't exist: a new try/finally should be used after the "with open(file1, "w") as f:" block, or support.unlink() should be used instead
 - the patch doesn't apply cleanly on the default Mercurial branch (it fails on posixmodule.c); can you update your patch?
 - posixmodule.c still refers to win32_lstat() in a comment (the comment just before the removed win32_lstat() function)
 - win32_stat() must fail if the conversion to wide characters fails. win32_stat() should use PyUnicode_DecodeFSDefault() instead of mbstowcs_s(). PyUnicode_DecodeFSDefault() allocates the memory and handles errors correctly: it raises a nice Python error on a decoding error (it uses the strict error handler for the MBCS encoding, the ANSI code page).
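
To illustrate the test_os cleanup point, here is a minimal sketch of the second option (a removal helper that tolerates a missing file). The names unlink_if_exists() and run_case() are hypothetical and only stand in for support.unlink() and the test body; this is not the code from the patch.

```python
import os
import tempfile


def unlink_if_exists(path):
    """Remove path, ignoring the case where it is already gone.

    Hypothetical stand-in for the test suite's support.unlink() helper.
    """
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def run_case(create_file):
    """Optionally create file1, then clean up without a secondary error."""
    tmpdir = tempfile.mkdtemp()
    file1 = os.path.join(tmpdir, "file1")
    try:
        if create_file:
            with open(file1, "w") as f:
                f.write("data")
        # ... the body of the test would go here ...
        return os.path.exists(file1)
    finally:
        # A bare os.remove(file1) here would raise if file1 was never
        # created, hiding the real failure of the test.
        unlink_if_exists(file1)
        os.rmdir(tmpdir)
```

With this shape, run_case(False) finishes cleanly even though file1 was never created, which is the situation where the "finally: os.remove(file1)" form fails.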

win32_stat() decodes the filename "manually" from the ANSI code page and uses the wide character API, instead of using the ANSI API. This is a little different from what is done for other functions of the posix module: they use the ANSI API, as specified by PEP 277.
http://www.python.org/dev/peps/pep-0277/

Well, this PEP was written in 2002 when the default string type in Python (2) was the byte string. In 2011, with Python 3, the default string type is a character string and I can easily understand that you prefer to simplify the code by using a single string type. I also remember a discussion about deprecating byte filenames in Python 3.

I would prefer to discuss this point (decoding byte strings from the ANSI code page) on the mailing list, and maybe also update PEP 277.

The main question for me is how the ANSI API handles undecodable bytes: does it raise an error or ignore them? For stat(), ignoring undecodable bytes means that stat() will raise a "file not found" error.
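
A rough Python illustration of why ignoring undecodable bytes ends in "file not found": this is only a sketch, and it assumes UTF-8 as a stand-in for a code page that can fail. It does not call the Windows ANSI API, and raw_name and decode_name() are made-up names.

```python
import os
import tempfile

# Hypothetical byte filename; 0xff is never valid in UTF-8.
raw_name = b"report\xff.txt"


def decode_name(raw, errors):
    """Decode a byte filename with the given error handler."""
    return raw.decode("utf-8", errors)


# "ignore" silently drops the bad byte, so the name changes.
lossy_name = decode_name(raw_name, "ignore")

# "strict" reports the problem instead of guessing.
try:
    decode_name(raw_name, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# stat() on the silently altered name looks up a different file.
with tempfile.TemporaryDirectory() as tmp:
    try:
        os.stat(os.path.join(tmp, lossy_name))
        stat_error = None
    except FileNotFoundError as exc:
        stat_error = exc
```

The strict handler fails early with a decoding error, while the lossy path fails later with an error that says nothing about the real cause.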

Most ANSI code pages never fail with a decoding error because they are 8-bit encodings in which all bytes are mapped to characters. There are some multibyte code pages (like UTF-8), but I don't think that any Windows version uses such a code page *by default*. I don't even know if it's possible to use a multibyte code page as the ANSI code page.
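
The difference between the two kinds of code page can be checked with Python's own codecs. This is a sketch only: Python's codec tables are not guaranteed to match the Windows conversion tables, and can_decode_everything() is a hypothetical helper.

```python
# Every possible byte value, used as a worst-case input.
ALL_BYTES = bytes(range(256))


def can_decode_everything(encoding):
    """Return True if the codec decodes all 256 byte values strictly."""
    try:
        ALL_BYTES.decode(encoding, "strict")
        return True
    except UnicodeDecodeError:
        return False


# Single-byte codecs with a full table versus a multibyte codec.
results = {
    name: can_decode_everything(name)
    for name in ("latin-1", "cp437", "utf-8")
}
```

Here the two single-byte codecs accept any input, while UTF-8 rejects it, which is the case where the choice of error handler starts to matter.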

(I didn't check the symlink algorithm, I only care about Unicode :-D)

History
Date	User	Action	Args
2011-06-07 10:59:56	vstinner	set	recipients: + vstinner, loewis, georg.brandl, jaraco, ocean-city, eric.smith, tim.golden, nadeem.vawda, brian.curtin, santoso.wijaya
2011-06-07 10:59:55	vstinner	set	messageid: <1307444395.99.0.918088582709.issue12084@psf.upfronthosting.co.za>
2011-06-07 10:59:55	vstinner	link	issue12084 messages
2011-06-07 10:59:54	vstinner	create