Message 115329 - Python tracker

Author	vstinner
Recipients	BreamoreBoy, anthonybaxter, brett.cannon, ezio.melotti, kristjan.jonsson, loewis, nnorwitz, theller, vstinner
Date	2010-09-01.19:33:11
Message-id	<201009012133.02987.victor.stinner@haypocalc.com>
In-reply-to	<1283304195.04.0.0745873924129.issue1552880@psf.upfronthosting.co.za>

Content
> According to the Unicode standard the high and low surrogate halves used
> by UTF-16 (...)

Yes, but in Python, the U+DC80..U+DCFF range is used to store undecodable bytes.
E.g. b'abc\xff'.decode('ascii', 'surrogateescape') gives 'abc\udcff'.
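
A minimal Python 3 sketch of that behaviour, assuming the surrogateescape error handler from PEP 383 (the variable names are only illustrative): each undecodable byte 0x80..0xFF is mapped to a lone low surrogate U+DC80..U+DCFF, and encoding with the same handler restores the original byte.

```python
# Illustrative names; only the built-in bytes/str codecs are used.
raw = b'abc\xff'

# 0xff is not valid ASCII, so surrogateescape stores it as U+DCFF.
text = raw.decode('ascii', 'surrogateescape')

# Encoding with the same handler turns U+DCFF back into the byte 0xff.
roundtrip = text.encode('ascii', 'surrogateescape')

# Without the handler, a lone surrogate cannot be encoded at all.
try:
    text.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(ascii(text), roundtrip, strict_ok)
```

This is why a surrogate in this range found in a Python 3 path string is not necessarily invalid data: it may be a placeholder for a byte that could not be decoded.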

> Anyway, as you remark, my approach is a _patch_, designed to make python
> (2.x) work in an unicode environment, with the least amount of code
> change, for those willing to commit such a patch.

Python 2.7 is out and I think it is too late to fix Python 2. Anyway, Python 2
uses bytes for sys.path and other paths, so the problem only occurs if the user
specifies unicode paths.

> In 3.x you may want to do things differently.

I chose to rewrite the C code to manipulate unicode paths instead of byte
paths (see #9425).

History
Date	User	Action	Args
2010-09-01 19:33:13	vstinner	set	recipients: + vstinner, loewis, nnorwitz, brett.cannon, anthonybaxter, theller, kristjan.jonsson, ezio.melotti, BreamoreBoy
2010-09-01 19:33:12	vstinner	link	issue1552880 messages
2010-09-01 19:33:11	vstinner	create