
Author vstinner
Recipients BreamoreBoy, anthonybaxter, brett.cannon, ezio.melotti, kristjan.jonsson, loewis, nnorwitz, theller, vstinner
Date 2010-09-01.19:33:11
Message-id <201009012133.02987.victor.stinner@haypocalc.com>
In-reply-to <1283304195.04.0.0745873924129.issue1552880@psf.upfronthosting.co.za>
Content
> According to the Unicode standard the high and low surrogate halves used
> by UTF-16 (...)

Yes, but in Python, the U+DC80..U+DCFF range is used to store undecodable bytes.
E.g. b'abc\xff'.decode('ascii', 'surrogateescape') gives 'abc\udcff'.
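A minimal Python 3 sketch of this behaviour, assuming Python 3.1 or later (the 'surrogateescape' error handler does not exist in Python 2). It decodes the example bytes, shows that the undecodable byte 0xFF is stored as the lone surrogate U+DCFF (U+DC00 + byte value), and checks that encoding with the same handler restores the original bytes.

```python
# Decode bytes that are not valid ASCII using the 'surrogateescape'
# error handler (Python 3 only).
raw = b'abc\xff'
text = raw.decode('ascii', 'surrogateescape')
print(ascii(text))            # 'abc\udcff'

# Each undecodable byte b (0x80..0xFF) becomes the lone surrogate
# U+DC00 + b, so the byte 0xFF is stored as U+DCFF.
print(hex(ord(text[-1])))     # 0xdcff

# Encoding with the same handler maps the surrogates back to the
# original bytes, so the round trip is lossless.
restored = text.encode('ascii', 'surrogateescape')
print(restored == raw)        # True
```

Because these code points are reserved for this purpose, code that treats every U+DC80..U+DCFF character as an ordinary UTF-16 surrogate half would misinterpret escaped bytes.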

> Anyway, as you remark, my approach is a _patch_, designed to make python
> (2.x) work in an unicode environment, with the least amount of code
> change, for those willing to commit such a patch.

Python 2.7 has been released, and I think it is too late to fix Python 2. In any
case, Python 2 uses bytes for sys.path and other paths, so the problem only
occurs if the user specifies unicode paths.

> In 3.x you may want to do things differently.

I chose to rewrite the C code to manipulate unicode paths instead of byte
paths (see issue #9425).
History
Date                 User      Action  Args
2010-09-01 19:33:13  vstinner  set     recipients: + vstinner, loewis, nnorwitz, brett.cannon, anthonybaxter, theller, kristjan.jonsson, ezio.melotti, BreamoreBoy
2010-09-01 19:33:12  vstinner  link    issue1552880 messages
2010-09-01 19:33:11  vstinner  create