
Author kristjan.jonsson
Recipients BreamoreBoy, anthonybaxter, brett.cannon, eric.araujo, ezio.melotti, kristjan.jonsson, loewis, nnorwitz, theller, vstinner
Date 2010-09-02.01:37:10
Message-id <1283391433.4.0.153957259426.issue1552880@psf.upfronthosting.co.za>
In-reply-to
Content
> Yes, but in Python, the U+DC80..U+DCFF range is used to store undecodable bytes.
> E.g. b'abc\xff'.decode('ascii', 'surrogateescape') gives 'abc\udcff'.

That's an inventive way of breaking the Unicode standard :)
Anyway, why would you worry about that?  My patch doesn't use "surrogateescape" so there is no problem.  There are only two places where I "decode":  
1) module names and sys.path components in the file system encoding:  If they contain undecodable characters, then that is an error.  No reason to propagate that error into the import machinery.
2) when decoding utf-8 back into unicode, but that utf-8 is already legal since _we_ generated it.

If a _unicode_ input (sys.path) contains a valid surrogate pair, then the utf-8 encoder just encodes it.
But if it finds a lone surrogate as you describe (a Python special), then that represents an undecodable character, something that should have been covered earlier and something we know nothing about.  Clearly, that makes that particular unicode sys.path component invalid.
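
A minimal sketch of that distinction, written for Python 3, where the strict utf-8 encoder rejects lone surrogates. The helper name is_encodable_path is hypothetical and is not part of the patch:

```python
def is_encodable_path(path):
    """Return True if a unicode path component encodes to strict UTF-8."""
    try:
        path.encode("utf-8")  # the default error handler is 'strict'
    except UnicodeEncodeError:
        return False
    return True


# A lone surrogate produced by surrogateescape stands for an undecodable byte.
escaped = b"abc\xff".decode("ascii", "surrogateescape")

# A character outside the BMP (a surrogate pair in UTF-16) encodes without error.
print(is_encodable_path("dir_\U00020000"))
# The lone surrogate is rejected, so such a component would count as invalid.
print(is_encodable_path(escaped))
```

Under these assumptions the first call reports the component as usable and the second does not.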

(Hm, I notice that 2.7 happily encodes lone surrogates to utf-8)
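
For comparison, Python 3 only produces that kind of output when asked to, via the 'surrogatepass' error handler. A small sketch, assuming Python 3 (the variable names are just for illustration):

```python
lone = "\udcff"

# Strict utf-8 refuses the lone surrogate in Python 3.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# 'surrogatepass' emits the three-byte form of the surrogate instead of failing.
passed = lone.encode("utf-8", "surrogatepass")
print(strict_ok, passed)
```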

> Python 2.7 is out and I think it is too late to fix Python2. Anyway, Python2 
> uses bytes for sys.path or other paths, so the problem only occurs if the user 
> specifies unicode paths.
Which is precisely the case that the patch is designed to solve.  When a Chinese user installs EVE Online in a weird folder, then that should work.
Also, 2.x is not quite dead yet.  There are quite a few people doing their own patches for their private purposes.  Although my patch won't go into any official version, there might be others in the same situation as us:  Trying to support an _embedded_ Python 2.x version in an internationalized environment (on Windows :)
History
Date                 User              Action  Args
2010-09-02 01:37:13  kristjan.jonsson  set     recipients: + kristjan.jonsson, loewis, nnorwitz, brett.cannon, anthonybaxter, theller, vstinner, ezio.melotti, eric.araujo, BreamoreBoy
2010-09-02 01:37:13  kristjan.jonsson  set     messageid: <1283391433.4.0.153957259426.issue1552880@psf.upfronthosting.co.za>
2010-09-02 01:37:12  kristjan.jonsson  link    issue1552880 messages
2010-09-02 01:37:10  kristjan.jonsson  create