
Author lemburg
Date 2002-10-10.15:30:02

I'm not exactly sure why things work again, but I do
know that I looked into this some time ago. Perhaps I
simply forgot to close the bug, or one of the UTF-8
codec overhauls remedied the problem.

Here's what I get with Python 2.3 (UCS4 build):

>>> len(u'\U000d0000')
1
>>> len(u"\udb00\udc00")
2
>>> u'\U000d0000' == u"\udb00\udc00"
False
>>> len(unicode(u"\udb00\udc00".encode('utf-8'), 'utf-8'))
1
>>> len(unicode(u'\U000d0000'.encode('utf-8'), 'utf-8'))
1

This is what I get with Python 2.2.1:
>>> len(u'\U000d0000')
2
>>> len(u"\udb00\udc00")
2
>>> u'\U000d0000' == u"\udb00\udc00"
1
>>> len(unicode(u"\udb00\udc00".encode('utf-8'), 'utf-8'))
2
>>> len(unicode(u'\U000d0000'.encode('utf-8'), 'utf-8'))
2

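There's still a difference between the two versions in how the
surrogate pair compares to the single code point, but within each
version the UTF-8 codec behaves consistently: both spellings
round-trip to the same length.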
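
For reference, here is a minimal sketch of why u"\udb00\udc00" and u'\U000d0000' are two spellings of the same character. It is written for current Python 3, not the 2.2/2.3 interpreters used in the sessions above, and the helper names to_surrogate_pair and from_surrogate_pair are made up for illustration; they are not part of any codec API.

```python
def to_surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0xD0000)
print(hex(high), hex(low))

# Python 3 keeps lone surrogates as separate code points in a str, so the
# pair only merges when it passes through a UTF-16 codec.
pair = "\udb00\udc00"
merged = pair.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
print(len(pair), len(merged), merged == "\U000d0000")
```

The arithmetic is the standard UTF-16 surrogate mapping. The last three lines rely on the surrogatepass error handler, which exists only in Python 3, so that part cannot be pasted into the 2.x sessions above.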
History
Date User Action Args
2007-08-23 14:01:15 admin link issue554916 messages
2007-08-23 14:01:15 admin create