Issue 7090: encoding unicode objects greater than FFFF

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/51339

classification

Title:	encoding unicode objects greater than FFFF
Type:	behavior	Stage:	resolved
Components:	Unicode	Versions:	Python 3.0, Python 3.1, Python 2.7, Python 2.6

process

Created on 2009-10-09 09:12 by msaghaei, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (2)
msg93780 - (view)	Author: Mahmoud (msaghaei)	Date: 2009-10-09 09:12
Odd behaviour with str.encode, codecs.Codec.encode, or similar functions when dealing with unicode objects above U+FFFF.

With 2.6:
>>> u'\u10380'.encode('utf')
'\xe1\x80\xb80'

With 3.x:
>>> '\u10380'.encode('utf')
'\xe1\x80\xb80'

The correct output must be: \xf0\x90\x8e\x80
msg93781 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2009-10-09 09:16
If you want to specify codepoints greater than U+FFFF you have to use u'\Uxxxxxxxx':

>>> x = u'\u10380'
>>> x.encode('utf-8')
'\xe1\x80\xb80'
>>> x[0]
u'\u1038'
>>> x[1]
u'0'
>>> y = u'\U00010380'
>>> y.encode('utf-8')
'\xf0\x90\x8e\x80'
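
For readers on Python 3, the same distinction can be checked with a short script. This is only an illustrative sketch, not part of the original report: the variable names are made up for the example, and it assumes Python 3.3 or later, where a code point above U+FFFF always counts as a single character.

```python
# \u consumes exactly 4 hex digits; \U consumes exactly 8.
short = '\u10380'      # U+1038 followed by the literal character '0'
full = '\U00010380'    # the single code point U+10380

short_bytes = short.encode('utf-8')   # 3-byte sequence for U+1038, then ASCII '0'
full_bytes = full.encode('utf-8')     # 4-byte sequence for U+10380

short_len = len(short)                # two characters
full_len = len(full)                  # one character

# chr() builds the same character from its integer code point.
same_char = chr(0x10380) == full

print(short_bytes, full_bytes, short_len, full_len, same_char)
```

The encoder is behaving consistently in both cases; the difference comes entirely from how the string literal is parsed before encoding.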

History
Date	User	Action	Args
2022-04-11 14:56:53	admin	set	github: 51339
2009-10-09 09:16:49	ezio.melotti	set	status: open -> closed nosy: + ezio.melotti messages: + msg93781 resolution: not a bug stage: resolved
2009-10-09 09:12:33	msaghaei	create