Message 143138 - Python tracker

Author	moese
Recipients	ezio.melotti, lemburg, moese
Date	2011-08-29.11:50:11
SpamBayes Score	7.273175e-06
Marked as misclassified	No
Message-id	<1314618612.31.0.507409207002.issue12742@psf.upfronthosting.co.za>
In-reply-to

Content
It's an internal web API at the place I work for.

To be able to use it from Python at all, I wrote a workaround that simply replaces every character outside the BMP with U+FFFD:

# Replace characters outside the BMP with U+FFFD REPLACEMENT CHARACTER.
# Operates on UTF-8 bytes: a lead byte >= 0xF0 starts a 4-byte sequence
# (a code point above U+FFFF), which is swapped for the 3-byte UTF-8
# encoding of U+FFFD. Assumes the input is well-formed UTF-8.
def cesu8_to_utf8(data):
    result = bytearray()
    index = 0
    length = len(data)
    while index < length:
        if data[index] < 0xF0:
            result.append(data[index])
            index += 1
        else:
            result += b"\xef\xbf\xbd"  # "\ufffd".encode("utf-8")
            index += 4
    return bytes(result)
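For comparison, the same replacement can be done on decoded text instead of by walking raw bytes. This is a minimal sketch in Python 3; `replace_non_bmp` is a hypothetical helper name, not something from the original workaround or the standard library.

```python
def replace_non_bmp(data: bytes) -> bytes:
    """Decode UTF-8, replace every code point above U+FFFF with U+FFFD, re-encode."""
    text = data.decode("utf-8")
    # Characters in the BMP are U+0000..U+FFFF; anything above is supplementary.
    cleaned = "".join(ch if ord(ch) <= 0xFFFF else "\ufffd" for ch in text)
    return cleaned.encode("utf-8")

sample = "a\U0001F600b".encode("utf-8")   # b'a\xf0\x9f\x98\x80b'
print(replace_non_bmp(sample))             # b'a\xef\xbf\xbdb'
```

Unlike the byte loop, this version raises `UnicodeDecodeError` on malformed input instead of silently skipping bytes.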

Looking at the workaround again, I'm not even sure the problem is CESU-8: the code replaces characters encoded as 4-byte UTF-8 sequences, whereas CESU-8 encodes such characters as a surrogate pair, i.e. two 3-byte sequences.
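To make the difference concrete: U+1F600 is `F0 9F 98 80` in UTF-8, but in CESU-8 it is first split into the surrogate pair U+D83D U+DE00 and each surrogate is encoded as 3 bytes, giving `ED A0 BD ED B8 80`. Below is a minimal sketch of actual CESU-8 decoding in Python 3, assuming the input is otherwise valid and every surrogate is paired; `cesu8_decode` is a hypothetical helper name, not a standard codec.

```python
def cesu8_decode(data: bytes) -> str:
    """Decode CESU-8 bytes into a str, joining surrogate pairs into single code points."""
    # 'surrogatepass' lets the UTF-8 codec accept 3-byte encodings of
    # surrogates (U+D800..U+DFFF), which strict UTF-8 rejects.
    with_surrogates = data.decode("utf-8", "surrogatepass")
    # Round-tripping through UTF-16 merges each high/low surrogate pair
    # into the supplementary character it represents.
    return with_surrogates.encode("utf-16-le", "surrogatepass").decode("utf-16-le")

cesu = b"a\xed\xa0\xbd\xed\xb8\x80b"      # U+1F600 as two 3-byte surrogates
utf8 = cesu8_decode(cesu).encode("utf-8")
print(utf8)                               # b'a\xf0\x9f\x98\x80b'
```

An unpaired surrogate makes the final strict UTF-16 decode raise `UnicodeDecodeError`, which is usually preferable to passing it through silently.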

However, I can see why there would be little interest in adding this encoding.

History
Date	User	Action	Args
2011-08-29 11:50:12	moese	set	recipients: + moese, lemburg, ezio.melotti
2011-08-29 11:50:12	moese	set	messageid: <1314618612.31.0.507409207002.issue12742@psf.upfronthosting.co.za>
2011-08-29 11:50:11	moese	link	issue12742 messages
2011-08-29 11:50:11	moese	create