Message 159130 - Python tracker



Author	serhiy.storchaka
Recipients	belopolsky, ezio.melotti, georg.brandl, lemburg, moese, phr, serhiy.storchaka, tchrist, vstinner
Date	2012-04-24.11:01:36
Message-id	<1335265297.03.0.881571661115.issue2857@psf.upfronthosting.co.za>

Content

As far as I understand, this codec can be implemented in Python. There is no need to modify the interpreter core.

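import re

def decode_cesu8(b):
    # Decode as UTF-8 with surrogates allowed, then merge each
    # high/low surrogate pair into one supplementary code point.
    return re.sub('[\uD800-\uDBFF][\uDC00-\uDFFF]',
                  lambda m: chr(0x10000 + ((ord(m.group()[0]) & 0x3FF) << 10) + (ord(m.group()[1]) & 0x3FF)),
                  b.decode('utf-8', 'surrogatepass'))

def encode_cesu8(s):
    # Split each supplementary code point into a surrogate pair,
    # then encode as UTF-8 with surrogates allowed.
    return re.sub('[\U00010000-\U0010FFFF]',
                  lambda m: chr(0xD800 | ((ord(m.group()) - 0x10000) >> 10)) + chr(0xDC00 | (ord(m.group()) & 0x3FF)),
                  s).encode('utf-8', 'surrogatepass')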

def decode_mutf8(b):
    return decode_cesu8(b.replace(b'\xC0\x80', b'\x00'))

def encode_mutf8(s):
    return encode_cesu8(s).replace(b'\x00', b'\xC0\x80')
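As a cross-check of the idea above, here is a minimal sketch that reaches the same CESU-8 bytes by a different route (through UTF-16 code units instead of a regular expression) and then plugs the result into the codec machinery from pure Python. The function names and the codec name 'cesu8_sketch' are made up for this example and are not part of the original message or of the standard library. It assumes Python 3.4 or later, where the 'surrogatepass' handler also works with the UTF-16 codecs.

```python
import codecs
import struct

def encode_cesu8_via_utf16(s):
    # UTF-16 already splits supplementary characters into surrogate pairs.
    raw = s.encode('utf-16-be', 'surrogatepass')
    units = struct.unpack('>%dH' % (len(raw) // 2), raw)
    # CESU-8: each 16-bit code unit is encoded on its own as UTF-8.
    return b''.join(chr(u).encode('utf-8', 'surrogatepass') for u in units)

def decode_cesu8_via_utf16(b):
    # Surrogates come out of this step as separate code points...
    s = b.decode('utf-8', 'surrogatepass')
    # ...and a UTF-16 round trip joins each high/low pair again.
    return s.encode('utf-16-be', 'surrogatepass').decode('utf-16-be', 'surrogatepass')

def _search(name):
    # Hypothetical codec name, chosen only for this sketch.
    if name != 'cesu8_sketch':
        return None
    return codecs.CodecInfo(
        name='cesu8_sketch',
        # Codec functions return (result, number of input items consumed).
        encode=lambda s, errors='strict': (encode_cesu8_via_utf16(s), len(s)),
        decode=lambda b, errors='strict': (decode_cesu8_via_utf16(bytes(b)), len(b)),
    )

# Registering a search function needs no change to the interpreter core.
codecs.register(_search)

data = '\U0001F600'.encode('cesu8_sketch')
assert data == b'\xed\xa0\xbd\xed\xb8\x80'   # D83D DE00, each as 3-byte UTF-8
assert data.decode('cesu8_sketch') == '\U0001F600'
```

The sketch ignores the errors argument and provides no incremental or stream classes, so it is only meant to show that the lookup hook is enough for str.encode() and bytes.decode() to find such a codec.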

History
Date	User	Action	Args
2012-04-24 11:01:37	serhiy.storchaka	set	recipients: + serhiy.storchaka, lemburg, georg.brandl, phr, belopolsky, moese, vstinner, ezio.melotti, tchrist
2012-04-24 11:01:37	serhiy.storchaka	set	messageid: <1335265297.03.0.881571661115.issue2857@psf.upfronthosting.co.za>
2012-04-24 11:01:36	serhiy.storchaka	link	issue2857 messages
2012-04-24 11:01:36	serhiy.storchaka	create