
Author serhiy.storchaka
Recipients belopolsky, ezio.melotti, georg.brandl, lemburg, moese, phr, serhiy.storchaka, tchrist, vstinner
Date 2012-04-24.11:01:36
Message-id <1335265297.03.0.881571661115.issue2857@psf.upfronthosting.co.za>
Content
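As far as I understand, these codecs (CESU-8 and Java's Modified UTF-8) can be implemented in pure Python on top of the existing 'utf-8' codec with the 'surrogatepass' error handler. There is no need to modify the interpreter core.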

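import re

def decode_cesu8(b):
    # Decode each surrogate separately, then join every high/low pair into one code point.
    def join(m):
        hi, lo = (ord(c) & 0x3FF for c in m.group())
        return chr(0x10000 + (hi << 10) + lo)
    return re.sub('[\uD800-\uDBFF][\uDC00-\uDFFF]', join, b.decode('utf-8', 'surrogatepass'))

def encode_cesu8(s):
    # Split each supplementary character into a surrogate pair; each surrogate becomes 3 bytes.
    def split(m):
        cp = ord(m.group()) - 0x10000
        return chr(0xD800 + (cp >> 10)) + chr(0xDC00 + (cp & 0x3FF))
    return re.sub('[\U00010000-\U0010FFFF]', split, s).encode('utf-8', 'surrogatepass')

def decode_mutf8(b):
    # Modified UTF-8 additionally writes U+0000 as the two-byte sequence C0 80.
    return decode_cesu8(b.replace(b'\xC0\x80', b'\x00'))

def encode_mutf8(s):
    return encode_cesu8(s).replace(b'\x00', b'\xC0\x80')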
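To illustrate that no interpreter changes are needed, such functions can be plugged into the standard codec registry with codecs.register. This is a minimal sketch under stated assumptions, not a proposed patch: the codec name 'mutf-8' and the helpers _encode_mutf8/_decode_mutf8 are hypothetical, the surrogate handling is written as a plain loop instead of re.sub so that the block runs on its own, and the errors argument is accepted but ignored.

```python
import codecs

def _encode_mutf8(s):
    # Hypothetical helper: U+0000 -> C0 80, supplementary characters ->
    # surrogate pair, each surrogate encoded as its own 3-byte sequence.
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp == 0:
            out += b'\xC0\x80'
        elif cp >= 0x10000:
            cp -= 0x10000
            pair = chr(0xD800 + (cp >> 10)) + chr(0xDC00 + (cp & 0x3FF))
            out += pair.encode('utf-8', 'surrogatepass')
        else:
            out += ch.encode('utf-8', 'surrogatepass')
    return bytes(out)

def _decode_mutf8(b):
    # Hypothetical helper: undo the C0 80 escape, decode surrogates
    # individually, then merge each high/low pair into one code point.
    s = bytes(b).replace(b'\xC0\x80', b'\x00').decode('utf-8', 'surrogatepass')
    out, i = [], 0
    while i < len(s):
        hi = ord(s[i])
        if 0xD800 <= hi <= 0xDBFF and i + 1 < len(s) and 0xDC00 <= ord(s[i + 1]) <= 0xDFFF:
            lo = ord(s[i + 1])
            out.append(chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)))
            i += 2
        else:
            out.append(s[i])
            i += 1
    return ''.join(out)

def _search(name):
    # Newer Pythons normalize 'mutf-8' to 'mutf_8' before calling search functions.
    if name in ('mutf_8', 'mutf-8'):
        return codecs.CodecInfo(
            encode=lambda s, errors='strict': (_encode_mutf8(s), len(s)),
            decode=lambda b, errors='strict': (_decode_mutf8(b), len(b)),
            name='mutf-8')
    return None

codecs.register(_search)
```

After registration, str.encode('mutf-8') and bytes.decode('mutf-8') work like any other codec. A complete implementation would also provide incremental and stream classes and honour the errors argument.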
History
Date                 User              Action  Args
2012-04-24 11:01:37  serhiy.storchaka  set     recipients: + serhiy.storchaka, lemburg, georg.brandl, phr, belopolsky, moese, vstinner, ezio.melotti, tchrist
2012-04-24 11:01:37  serhiy.storchaka  set     messageid: <1335265297.03.0.881571661115.issue2857@psf.upfronthosting.co.za>
2012-04-24 11:01:36  serhiy.storchaka  link    issue2857 messages
2012-04-24 11:01:36  serhiy.storchaka  create