Message198965
Any update on this? So you can see what my workaround is, I'll paste in the code I'm using. My major issue with it is that its performance doesn't scale to large strings.
It is also a bytes-to-bytes or str-to-str encoding, because that is the kind of operation one wants to perform on data that is already bytes or already str.
A full-fledged streaming codec for this would be very helpful when writing applications that stream tab- and newline-separated UTF-8 data over stdin/stdout.
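
text_types = (str, )

# Map each control character (0-31) to its escaped text, e.g. TAB -> backslash + 't'.
escape_tm = dict((k, repr(chr(k))[1:-1]) for k in range(32))
# NUL keeps repr()'s '\x00' form: a bare '\0' followed by a digit would be
# read back as an octal escape by the unicode_escape codec.
# Use the short C-style escapes where they exist; the backslash must be doubled
# so that the value is the two-character escape, not the control character itself.
escape_tm[7] = '\\a'
escape_tm[8] = '\\b'
escape_tm[11] = '\\v'
escape_tm[12] = '\\f'
# Escape the backslash itself so that unescaping is unambiguous.
escape_tm[ord('\\')] = '\\\\'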

def escape_control(s):
    if isinstance(s, text_types):
        return s.translate(escape_tm)
    else:
        # bytes: go through str, keeping undecodable bytes via surrogateescape
        return s.decode('utf-8', 'surrogateescape').translate(escape_tm).encode('utf-8', 'surrogateescape')

def unescape_control(s):
    if isinstance(s, text_types):
        # backslashreplace turns non-latin1 characters into \uXXXX escapes,
        # which unicode_escape then decodes along with the control escapes
        return s.encode('latin1', 'backslashreplace').decode('unicode_escape')
    else:
        return s.decode('utf-8', 'surrogateescape').encode('latin1', 'backslashreplace').decode('unicode_escape').encode('utf-8', 'surrogateescape')
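
To make the use case concrete, here is a minimal sketch of how escaping like this might be used to move tab- and newline-separated records through a text stream. It is self-contained and covers only the str case; the names `ESCAPES`, `escape_field`, `unescape_field`, `write_record` and `read_records` are hypothetical helpers made up for this illustration, and an `io.StringIO` buffer stands in for stdin/stdout.

```python
import io

# Hypothetical names for illustration; the table follows the same idea as the
# str branch of the workaround: control characters 0-31 plus the backslash.
ESCAPES = {k: repr(chr(k))[1:-1] for k in range(32)}
ESCAPES[ord('\\')] = '\\\\'


def escape_field(s):
    """Escape control characters and backslashes in one str field."""
    return s.translate(ESCAPES)


def unescape_field(s):
    """Reverse escape_field for one str field."""
    return s.encode('latin1', 'backslashreplace').decode('unicode_escape')


def write_record(out, fields):
    """Write one record: escaped fields joined by tabs, ended by a newline."""
    out.write('\t'.join(escape_field(f) for f in fields) + '\n')


def read_records(inp):
    """Yield each input line as a list of unescaped fields."""
    for line in inp:
        yield [unescape_field(f) for f in line.rstrip('\n').split('\t')]


# Round trip through an in-memory stream (a stand-in for stdout/stdin).
buf = io.StringIO()
write_record(buf, ['a\tb', 'line1\nline2', 'back\\slash', 'caf\u00e9'])
buf.seek(0)
records = list(read_records(buf))
```

Because every tab and newline inside a field is escaped, the only raw tabs and newlines left in the stream are the separators, so the reader can split on them safely. Each field is still escaped and unescaped as a whole string, which is the part a real streaming codec would improve on.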

Date | User | Action | Args
2013-10-04 20:54:35 | underrun | set | recipients: + underrun, r.david.murray, serhiy.storchaka
2013-10-04 20:54:34 | underrun | set | messageid: <1380920074.97.0.858138278355.issue18679@psf.upfronthosting.co.za>
2013-10-04 20:54:34 | underrun | link | issue18679 messages
2013-10-04 20:54:34 | underrun | create |