Message 198965 - Python tracker


Author	underrun
Recipients	r.david.murray, serhiy.storchaka, underrun
Date	2013-10-04.20:54:34
Message-id	<1380920074.97.0.858138278355.issue18679@psf.upfronthosting.co.za>

Content

Any update on this? Just so you can see what my workaround is, I'll paste in the code I'm using. The major issue I have with it is that performance doesn't scale to large strings.

This is also a bytes-to-bytes or str-to-str encoding, because that is the kind of operation one wants to perform on the data one already has.

Having a full-fledged streaming codec to handle this would be very helpful when writing applications that stream tab- and newline-separated UTF-8 data over stdin/stdout.
                                                                                                                  
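text_types = (str,)

# Map control characters 0-31 to their repr()-style escapes (\t, \n, \x01, ...).
escape_tm = dict((k, repr(chr(k))[1:-1]) for k in range(32))
# Prefer the short C-style escapes where they exist. The values need a
# doubled backslash: a plain '\a' is the BEL character itself, which would
# leave it unescaped. Caveat: '\\0' followed by an octal digit is read back
# as a longer octal escape by unescape_control().
escape_tm[0] = '\\0'
escape_tm[7] = '\\a'
escape_tm[8] = '\\b'
escape_tm[11] = '\\v'
escape_tm[12] = '\\f'
escape_tm[ord('\\')] = '\\\\'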

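def escape_control(s):
    """Escape control characters and backslashes in a str or bytes value."""
    if isinstance(s, text_types):
        return s.translate(escape_tm)
    else:
        # bytes.translate() does not accept a dict, so go through str;
        # surrogateescape preserves bytes that are not valid UTF-8.
        return s.decode('utf-8', 'surrogateescape').translate(escape_tm).encode('utf-8', 'surrogateescape')

def unescape_control(s):
    """Reverse escape_control(), returning the same type that was passed in."""
    if isinstance(s, text_types):
        # backslashreplace turns non-Latin-1 characters into \u/\U escapes,
        # which unicode_escape decodes along with the control escapes.
        return s.encode('latin1', 'backslashreplace').decode('unicode_escape')
    else:
        return s.decode('utf-8', 'surrogateescape').encode('latin1', 'backslashreplace').decode('unicode_escape').encode('utf-8', 'surrogateescape')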
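
For the streaming case, here is a minimal sketch of how the escaping half could be done chunk by chunk instead of on one large string. It relies on str.translate() mapping each character independently, so a chunk boundary can never split an escape on the escaping side. The names escape_stream, _ESCAPES and chunk_size are hypothetical and exist only for this illustration; they are not part of the workaround above or of any stdlib codec. Unescaping in chunks is harder, because a chunk may end in the middle of an escape sequence, and is not attempted here.

```python
import io

# Hypothetical table for this sketch, built like escape_tm above but kept
# local so the example runs on its own. It uses only the repr()-style
# escapes (\t, \n, \r, \x00, ...) plus the doubled backslash.
_ESCAPES = {k: repr(chr(k))[1:-1] for k in range(32)}
_ESCAPES[ord('\\')] = '\\\\'


def escape_stream(src, dst, chunk_size=64 * 1024):
    """Copy text from src to dst, escaping control characters per chunk.

    src and dst are text-mode file objects. Memory use is bounded by
    chunk_size rather than by the total size of the input.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk.translate(_ESCAPES))


# Small demonstration with in-memory streams; a deliberately tiny
# chunk_size shows that chunk boundaries do not affect the result.
demo_in = io.StringIO('a\tb\nc\\d')
demo_out = io.StringIO()
escape_stream(demo_in, demo_out, chunk_size=2)
escaped = demo_out.getvalue()

# The same one-shot unescape expression used in unescape_control() above
# recovers the original text.
restored = escaped.encode('latin1', 'backslashreplace').decode('unicode_escape')
```

With real pipes, src and dst would be sys.stdin and sys.stdout (or text wrappers around their buffers), assuming the escaped payload is what gets placed between the literal tab and newline separators.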
