Message 194690 - Python tracker

Author	underrun
Recipients	r.david.murray, serhiy.storchaka, underrun
Date	2013-08-08.16:23:02
Message-id	<1375978983.12.0.496269878875.issue18679@psf.upfronthosting.co.za>
In-reply-to

Content
> ast.literal_eval("'%s'" % e)

this doesn't work if you use the wrong quote. without introspecting the data in e you can't reliably choose whether to use "'%s'", '"%s"', '"""%s"""' or "'''%s'''". which ones break (and break silently) depends on the data.
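A minimal sketch of the quoting problem, assuming e is an escaped string such as repr(s)[1:-1]. The helper name try_templates and the sample inputs are hypothetical, chosen only to illustrate the point:

```python
import ast

# The four quote styles mentioned above.
TEMPLATES = ("'%s'", '"%s"', "'''%s'''", '"""%s"""')

def try_templates(e):
    """Return {template: decoded string, or the exception class raised}."""
    results = {}
    for template in TEMPLATES:
        try:
            results[template] = ast.literal_eval(template % e)
        except (SyntaxError, ValueError) as exc:
            results[template] = type(exc)
    return results

# Loud failure: a bare single quote in the data ends the literal early.
loud = try_templates("it's")

# Silent failure: the data closes and reopens the literal, so Python's
# implicit string concatenation drops characters without any error.
silent = try_templates("a' 'b")
```

With single quotes, "it's" raises SyntaxError, while "a' 'b" quietly comes back as "ab". Both inputs survive the double-quoted template, but other data would break that one instead.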


> e.encode().decode('unicode-escape').encode('latin1').decode()

so ... encode the repr()[1:-1] string in utf-8 bytes, decode backslash escape sequences and individual bytes as if they are latin1, encode as latin1 (which is just byte for byte serialization), then decode the byte representation as if it is utf-8 encoded to recombine the characters that were broken with the 'unicode-escape' decode earlier? 

this may work for my example, but it looks and feels very hacky for something that should be simple and straightforward. and again, tools other than python will run into escaped quotes in the data, which may cause problems.
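The four steps described above can be spelled out one at a time. This is only a sketch: the function name unescape_via_latin1 and the sample string are hypothetical, and e is assumed to hold backslash escapes alongside raw non-ASCII text.

```python
def unescape_via_latin1(e):
    """Resolve backslash escapes in e using the encode/decode chain."""
    raw = e.encode()                        # 1. text -> UTF-8 bytes
    mangled = raw.decode('unicode-escape')  # 2. escapes resolved; each byte >= 0x80
                                            #    becomes its own Latin-1 character
    restored = mangled.encode('latin1')     # 3. characters -> bytes, one for one
    return restored.decode()                # 4. bytes read as UTF-8 again

# Escaped form: a raw e-acute plus a literal backslash followed by 't'.
e = "caf\u00e9\\tbar"
decoded = unescape_via_latin1(e)            # e-acute intact, real tab character
```

The chain is fragile: if e contains an escape for a character outside Latin-1, such as \u2028, step 2 produces that character and step 3 raises UnicodeEncodeError.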

> e.encode('latin1', 'backslashescape').decode('unicode-escape')

when i execute this i get a traceback

LookupError: unknown error handler name 'backslashescape'
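For reference, a small sketch that checks which error handler names are registered. It assumes the suggestion meant the standard 'backslashreplace' handler; that is a guess, not something confirmed in this thread. The helper names are hypothetical.

```python
import codecs

def handler_exists(name):
    """Return True if name is a registered codec error handler."""
    try:
        codecs.lookup_error(name)
    except LookupError:
        return False
    return True

def unescape_backslashreplace(e):
    """Same chain as the quoted suggestion, with the assumed handler name."""
    # Characters outside Latin-1 are written out as \uXXXX escapes,
    # which the unicode-escape decode then turns back into characters.
    return e.encode('latin1', 'backslashreplace').decode('unicode-escape')

known = handler_exists('backslashreplace')
unknown = handler_exists('backslashescape')
# Euro sign (outside Latin-1) followed by a literal backslash and 't'.
decoded = unescape_backslashreplace("\u20ac\\t")
```

Here handler_exists('backslashescape') is False, which matches the LookupError above. Escaped quotes in the data are still left for other tools to deal with.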

History
Date	User	Action	Args
2013-08-08 16:23:03	underrun	set	recipients: + underrun, r.david.murray, serhiy.storchaka
2013-08-08 16:23:03	underrun	set	messageid: <1375978983.12.0.496269878875.issue18679@psf.upfronthosting.co.za>
2013-08-08 16:23:03	underrun	link	issue18679 messages
2013-08-08 16:23:02	underrun	create