Author amaury.forgeotdarc
Recipients amaury.forgeotdarc, ezio.melotti, flox, lemburg, r.david.murray, rhansen
Date 2010-01-25.13:54:00
Message-id <1264427643.29.0.225238610182.issue7615@psf.upfronthosting.co.za>
In-reply-to
Content
I feel uneasy about changing the default unicode-escape encoding.
I think we are mixing two features here: to transfer a unicode string between two points, programs must agree both on where the data ends and on how characters are represented as bytes.
All codecs, including unicode-escape, deal only with the byte conversion (unicode-escape converts everything to printable 7-bit ASCII);
these patches want to add a feature related to the "where does the string end" issue, and it is only aimed at "Python code" containers. Other transports and protocols may choose different delimiters.

My point is that unicode-escape has always left printable 7-bit ASCII characters unchanged, and the patches would change this.
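A quick check of the current behaviour (a sketch run under Python 3; the sample string is arbitrary). The codec escapes backslashes, control characters and non-ASCII characters, but both quote characters come through untouched:

```python
# The unicode_escape codec passes printable 7-bit ASCII through as-is,
# including ' and ", and only escapes control and non-ASCII characters.
text = 'He said "it\'s caf\u00e9"\n'
encoded = text.encode("unicode_escape")
print(encoded)  # b'He said "it\'s caf\\xe9"\\n'

# Decoding restores the original text.
assert encoded.decode("unicode_escape") == text
```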

And this would actually break existing code. It did not take me long to find two examples of programs that embed unicode_escape-encoded text between quotes and take care of escaping the quotes themselves. The first example generates JavaScript code; the second generates SQL statements:
http://github.com/chriseppstein/pywebmvc/blob/master/src/code/pywebmvc/tools/searchtool.py#L450
http://gitweb.sabayon.org/?p=entropy.git;a=blob;f=libraries/entropy/db/__init__.py;h=2d818455efa347f35b2e96d787fefd338055d066;hb=HEAD#l6463
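A minimal sketch of the caller pattern described above. It is not code taken from either linked project, and all function names are hypothetical. `patched_codec` stands in for a unicode-escape that also escapes quotes. `ast.literal_eval` takes the place of a JavaScript parser, since both languages read `\"` and `\\` the same way inside a double-quoted string. Because the caller already escapes the delimiter, the extra escaping produces `\\"`, which terminates the literal early:

```python
import ast

def current_codec(text):
    # Today's behaviour: quotes pass through unchanged.
    return text.encode("unicode_escape").decode("ascii")

def patched_codec(text):
    # Hypothetical stand-in for the proposed change: quotes escaped too.
    return current_codec(text).replace("'", "\\'").replace('"', '\\"')

def to_double_quoted_literal(text, escape):
    # Caller pattern: encode, escape the delimiter itself, then wrap in quotes.
    return '"' + escape(text).replace('"', '\\"') + '"'

def round_trips(literal, text):
    # True if the generated literal parses back to the original text.
    try:
        return ast.literal_eval(literal) == text
    except (SyntaxError, ValueError):
        return False

sample = 'say "hi"'
print(to_double_quoted_literal(sample, current_codec))  # "say \"hi\""
print(to_double_quoted_literal(sample, patched_codec))  # "say \\"hi\\""
```

With the current codec the literal round-trips. With the patched behaviour, the caller's own escaping doubles the backslashes and the generated code no longer parses.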

This does not prevent the creation of a new codec; let's call it 'python-unicode-escape' [ or 'repr' :-) ].
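One way such a separate codec could be sketched in Python 3, leaving unicode-escape itself untouched. This is an assumption-laden illustration, not an existing codec: the name 'python-unicode-escape' is only the hypothetical one proposed above. The sketch relies on the observation that unicode-escape output never contains a quote as part of an escape sequence, so escaping every quote afterwards is safe:

```python
import codecs

def _encode(text, errors="strict"):
    # Start from the existing unicode-escape output, then escape both quote
    # characters so the result is safe between either delimiter.
    data = text.encode("unicode_escape", errors)
    data = data.replace(b"'", b"\\'").replace(b'"', b'\\"')
    return data, len(text)

def _decode(data, errors="strict"):
    # The unicode-escape decoder already understands \' and \".
    return bytes(data).decode("unicode_escape", errors), len(data)

def _search(name):
    # Depending on the Python version, the lookup name may arrive with
    # hyphens converted to underscores, so accept both spellings.
    if name in ("python_unicode_escape", "python-unicode-escape"):
        return codecs.CodecInfo(_encode, _decode, name="python-unicode-escape")
    return None

codecs.register(_search)

text = 'He said "it\'s caf\u00e9"'
encoded = text.encode("python-unicode-escape")
assert encoded.decode("python-unicode-escape") == text
```

Existing callers of unicode_escape keep their current output, and code that wants Python-literal-safe text opts in explicitly.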