
Author ezio.melotti
Recipients ezio.melotti, hodgestar, r.david.murray, sergiomb2, wiget, yanne, zchyla
Date 2011-11-06.22:06:55
Message-id <1320617216.39.0.465373636123.issue3932@psf.upfronthosting.co.za>
Content
I'm not sure what the best solution is here.

unescape uses a regex with replaceEntities as a callback to replace the entities in attribute values.
The problem is that replaceEntities currently always returns unicode, so if unescape receives a str, the str is automatically coerced to unicode using the ascii codec, and an error is raised whenever the str contains non-ascii chars.
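
To make the failure concrete, here is a rough sketch of the mechanism. It is written for Python 3, where bytes and text never mix implicitly, so the Python 2 coercion is emulated by decoding the byte pieces as ASCII by hand. ENTITY_RE, replace_entities and unescape_py2_style are hypothetical names for illustration, not the code in HTMLParser, and the sketch only models input that contains at least one entity.

```python
import re
from html.entities import name2codepoint

# Hypothetical stand-in pattern, not the one used by the standard library.
ENTITY_RE = re.compile(rb"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def replace_entities(ref):
    """Resolve one entity reference (bytes, without '&' and ';') to text.

    Like the current callback, this always returns text (Python 2 unicode).
    """
    name = ref.decode("ascii")
    if name[0] == "#":
        if name[1] in "xX":
            return chr(int(name[2:], 16))   # hexadecimal character reference
        return chr(int(name[1:]))           # decimal character reference
    if name in name2codepoint:
        return chr(name2codepoint[name])    # named entity
    return "&" + name + ";"                 # unknown entity: leave untouched


def unescape_py2_style(raw):
    """Emulate Python 2 joining a str input with unicode replacements."""
    out, pos = [], 0
    for m in ENTITY_RE.finditer(raw):
        # Python 2 did this decode implicitly, always with the ascii codec.
        out.append(raw[pos:m.start()].decode("ascii"))
        out.append(replace_entities(m.group(1)))
        pos = m.end()
    out.append(raw[pos:].decode("ascii"))
    return "".join(out)


print(unescape_py2_style(b"a &amp; b"))          # ASCII-only input works
try:
    unescape_py2_style(b"caf\xc3\xa9 &amp; tea")  # UTF-8 bytes plus an entity
except UnicodeDecodeError as exc:
    print("failed:", exc.reason)
```

The second call fails even though the entity itself (&amp;) is plain ASCII, because the surrounding bytes cannot be decoded with the ascii codec.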

The possible solutions are:
 1) Document the status quo (i.e. replaceEntities always returns unicode, and an error is raised whenever a str that contains non-ascii chars is passed);
 2) Change replaceEntities to return str only for ascii chars (as the patch proposed by Zbigniew does).  This works as long as the entity resolves to an ascii character, but keeps failing in the other cases.

The first option is cleaner, and means that if you want to parse something you should always use unicode, otherwise it might fail (in the face of ambiguity, refuse the temptation to guess).
The second option might allow you to parse a few more documents without converting them to unicode, but only if you are lucky (i.e. you don't get any unicode mixed with non-ascii str).  If most of the entities in attributes resolve to ascii (e.g. &quot; &amp; &apos; &gt; &lt;), it might be more practical to return str and avoid unnecessary errors, while still adding a note in the documentation that passing unicode is better.
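
As an illustration of the "only if you are lucky" part, here is a rough sketch of what the second option amounts to. It is again written for Python 3 with the Python 2 str/unicode mixing emulated by hand. NAMED_RE, replace_entity_opt2, join_py2_style and unescape_opt2 are hypothetical names, this is not Zbigniew's patch, and only named entities are handled to keep it short.

```python
import re
from html.entities import name2codepoint

# Hypothetical pattern for named entities only; numeric references are omitted.
NAMED_RE = re.compile(rb"&([A-Za-z][A-Za-z0-9]*);")


def replace_entity_opt2(name):
    """Option 2: bytes for ASCII results, text (Python 2 unicode) otherwise."""
    cp = name2codepoint.get(name.decode("ascii"))
    if cp is None:
        return b"&" + name + b";"   # unknown entity: leave untouched
    if cp < 128:
        return bytes([cp])          # ASCII result stays a byte string
    return chr(cp)                  # non-ASCII result has to be text


def join_py2_style(pieces):
    """Emulate Python 2: all-str stays str; any unicode forces ASCII decoding."""
    if all(isinstance(p, bytes) for p in pieces):
        return b"".join(pieces)
    return "".join(p.decode("ascii") if isinstance(p, bytes) else p
                   for p in pieces)


def unescape_opt2(raw):
    pieces, pos = [], 0
    for m in NAMED_RE.finditer(raw):
        pieces.append(raw[pos:m.start()])
        pieces.append(replace_entity_opt2(m.group(1)))
        pos = m.end()
    pieces.append(raw[pos:])
    return join_py2_style(pieces)


print(unescape_opt2(b"caf\xc3\xa9 &amp; tea"))   # lucky: entity is ASCII
try:
    unescape_opt2(b"caf\xc3\xa9 &eacute;")       # unlucky: entity is not ASCII
except UnicodeDecodeError as exc:
    print("failed:", exc.reason)
```

The first call now succeeds and returns bytes, while the second still fails, because a non-ascii entity brings text back into the mix and the non-ascii bytes around it cannot be coerced.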