
Author jcsalterego
Recipients Neil Muller, effbot, hodgestar, jcsalterego, pitrou
Date 2009-06-23.05:37:21
Message-id <1245735447.77.0.911688551388.issue6233@psf.upfronthosting.co.za>
Content
The attached patch includes Neil's original additions to test_xml_etree.py.


I also noticed that _encode_entity() is never called in ElementTree in
py3k. The important part of it is the nested function escape_entities(),
which works together with _escape and _escape_map.

In 2.x, _encode_entity() is used after _encode() raises a Unicode
exception [1], so I figured it would make sense to take the core
functionality of escape_entities() and integrate it into _encode() in the
same fashion, i.e. as the fallback when an exception is raised.

Basically, I:
- changed the _escape regexp from "[\x0080-\uffff]" to "[\x80-\xff]"
- extracted _encode_entity.escape_entities() and moved it to module
scope as _escape_entities()
- removed _encode_entity()
- added UnicodeEncodeError handling to _encode()

I'm not sure what the expected output should be when the text is of
type str rather than bytes. With this patch, the output is
b"t&#195;&#163;t" rather than b"t&#227;t".

Hope this is a step in the right direction.

[1] ElementTree.py:814, ElementTree.py:829, python 2.7 HEAD r50941
History
Date                 User         Action  Args
2009-06-23 05:37:28  jcsalterego  set     recipients: + jcsalterego, effbot, pitrou, hodgestar, Neil Muller
2009-06-23 05:37:27  jcsalterego  set     messageid: <1245735447.77.0.911688551388.issue6233@psf.upfronthosting.co.za>
2009-06-23 05:37:25  jcsalterego  link    issue6233 messages
2009-06-23 05:37:23  jcsalterego  create