Message181014
xmlparser.Parse() handles string data correctly only if the encoding declared in the XML is utf-8 (or ascii). Examples:
>>> import xml.parsers.expat
>>> parser = xml.parsers.expat.ParserCreate()
>>> content = []
>>> parser.CharacterDataHandler = content.append
>>> parser.Parse("<?xml version='1.0' encoding='utf-8'?><tag>\xb5</tag>")
1
>>> content
['µ']
>>> parser = xml.parsers.expat.ParserCreate()
>>> content = []
>>> parser.CharacterDataHandler = content.append
>>> parser.Parse("<?xml version='1.0' encoding='iso8859'?><tag>\xb5</tag>")
1
>>> content
['Âµ']
>>> parser = xml.parsers.expat.ParserCreate()
>>> content = []
>>> parser.CharacterDataHandler = content.append
>>> parser.Parse("<?xml version='1.0' encoding='utf-16'?><tag>\xb5</tag>")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
xml.parsers.expat.ExpatError: encoding specified in XML declaration is incorrect: line 1, column 30
This affects all other modules which work with XML: xml.sax, xml.dom.minidom, xml.dom.pulldom, xml.etree.ElementTree.
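The utf-16 failure can be reproduced with bytes input as well, which hints at the cause. The sketch below assumes that string data reaches expat as UTF-8 bytes while the declaration still names another encoding; that assumption is consistent with the error above but is not confirmed here. `parse_bytes` is a hypothetical helper, not part of the xml package.

```python
import xml.parsers.expat


def parse_bytes(data: bytes) -> str:
    """Feed raw bytes to a fresh expat parser and return the collected text."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


doc = "<?xml version='1.0' encoding='utf-16'?><tag>\xb5</tag>"

# Assumption: this mimics what happens to str input, i.e. the text arrives
# as UTF-8 bytes although the declaration claims utf-16.
try:
    parse_bytes(doc.encode("utf-8"))
    mismatch_error = None
except xml.parsers.expat.ExpatError as exc:
    mismatch_error = str(exc)

# Bytes that really are UTF-16 (with a BOM) agree with the declaration.
matching_text = parse_bytes(doc.encode("utf-16"))
```

When the bytes and the declaration disagree, expat rejects the document; when they agree, the same document parses and yields the expected character.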
The attached patch fixes parsing of string data when the XML declares an encoding other than UTF-8.
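Independently of the patch, one possible caller-side workaround is to pass an explicit encoding to ParserCreate(), which is documented to override the encoding declared in the document. This is only a sketch: `parse_str` is a hypothetical helper, and whether the patch takes the same route is not stated in this message.

```python
import xml.parsers.expat


def parse_str(document: str) -> str:
    """Parse string data, ignoring the encoding named in the XML declaration."""
    chunks = []
    # An explicit encoding passed to ParserCreate overrides the declaration.
    parser = xml.parsers.expat.ParserCreate("utf-8")
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


text = parse_str("<?xml version='1.0' encoding='utf-16'?><tag>\xb5</tag>")
```

The override is only appropriate for str input, where the declared encoding no longer describes the data being parsed; bytes input should keep using the declaration.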
Date | User | Action | Args
2013-01-31 10:01:19 | serhiy.storchaka | set | recipients: + serhiy.storchaka, ezio.melotti
2013-01-31 10:01:19 | serhiy.storchaka | set | messageid: <1359626479.25.0.87229024986.issue17089@psf.upfronthosting.co.za>
2013-01-31 10:01:19 | serhiy.storchaka | link | issue17089 messages
2013-01-31 10:01:18 | serhiy.storchaka | create |