
Author gianzula
Recipients gianzula
Date 2010-07-13.09:04:33
Message-id <1279011876.88.0.998505025151.issue9241@psf.upfronthosting.co.za>
Content
When parsing a UTF-16 little-endian encoded XML file that contains some Japanese characters, the xml.sax.parse function raises a SAXParseException with the message "no element found". The problem occurs on:

Python 2.5.2/Windows XP Pro SP3 32 bit
Python 2.6.4/Windows XP Pro SP3 32 bit
Python 2.5.2/Windows 2008 Server SP2 64 bit

The same file is processed successfully on:

Python 2.4.3/CentOS 5.4
Python 2.6.3/CentOS 5.4

I've attached a minimal XML file containing a single Japanese character, U+FF1A (FULLWIDTH COLON), that triggers the exception. The code used to parse the file follows:

import xml.sax
xml.sax.parse(open("ff1a.xml"), xml.sax.ContentHandler())
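
Since the attachment is not reproduced here, the following is a minimal sketch of an equivalent test case, not the original ff1a.xml. The document contents, the `TextCollector` handler, and the use of an in-memory byte stream are all assumptions made for illustration. The snippet above opens the file in text mode; this sketch hands the parser raw bytes instead. In UTF-16 LE, U+FF1A is encoded as the byte pair 1A FF, and 0x1A is treated as end-of-file by Windows text-mode reads, which may explain the platform difference, though that is not confirmed in this report.

```python
import io
import xml.sax

# Hypothetical stand-in for the attached file: a UTF-16 LE document
# (with byte-order mark) whose only text content is U+FF1A.
DOC = u'<?xml version="1.0" encoding="UTF-16"?><root>\uff1a</root>'
data = b"\xff\xfe" + DOC.encode("utf-16-le")

# In little-endian UTF-16, U+FF1A becomes the bytes 1A FF.
has_ctrl_z_byte = b"\x1a\xff" in data


class TextCollector(xml.sax.ContentHandler):
    """Illustrative handler that records character data seen by the parser."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


handler = TextCollector()
# Feed the parser a byte stream so no text-mode translation is applied.
xml.sax.parse(io.BytesIO(data), handler)
text = u"".join(handler.chunks)
```

With a file on disk, the comparable change would be to pass `open("ff1a.xml", "rb")` rather than `open("ff1a.xml")`.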

Best regards,
Gianfranco
History
Date                 User      Action  Args
2010-07-13 09:04:36  gianzula  set     recipients: + gianzula
2010-07-13 09:04:36  gianzula  set     messageid: <1279011876.88.0.998505025151.issue9241@psf.upfronthosting.co.za>
2010-07-13 09:04:34  gianzula  link    issue9241 messages
2010-07-13 09:04:34  gianzula  create