Message 119184 - Python tracker


Author	Maciek.J
Recipients	Maciek.J
Date	2010-10-20.01:43:18
SpamBayes Score	5.7786553e-12
Marked as misclassified	No
Message-id	<1287539001.05.0.466094845115.issue10149@psf.upfronthosting.co.za>
In-reply-to

Content

Not sure if this is a Python problem or an expat problem, but I get truncated data while parsing XML documents.

This particular project parses an XML file from a Wikipedia dump.

The attached files are:
* xml-parse-revisions.py - parser script
* revision-test.xml - input XML
* revision-test.xml.sql - output (SQL)
* revision_create.sql - not really needed for this test case, but attached for completeness

Note that some "timestamp" values in the output file are truncated. Also, if you add whitespace to the input XML, different timestamps are truncated.

My Python is 2.6.6.
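
This message does not identify the cause. One behaviour that would fit the symptoms is that expat may pass the text of a single element to `CharacterDataHandler` in several pieces, so a handler that keeps only one piece ends up with a shortened value, and adding whitespace moves the point where the split occurs. The following is a minimal sketch under that assumption, not the attached script: `collect_timestamps` and the sample document are hypothetical, and the input is split by hand so that a chunk boundary falls inside a timestamp.

```python
import xml.parsers.expat


def collect_timestamps(chunks):
    """Return the full text of every <timestamp> element.

    Hypothetical helper: character data is accumulated until the closing
    tag, so it does not matter how many pieces the text arrives in.
    """
    timestamps = []
    state = {"parts": None}  # list of text pieces while inside <timestamp>

    def start_element(name, attrs):
        if name == "timestamp":
            state["parts"] = []

    def character_data(data):
        # May be called more than once for the text of one element.
        if state["parts"] is not None:
            state["parts"].append(data)

    def end_element(name):
        if name == "timestamp" and state["parts"] is not None:
            timestamps.append("".join(state["parts"]))
            state["parts"] = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = character_data
    parser.EndElementHandler = end_element

    for chunk in chunks:
        parser.Parse(chunk, False)
    parser.Parse(b"", True)  # signal end of input
    return timestamps


# Sample input (made up), split in the middle of the timestamp text.
sample = [
    b"<page><revision><timestamp>2010-10-",
    b"20T01:43:18Z</timestamp></revision></page>",
]
print(collect_timestamps(sample))
```

Because the pieces are joined only when the end tag is seen, the result is the same however the input is divided into chunks. Setting the parser's `buffer_text` attribute to true is another option that pyexpat documents for reducing split character-data callbacks.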

History
Date	User	Action	Args
2010-10-20 01:43:21	Maciek.J	set	recipients: + Maciek.J
2010-10-20 01:43:21	Maciek.J	set	messageid: <1287539001.05.0.466094845115.issue10149@psf.upfronthosting.co.za>
2010-10-20 01:43:18	Maciek.J	link	issue10149 messages
2010-10-20 01:43:18	Maciek.J	create