Message 117999 - Python tracker


Author	vojta.rylko
Recipients	vojta.rylko
Date	2010-10-05.10:17:29
Message-id	<1286273854.3.0.24375174829.issue10026@psf.upfronthosting.co.za>

Content

Hi,

I have a file with 10 000 records of the same element, item (always identical):

$ head test.xml
<channel>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
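
For anyone who wants to reproduce this, a file like the one above could be generated with a short script. This is only a sketch: the helper name write_test_xml is hypothetical, and the closing </channel> tag is an assumption, since only the head of the file is shown here.

```python
def write_test_xml(path, count=10000):
    """Write <channel> followed by `count` identical <item> records."""
    with open(path, "w") as out:
        out.write("<channel>\n")
        for _ in range(count):
            out.write("<item><section>Twitter</section></item>\n")
        # Assumed: the real file is closed properly; its tail is not shown.
        out.write("</channel>\n")


if __name__ == "__main__":
    write_test_xml("test.xml")
```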

Then I run a simple program that prints the content of each section element:

$ python pulldom.py test.xml | head
Twitter
Twitter
Twitter
Twitter
Twitter
Twitter
Twitter
Twitter
Twitter
Twitter

It seems to work fine:
$ python pulldom.py test.xml | wc -l
10000

But in two cases out of 10 000 it gives me just "Twi" instead of "Twitter":
$ python pulldom.py test.xml | grep -v Twitter
Twi
Twi

Why? This example program demonstrates a serious problem in my real application: xml.dom.pulldom is truncating the content of some elements.

Thanks for any advice
Vojta Rylko

---------------------------
Python 2.5.4 (r254:67916, Feb 10 2009, 14:58:09)
[GCC 4.2.4] on linux2
---------------------------
pulldom.py:
---------------------------
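import sys
from xml.dom import pulldom

file = open(sys.argv[1])
events = pulldom.parse(file)

for event, node in events:
    if event == pulldom.START_ELEMENT:
        if node.tagName == 'item':
            # Pull the rest of the <item> subtree into the DOM node.
            events.expandNode(node)
            # Prints only the first text child of <section>.
            print(node.getElementsByTagName('section').item(0).firstChild.data)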
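
One plausible explanation, offered as an assumption because nothing in this message confirms it: the parser may hand over character data in more than one piece when an element happens to straddle a read-buffer boundary. The section element would then contain two adjacent text nodes ("Twi" and "tter"), and firstChild.data would return only the first. Under that assumption, a minimal sketch of a workaround is to call normalize() on the expanded node before reading the text. The helper names section_texts and build_sample are hypothetical, and the sample document assumes a closing </channel> tag.

```python
import io
from xml.dom import pulldom


def section_texts(stream):
    """Yield the text of the first <section> inside every <item>."""
    events = pulldom.parse(stream)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            events.expandNode(node)
            # Merge adjacent text nodes in the subtree, so that
            # firstChild holds the whole string even if it arrived in pieces.
            node.normalize()
            section = node.getElementsByTagName("section").item(0)
            yield section.firstChild.data


def build_sample(count=10000):
    """Build an in-memory document shaped like test.xml (assumed layout)."""
    items = "<item><section>Twitter</section></item>\n" * count
    return "<channel>\n" + items + "</channel>\n"


if __name__ == "__main__":
    sample = io.BytesIO(build_sample().encode("utf-8"))
    texts = list(section_texts(sample))
    # Report how many records were read and which distinct values appeared.
    print(len(texts), sorted(set(texts)))
```

If the assumption holds, every record should come back as the full string. Joining the data of all text children of section would be an alternative to normalize().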

History
Date	User	Action	Args
2010-10-05 10:17:34	vojta.rylko	set	recipients: + vojta.rylko
2010-10-05 10:17:34	vojta.rylko	set	messageid: <1286273854.3.0.24375174829.issue10026@psf.upfronthosting.co.za>
2010-10-05 10:17:32	vojta.rylko	link	issue10026 messages
2010-10-05 10:17:30	vojta.rylko	create