Message117999
Hi,
I have a file with 10,000 records of the same element, item (always identical):
$ head test.xml
<channel>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
<item><section>Twitter</section></item>
I run a simple program that prints the content of each section element:
$ python pulldom.py test.xml | head
Twitter
Twitter
Twitter
Twitter
Twitter
Twitter
Twitter
Twitter
Twitter
Twitter
It seems to work fine:
$ python pulldom.py test.xml | wc -l
10000
But in two cases out of 10,000 it gives me just "Twi" instead of "Twitter":
$ python pulldom.py test.xml | grep -v Twitter
Twi
Twi
Why? This example program demonstrates a serious problem in my real application: xml.dom.pulldom is cutting off the content of some elements.
Thanks for any advice
Vojta Rylko
---------------------------
Python 2.5.4 (r254:67916, Feb 10 2009, 14:58:09)
[GCC 4.2.4] on linux2
---------------------------
pulldom.py:
---------------------------
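import sys
from xml.dom import pulldom

# Print the text of the first <section> inside every <item>.
file = open(sys.argv[1])
events = pulldom.parse(file)
for event, node in events:
    if event == pulldom.START_ELEMENT:
        if node.tagName == 'item':
            events.expandNode(node)
            print(node.getElementsByTagName('section').item(0).firstChild.data)
---------------------------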
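A possible explanation, offered as an assumption rather than a confirmed diagnosis: the underlying parser may hand character data to the DOM builder in several chunks (for example when the text straddles a read-buffer boundary), so a section element can end up with two adjacent text nodes, "Twi" and "tter", and firstChild.data returns only the first one. The sketch below shows two ways around that: joining all text-node children, or calling normalize() first. The helper names text_of and section_texts are hypothetical, and the split is simulated by hand with minidom rather than reproduced from the original file.

```python
import io
from xml.dom import minidom, pulldom


def text_of(element):
    """Hypothetical helper: join the data of every direct text-node child."""
    return "".join(child.data for child in element.childNodes
                   if child.nodeType == child.TEXT_NODE)


def section_texts(source):
    """Hypothetical helper: yield the full text of the first <section> in each <item>."""
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            events.expandNode(node)
            section = node.getElementsByTagName("section").item(0)
            yield text_of(section)


# Simulate the suspected split: one element holding two adjacent text nodes.
doc = minidom.Document()
section = doc.createElement("section")
section.appendChild(doc.createTextNode("Twi"))
section.appendChild(doc.createTextNode("tter"))

first_only = section.firstChild.data   # only the first chunk
joined = text_of(section)              # all chunks concatenated

section.normalize()                    # merges adjacent text nodes in place
normalized = section.firstChild.data

# Small in-memory stand-in for test.xml (assumed to close with </channel>).
sample = b"<channel>" + b"<item><section>Twitter</section></item>" * 3 + b"</channel>"
texts = list(section_texts(io.BytesIO(sample)))
```

In the original script, the equivalent change would be to call normalize() on the expanded item node before reading firstChild.data, or to join the text nodes as text_of does.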
Date | User | Action | Args
2010-10-05 10:17:34 | vojta.rylko | set | recipients: + vojta.rylko
2010-10-05 10:17:34 | vojta.rylko | set | messageid: <1286273854.3.0.24375174829.issue10026@psf.upfronthosting.co.za>
2010-10-05 10:17:32 | vojta.rylko | link | issue10026 messages
2010-10-05 10:17:30 | vojta.rylko | create |