Message 107537 - Python tracker

Author	hniksic
Recipients	effbot, hniksic
Date	2010-06-11.08:54:29
Message-id	<1276246471.53.0.74986551959.issue2892@psf.upfronthosting.co.za>

Content

Here is a small test case that demonstrates the problem, expected behavior and actual behavior:

{{{
import xml.etree.cElementTree
from StringIO import StringIO

for ev in xml.etree.cElementTree.iterparse(StringIO('<x></x>rubbish'), events=('start', 'end')):
    print ev
}}}

The above code should first print the two events (start and end) and only then raise the ParseError caused by the trailing junk. In Python 2.7, however, the exception is raised before either event is printed:

{{{
>>> for ev in xml.etree.cElementTree.iterparse(StringIO('<x></x>rubbish'), events=('start', 'end')):
...   print ev
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 84, in next
cElementTree.ParseError: junk after document element: line 1, column 7
}}}

The expected behavior, obtained with my patch (my_iterparse below is the patched iterparse), looks like this:

{{{
>>> for ev in my_iterparse(StringIO('<x></x>rubbish'), events=('start', 'end')):
...  print ev
... 
('start', <Element 'x' at 0xb771cba8>)
('end', <Element 'x' at 0xb771cba8>)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 26, in __iter__
cElementTree.ParseError: junk after document element: line 1, column 7
}}}
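The ordering the patch aims for can be sketched in a few lines: collect the events produced while parsing a chunk, yield all of them, and only then re-raise the parser's error. This is not the patch attached to this issue. It is a minimal sketch written for Python 3 that drives the low-level `xml.parsers.expat` parser directly instead of cElementTree, and the names `my_iterparse` (borrowed from the transcript purely for illustration), `chunk_size` and `events_before_error` are hypothetical.

```python
import io
import xml.parsers.expat
from collections import deque


def my_iterparse(source, chunk_size=16 * 1024):
    """Hypothetical sketch: yield ('start', tag) / ('end', tag) tuples.

    If expat reports an error, every event produced before the error is
    yielded first, and only then is the error re-raised.
    """
    pending = deque()
    parser = xml.parsers.expat.ParserCreate()
    # Expat calls these handlers synchronously from inside Parse().
    parser.StartElementHandler = lambda tag, attrs: pending.append(("start", tag))
    parser.EndElementHandler = lambda tag: pending.append(("end", tag))

    while True:
        data = source.read(chunk_size)
        error = None
        try:
            # An empty read means end of input, so mark this as the final call.
            parser.Parse(data, not data)
        except xml.parsers.expat.ExpatError as exc:
            error = exc  # remember the error instead of raising it right away
        # Flush everything parsed so far before reporting the error.
        while pending:
            yield pending.popleft()
        if error is not None:
            raise error
        if not data:
            return


def events_before_error(text):
    """Hypothetical helper: collect events until the parse finishes or fails."""
    seen = []
    try:
        for ev in my_iterparse(io.StringIO(text)):
            seen.append(ev)
    except xml.parsers.expat.ExpatError as exc:
        return seen, exc
    return seen, None
```

For `'<x></x>rubbish'` the helper collects the start and end events for `x` before the `ExpatError` arrives, mirroring the expected transcript. A real iterparse would also build and yield `Element` objects, which this sketch leaves out.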

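The difference is only visible when events are consumed one at a time with a side effect, such as printing them. A side-effect-free operation, such as building a list with list(iterparse(...)), behaves exactly the same before and after the change: in both cases the exception propagates out of list() and no list is returned.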
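The point about list() can be checked directly. The sketch below assumes a current Python 3 interpreter, where cElementTree no longer exists, so it uses `xml.etree.ElementTree.iterparse`; the helper names are hypothetical. A loop with side effects keeps whatever it saw before the error, while `list()` never returns a value once the iterator raises, so it cannot reveal whether the events were yielded before the exception.

```python
import io
import xml.etree.ElementTree as ET

BROKEN = "<x></x>rubbish"  # a complete element followed by junk, as in the report


def consume_with_side_effects(text):
    """Record each event as it arrives, like the print loop in the report."""
    seen = []
    try:
        for event, elem in ET.iterparse(io.StringIO(text), events=("start", "end")):
            seen.append((event, elem.tag))
    except ET.ParseError:
        pass  # whatever was recorded before the error is kept
    return seen


def consume_with_list(text):
    """Build a list in one go: the side-effect-free case."""
    try:
        return list(ET.iterparse(io.StringIO(text), events=("start", "end")))
    except ET.ParseError:
        # list() never returns here, so any events produced before the
        # error are lost regardless of when iterparse raised.
        return None
```

On such an interpreter the first helper returns the two events for `x`, while the second returns `None`, which is exactly the outcome it would have if the error had been raised before any event.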

History
Date	User	Action	Args
2010-06-11 08:54:31	hniksic	set	recipients: + hniksic, effbot
2010-06-11 08:54:31	hniksic	set	messageid: <1276246471.53.0.74986551959.issue2892@psf.upfronthosting.co.za>
2010-06-11 08:54:30	hniksic	link	issue2892 messages
2010-06-11 08:54:29	hniksic	create