classification
Title: cElementTree iterparse requires events as bytes; ElementTree uses strings
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, eli.bendersky, eric-talevich, eric.araujo, ezio.melotti, flox, maubp, python-dev, sandro.tosi, vstinner
Priority: normal
Keywords:

Created on 2010-07-14 03:27 by eric-talevich, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (12)
msg110252 - Author: Eric Talevich (eric-talevich) Date: 2010-07-14 03:27
In xml.etree, ElementTree and cElementTree implement different interfaces for the iterparse function/class.

In ElementTree, the argument "events" must be a tuple of strings:

from xml.etree import ElementTree as ET
for result in ET.iterparse('example.xml', events=('start', 'end')):
    print(result)

That works, given a valid XML file 'example.xml'. If the event names are given as bytes instead of strings (b'start', b'end'), there's no crash, but no events are recognized.

In cElementTree, it's the opposite: the events argument must be a tuple of bytes:

from xml.etree import cElementTree as cET
for result in cET.iterparse('example.xml', events=(b'start', b'end')):
    print(result)

Giving a tuple of strings instead of bytes results in:

>>> for result in cET.iterparse('example.xml', events=('start', 'end')):
...     print(result)
TypeError: invalid event tuple


This makes it difficult, or at least very awkward, to use ElementTree as a fallback for cElementTree.
msg110574 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2010-07-17 16:16
It seems that this has been fixed in the py3k branch (r78942). Now both bytes and unicode are accepted. Can someone check?
msg113739 - Author: Eric Talevich (eric-talevich) Date: 2010-08-13 01:40
This bug seems to be still present in Python 3.1.2. (Unless I'm doing something wrong.) Was r78942 included in the 3.1.2 release?
msg114042 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2010-08-16 10:04
No, apparently, r78942 was not included in 3.1.2.
msg126287 - Author: Peter (maubp) Date: 2011-01-14 19:01
This wasn't fixed in Python 3.1.3 either.

Is the commit Amaury identified from the py3k branch (r78942) suitable to backport to Python 3.1.x?
msg126290 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2011-01-14 19:15
r78942 is quite large unfortunately.
But just patching _elementtree.c::xmlparser_setevents() should be possible.
This would at least fix the "invalid event tuple" error.
msg152837 - Author: Eli Bendersky (eli.bendersky) (Python committer) Date: 2012-02-08 03:45
At this point, 3.1 won't be fixed with such changes any longer.

Is this fixed in 3.2/3.3?
msg152864 - Author: Eric Talevich (eric-talevich) Date: 2012-02-08 14:42
It's more-or-less fixed in Python 3.2:

- With cElementTree, both bytes and strings are accepted for events; 

- With ElementTree, only strings are accepted, and bytes raise a ValueError (unknown event).

A small inconsistency remains, but it's fine to just use strings in all cases.
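
The "just use strings" advice can be sketched as a fallback import that passes string event names to whichever module loads. This is only an illustration: the helper name `iter_tags` and the sample XML are made up here, and `xml.etree.cElementTree` was removed in Python 3.9, so on current interpreters the `except` branch is the one that runs.

```python
import io

try:
    # Deprecated alias for the C accelerator; removed in Python 3.9.
    from xml.etree import cElementTree as ET
except ImportError:
    from xml.etree import ElementTree as ET


def iter_tags(xml_bytes, events=("start", "end")):
    """Return (event, tag) pairs; iter_tags is a hypothetical helper."""
    source = io.BytesIO(xml_bytes)
    # Event names are plain strings, which both modules accept from 3.2 on.
    return [(event, elem.tag) for event, elem in ET.iterparse(source, events=events)]


pairs = iter_tags(b"<root><child/></root>")
```

Because the event names are strings, the same call works regardless of which import succeeded.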
msg152926 - Author: Eli Bendersky (eli.bendersky) (Python committer) Date: 2012-02-09 04:21
Eric,

Thanks for checking. I agree that this behavior is acceptable, but a documentation fix would be appropriate. The documentation of iterparse should list the events it accepts and state that they are strings.

The events are listed at http://effbot.org/zone/element-iterparse.htm
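
For reference, a minimal sketch that requests the four string event names described in the Python documentation for iterparse ("start", "end", "start-ns", "end-ns"). The function name `trace` and the namespace URI `urn:example` are placeholders chosen for this example, not anything from the issue.

```python
import io
from xml.etree import ElementTree as ET

# Hypothetical sample document with one namespace declaration.
XML = b'<root xmlns:x="urn:example"><x:item/></root>'


def trace(xml_bytes):
    """Collect (event, payload) pairs for all four event kinds."""
    wanted = ("start", "end", "start-ns", "end-ns")
    seen = []
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=wanted):
        if event in ("start", "end"):
            # For element events the payload is an Element.
            seen.append((event, payload.tag))
        else:
            # "start-ns" carries a (prefix, uri) tuple; "end-ns" carries None.
            seen.append((event, payload))
    return seen


result = trace(XML)
```

Only the element events carry an Element, so the payload has to be handled per event type.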

Would you like to try your hand at submitting a patch for Python 3.2? I will review it and apply it to 3.2 and 3.3.
msg152927 - Author: Eli Bendersky (eli.bendersky) (Python committer) Date: 2012-02-09 04:22
Changing the target version(s) and adding some documentation experts to the nosy list.
msg153072 - Author: Eric Talevich (eric-talevich) Date: 2012-02-10 18:46
Well, this is not the best month for me to try digging into a new codebase... I would not mind if someone else did the patch for this.
msg155999 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-16 06:44
New changeset 84e4d76bd146 by Eli Bendersky in branch '3.2':
Issue #9257: clarify the events iterparse accepts
http://hg.python.org/cpython/rev/84e4d76bd146

New changeset 00c7142ee54a by Eli Bendersky in branch 'default':
Issue #9257: clarify the events iterparse accepts
http://hg.python.org/cpython/rev/00c7142ee54a
History
Date                 User                Action  Args
2022-04-11 14:57:03  admin               set     github: 53503
2012-03-16 06:44:40  eli.bendersky       set     status: open -> closed; resolution: fixed; stage: needs patch -> resolved
2012-03-16 06:44:20  python-dev          set     nosy: + python-dev; messages: + msg155999
2012-02-10 18:46:05  eric-talevich       set     messages: + msg153072
2012-02-09 04:22:37  eli.bendersky       set     nosy: + ezio.melotti, eric.araujo, sandro.tosi; messages: + msg152927; versions: + Python 3.2, Python 3.3, - Python 3.1
2012-02-09 04:21:40  eli.bendersky       set     messages: + msg152926
2012-02-08 14:42:49  eric-talevich       set     messages: + msg152864
2012-02-08 03:45:45  eli.bendersky       set     nosy: + eli.bendersky; messages: + msg152837
2011-11-08 22:48:17  ezio.melotti        set     nosy: + flox; type: behavior
2011-01-18 21:43:43  vstinner            set     nosy: + vstinner
2011-01-14 19:15:31  amaury.forgeotdarc  set     messages: + msg126290
2011-01-14 19:01:55  maubp               set     messages: + msg126287
2010-08-16 10:04:43  amaury.forgeotdarc  set     messages: + msg114042
2010-08-13 01:40:35  eric-talevich       set     messages: + msg113739
2010-07-23 10:03:30  maubp               set     nosy: + maubp
2010-07-17 16:16:56  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: + msg110574; stage: needs patch
2010-07-14 03:27:45  eric-talevich       create