Author scoder
Recipients eli.bendersky, flox, jcea, ncoghlan, python-dev, scoder
Date 2013-08-25.17:50:10
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1377453011.5.0.0706242105167.issue17741@psf.upfronthosting.co.za>
In-reply-to
Content
> Also, even if the new approach is implemented in the next release,
IncrementalParser can stay as a simple synonym to
XMLParser(target=EventBuilder(...)).

No it can't. According to your signature, it accepts a parser instance as input. So it can't just create a new one internally.


"""
This is an existing API within ET:

  xml.etree.ElementTree.iterparse(source, events=None, parser=None)

The parser argument is limited: it can't use a custom TreeBuilder. It also
must provide _setevents, so either inherit from XMLParser or just provide
that method in another way [I'll try to improve the documentation to make
all of this explicit]. Whatever we say in this issue, iterparse is here to
stay.
"""

I agree. However, I can't see why this should constrain us when adding a new API. Sure, the current implementation puts certain restrictions on the input. But I see that as an advantage, not as a constraint. It's currently broken in what it accepts as input, so we are free to change that aspect later on. And we can now try to get it right for the new API that we add.

Also, AFAICT, passing a parser into iterparse() is a very rare thing to do, if only because it doesn't really work, so the likelyhood of breaking code is very limited (I'm not saying we should, just thinking about the impact). Note that lxml.etree doesn't currently support this at all.


Just to throw some more ideas into the wild, what do you think about making this an inherent feature of the XMLParser class itself? That can be done now, without major changes to the underlying code. It could be enabled by a new keyword argument "collect_events=None" that you could pass into the constructor in order to define the set of events. The "events()" method would always be available, but would only yield events if there actually are any. Otherwise, it would just terminate and be done. So, unless you pass it a set of event names at init time, the parser wouldn't collect events and you wouldn't get any out of it. Sounds like a clean interface to me.

Collecting the events would work in the same way as it currently works, i.e. it would call the target's start() method, and whatever that returns will be added as value to the events as a tuple ("start", target_return_value). That means that it's up to the parser target to determine what the event iteration returns. Can be elements, can be None, can be something else.

As for backwards compatibility with iterparse(), I don't see a problem with officially documenting iterparse() as requiring an XMLParser if a parser is provided, and as reconfiguring that parser to its needs. It already does that anyway and it currently doesn't work for anything but an XMLParser.


Also, on a related note, we might (!) consider the addition of the _setevents() method a bug in CPython 3.3. It didn't exist in ElementTree before it was merged with cElementTree. Instead, the expected interface was (apparently) that the parser that is passed in has a "_parser" attribute that holds an Expat parser, and that iterparse() could reconfigure according for itself. Both are not public APIs and even differed between ET and cET, so I don't see any harm being done, but it would be good to have one way to do it. OTOH, if that way is _setevents(), then be it. The advantage would be that this interface is at least independent of Expat and could potentially be implemented by other types of parsers.
History
Date User Action Args
2013-08-25 17:50:11scodersetrecipients: + scoder, jcea, ncoghlan, eli.bendersky, flox, python-dev
2013-08-25 17:50:11scodersetmessageid: <1377453011.5.0.0706242105167.issue17741@psf.upfronthosting.co.za>
2013-08-25 17:50:11scoderlinkissue17741 messages
2013-08-25 17:50:10scodercreate