Message 196220 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	eli.bendersky
Recipients	eli.bendersky, flox, jcea, jkloth, ncoghlan, python-dev, scoder
Date	2013-08-26.16:03:47
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<CAF-Rda-RGYYyxQBw=zCDGYT6pqQLhSoQq=oKAd=9dRpkRJtjmQ@mail.gmail.com>
In-reply-to	<1377495644.65.0.193687256987.issue17741@psf.upfronthosting.co.za>

Content
What I actually think would be better for the long term is to add new target invocations in XMLParser - start-ns and end-ns. So XMLParser would just keep parsing, leaving the interpretation of the parsed data to the target. Today's TreeBuilder is free to ignore these calls. A custom "EventCollectingTreeBuilder" can collect an event list, having all the information at its disposal. Thus, XMLParser would remain what it is today (minus the _setevents hack) - a router for pyexpat events. These discussions of the future API are interesting, but what's more important today is to have an API for IncrementalParser (using this name before a new one is agreed upon) that doesn't block future implementation changes. And I believe the API proposed here fits the bill. > > The class will be named EventParser. > > Obviously because it's parsing Events, as opposed to the XMLParser, which > parses XML, or the HTMLParser, which parses HTML, right? > The name is not perfect, and proposals for a better one are welcome. FWIW, since it already lives in the xml.etree namespace, "XML" does not necessarily have to be part of the name. So, some alternatives: * EventStreamer - proposed by Nick. I have to admit I don't feel good with it, because I still want to be crystal clear it's a parser we're talking about. * EventBasedParser * EventCollectingParser * NonblockingParser * ... other ideas?

On Sun, Aug 25, 2013 at 10:40 PM, Stefan Behnel <report@bugs.python.org> wrote:

>
> Stefan Behnel added the comment:
>
> Hmm, did you look at my last comment at all? It solves both the technical
> issues and the API issues very nicely and avoids any problems of potential
> future changes. Let me quickly explain why.
>
> The feature in question depends on two existing parts of the API: the
> event generation of the parser, and the return values of the parser target
> (e.g. a tree builder). So there are really only three places where this
> feature makes sense, both technically and API-wise.
>
> 1) in the target
> 2) in the parser
> 3) between parser and target
>
> Note how a separate class is ruled out right from the start by the fact
> that the feature lives somewhere between parser and target. It's an
> inherent part of the existing design already (and of the implementation,
> BTW), so I don't see how adding a separate thing to control it makes any
> sense.
>
> 1) is impossible because the target is user provided and we do not control
> it
> 2) works fine because the parser controls both the call to the target and
> its return value
> 3) would be nice (and was my original favourite) but is hard to do with
> the current implementation and requires further changes to the API of
> parser targets
>
> So 2) is the choice that remains.
>
>
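Option 2 (the parser controls both the call to the target and its return value) can be made concrete with a small sketch. `RecordingParser` is a hypothetical class, not part of `xml.etree`: it drives `xml.parsers.expat` directly, forwards each callback to a target such as `TreeBuilder`, and records whatever the target returns as an event.

```python
import xml.parsers.expat
import xml.etree.ElementTree as ET


class RecordingParser:
    """Hypothetical parser for option 2: it calls the target itself and
    keeps the target's return values as (event, value) pairs."""

    def __init__(self, target):
        self.target = target
        self.events = []
        self._expat = xml.parsers.expat.ParserCreate()
        self._expat.StartElementHandler = self._start
        self._expat.EndElementHandler = self._end
        # Character data produces no event here, so pass it straight through.
        self._expat.CharacterDataHandler = target.data

    def _start(self, tag, attrib):
        # The event value is whatever the target returns (an Element for TreeBuilder).
        self.events.append(("start", self.target.start(tag, attrib)))

    def _end(self, tag):
        self.events.append(("end", self.target.end(tag)))

    def feed(self, data):
        self._expat.Parse(data, False)

    def close(self):
        self._expat.Parse("", True)
        return self.target.close()


parser = RecordingParser(ET.TreeBuilder())
parser.feed("<a><b/></a>")
root = parser.close()
seen = [(event, elem.tag) for event, elem in parser.events]
```

The target stays unaware that events are being collected; only the parser knows, which is the property the quoted comment relies on.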
I think folding it all into XMLParser is a bad idea. XMLParser is a fairly
simple API and I don't want to complicate it. But more importantly,
XMLParser knows nothing about Elements, at least in the direct API of
today. The one constructing Elements is the target. The "read_events"
method proposed for the new class (currently IncrementalParser.events)
already returns Elements, having used a TreeBuilder to build them.
XMLParser emits start/end/data calls into the target, but these only carry
tag names, attributes and chunks of data. The hierarchical element
construction is done by TreeBuilder.
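To see what `XMLParser` itself hands to a target, a minimal sketch is a target that simply records each call. `CallRecorder` is a hypothetical name; the `start`/`end`/`data`/`close` methods are the standard target interface that `xml.etree.ElementTree.XMLParser(target=...)` invokes.

```python
import xml.etree.ElementTree as ET


class CallRecorder:
    """Hypothetical parser target that records exactly what XMLParser passes in."""

    def __init__(self):
        self.calls = []

    def start(self, tag, attrib):
        self.calls.append(("start", tag, dict(attrib)))

    def end(self, tag):
        self.calls.append(("end", tag))

    def data(self, data):
        self.calls.append(("data", data))

    def close(self):
        # XMLParser.close() returns whatever the target's close() returns.
        return self.calls


parser = ET.XMLParser(target=CallRecorder())
parser.feed('<doc id="1">hi<item/></doc>')
calls = parser.close()
# Only tag names, attribute dicts and text arrive; no Element objects are involved.
structure = [c for c in calls if c[0] != "data"]
text = "".join(c[1] for c in calls if c[0] == "data")
```

Building the hierarchy from these flat calls is exactly the job `TreeBuilder` does when it is the target.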

What I actually think would be better for the long term is to add new
target invocations in XMLParser - start-ns and end-ns. So XMLParser would
just keep *parsing*, leaving the interpretation of the parsed data to the
target. Today's TreeBuilder is free to ignore these calls. A custom
"EventCollectingTreeBuilder" can collect an event list, having all the
information at its disposal. Thus, XMLParser would remain what it is today
(minus the _setevents hack) - a router for pyexpat events.
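The proposal maps closely onto the target protocol that later Python versions provide: since Python 3.8, `XMLParser` calls `start_ns(prefix, uri)` and `end_ns(prefix)` on a target that defines them. The sketch below assumes that behaviour. `EventCollectingTreeBuilder` is the hypothetical name from the paragraph above, implemented here by wrapping a standard `TreeBuilder` rather than subclassing it, and the payload format of the namespace events is an illustrative choice.

```python
import xml.etree.ElementTree as ET


class EventCollectingTreeBuilder:
    """Hypothetical target: builds the tree with a standard TreeBuilder and
    records (event, payload) pairs, including namespace events."""

    def __init__(self):
        self._builder = ET.TreeBuilder()
        self.events = []

    def start(self, tag, attrib):
        elem = self._builder.start(tag, attrib)
        self.events.append(("start", elem))
        return elem

    def end(self, tag):
        elem = self._builder.end(tag)
        self.events.append(("end", elem))
        return elem

    def data(self, data):
        self._builder.data(data)

    # XMLParser (Python 3.8+) calls these only because the target defines them.
    def start_ns(self, prefix, uri):
        self.events.append(("start-ns", (prefix, uri)))

    def end_ns(self, prefix):
        self.events.append(("end-ns", prefix))

    def close(self):
        return self._builder.close()


target = EventCollectingTreeBuilder()
parser = ET.XMLParser(target=target)
parser.feed('<root xmlns:a="urn:a"><a:child/></root>')
root = parser.close()
kinds = [event for event, _ in target.events]
```

`XMLParser` here does nothing beyond routing expat callbacks; all interpretation, including event collection, lives in the target.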

These discussions of the future API are interesting, but what's more
important today is to have an API for IncrementalParser (using this name
before a new one is agreed upon) that doesn't block future implementation
changes. And I believe the API proposed here fits the bill.
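For comparison, the incremental API discussed in this issue eventually shipped in Python 3.4 as `xml.etree.ElementTree.XMLPullParser`, with `feed()`, `read_events()` and `close()`. The sketch below shows the non-blocking usage pattern against that class. The chunk boundaries are arbitrary, and because the underlying expat may hold back events until more input arrives, it also drains `read_events()` after `close()`.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("start", "end"))
seen = []

# Feed the document in arbitrary pieces, as a non-blocking caller would.
for chunk in ("<root><item>1</it", "em><item>2</item></root>"):
    parser.feed(chunk)
    # read_events() yields whatever events the parser has produced so far.
    for event, elem in parser.read_events():
        seen.append((event, elem.tag))

parser.close()
# Events still pending at close() remain readable afterwards.
for event, elem in parser.read_events():
    seen.append((event, elem.tag))
```

The caller never blocks on input: it pushes bytes when it has them and pulls finished events when it wants them.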

> > The class will be named EventParser.
>
> Obviously because it's parsing Events, as opposed to the XMLParser, which
> parses XML, or the HTMLParser, which parses HTML, right?
>

The name is not perfect, and proposals for a better one are welcome. FWIW,
since it already lives in the xml.etree namespace, "XML" does not
necessarily have to be part of the name. So, some alternatives:

* EventStreamer - proposed by Nick. I have to admit I don't feel good about
it, because I still want to be crystal clear it's a *parser* we're talking
about.
* EventBasedParser
* EventCollectingParser
* NonblockingParser
* ... other ideas?

History
Date	User	Action	Args
2013-08-26 16:03:48	eli.bendersky	set	recipients: + eli.bendersky, jcea, ncoghlan, scoder, jkloth, flox, python-dev
2013-08-26 16:03:48	eli.bendersky	link	issue17741 messages
2013-08-26 16:03:47	eli.bendersky	create