Message 198537 - Python tracker

Author	scoder
Recipients	eli.bendersky, jcea, jkloth, scoder
Date	2013-09-28.17:43:10
Message-id	<1380390191.02.0.747216876195.issue18902@psf.upfronthosting.co.za>

Content
Copying a relevant comment by Eli from http://bugs.python.org/issue18990#msg198145 and replying inline.

"""
The way the APIs are currently defined, XMLParser and XMLPullParser are different animals. XMLParser can be considered to only have one "front" in the API - feed() and close(). You feed() until the document is done and then you close() and get the parsed tree. There's no other way to get the parsed tree (unless you use a custom builder, I guess).

On the other hand XMLPullParser has two clear "fronts" - an input front with feed() and close() and an output front with read_events(). For XMLPullParser, close() is just an input signal. The canonical way to get output from XMLPullParser is read_events(). close() has no better reason to return output than feed(). When we decided to change the method names (recall that Antoine's originals were completely different), we perhaps forgot this detail.
"""

No, we didn't.
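A minimal sketch of the two "fronts" described in the quote. It uses the stdlib xml.etree.ElementTree module (Python 3.4 or later, where XMLPullParser exists), and the XML chunks are made up for illustration. XMLParser hands back the tree from close(), while XMLPullParser delivers its output through read_events():

```python
import xml.etree.ElementTree as ET

# Made-up input, fed in two pieces to mimic incremental parsing.
xml_chunks = ["<root><item>1</item>", "<item>2</item></root>"]

# XMLParser: a single front. feed() the data, then close() returns the tree.
parser = ET.XMLParser()
for chunk in xml_chunks:
    parser.feed(chunk)
root = parser.close()
print(root.tag, len(root))  # root 2

# XMLPullParser: an input front (feed/close) and an output front (read_events).
pull = ET.XMLPullParser(events=("start", "end"))
events = []
for chunk in xml_chunks:
    pull.feed(chunk)
    events.extend((event, elem.tag) for event, elem in pull.read_events())
pull.close()  # input signal only; any remaining events are read afterwards
events.extend((event, elem.tag) for event, elem in pull.read_events())
print(events)
```

Events are drained after every feed() and once more after close(), because the parser may buffer some input until the end of the stream.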


"""
Even though XMLPullParser's method is named close(), it's *not* like XMLParser's close(). If someone is using XMLPullParser for its close() he's likely using the class incorrectly.

Just as an example: consider that in a lot of use cases the programmer will want to discard parts of the tree that's parsed iteratively (similarly to the main use case of iterparse()), because the XML itself is too huge. It's a convenient streaming API, in other words. Now, if the reader discards parts of the tree (by deleting subtrees), then returning the root from close() becomes even more meaningless, because it's no longer the root and we have no idea what it actually is.
"""

Let me repeat that this was already the case before the new class was added and that it's a feature. If the target decides to discard parts of the tree, or not build a tree at all and (say) instead count elements and return their total number on close(), then that's what the user asked for by selecting that target.
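A minimal sketch of such a target with the stdlib xml.etree.ElementTree.XMLParser. ElementCounter is a hypothetical class that builds no tree and only counts start tags; XMLParser.close() passes on whatever the target's close() returns:

```python
import xml.etree.ElementTree as ET


class ElementCounter:
    """Hypothetical target: builds no tree, close() returns the element count."""

    def __init__(self):
        self.count = 0

    def start(self, tag, attrib):
        self.count += 1  # one start event per element

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        return self.count


parser = ET.XMLParser(target=ElementCounter())
parser.feed("<root><a/><b><c/></b></root>")
total = parser.close()
print(total)  # 4
```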

Let's agree to disagree on your conclusions, but I still can't see any advantage in separating the two classes. The way I see it, making XMLPullParser inherit from XMLParser makes it very easy to explain what the difference is: the read_events() method, i.e. the additional way to receive the parse events that the combination of parser and target generates. Essentially, it's the target that does all the work here, and the parser only collects the results and presents them to the user. That's why I want to keep the parser as "stupid" as it looks from the user's side, instead of adding something new right next to it.
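The inheritance-based design argued for here can be sketched on top of the stdlib, assuming that subclassing xml.etree.ElementTree.XMLParser and handing it a recording target is acceptable. InheritingPullParser and _RecordingTarget are hypothetical names, and this is not how CPython implements XMLPullParser (there it is a separate class that wraps an XMLParser). The only addition over XMLParser is read_events(); close() is inherited and still returns whatever the target's close() returns:

```python
import xml.etree.ElementTree as ET


class _RecordingTarget:
    """Hypothetical target: builds a tree via TreeBuilder and records events."""

    def __init__(self):
        self._builder = ET.TreeBuilder()
        self.events = []

    def start(self, tag, attrib):
        elem = self._builder.start(tag, attrib)
        self.events.append(("start", elem))
        return elem

    def end(self, tag):
        elem = self._builder.end(tag)
        self.events.append(("end", elem))
        return elem

    def data(self, data):
        self._builder.data(data)

    def close(self):
        return self._builder.close()  # the root element


class InheritingPullParser(ET.XMLParser):
    """Hypothetical design: an XMLParser that only adds read_events()."""

    def __init__(self):
        self._recorder = _RecordingTarget()
        super().__init__(target=self._recorder)

    def read_events(self):
        # Hand out the pending events and start a fresh list.
        events, self._recorder.events = self._recorder.events, []
        return iter(events)


parser = InheritingPullParser()
parser.feed("<root><a/><b/></root>")
root = parser.close()  # inherited: returns the target's close() result
events = [(event, elem.tag) for event, elem in parser.read_events()]
print(root.tag, events)
```

With this layout, swapping in a different target changes both what close() returns and what the events carry, which is the "target does all the work" point made above.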

That being said, if ElementTree keeps them separate and decides to *never* return anything from XMLPullParser.close(), then that's sufficiently compatible with lxml.etree, so I won't object to it. lxml has a long history of extending what's there in order to make it easier to use.
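For comparison, in current CPython xml.etree.ElementTree.XMLPullParser.close() always returns None (the Python documentation states this explicitly), so the tree has to be reached through read_events(). A short check with made-up input:

```python
import xml.etree.ElementTree as ET

pull = ET.XMLPullParser(events=("end",))
pull.feed("<root><a/><b/></root>")
result = pull.close()  # end-of-input signal only
print(result)  # None

# The tree is reached through read_events(); the last "end" event is the root.
events = list(pull.read_events())
root = events[-1][1]
print(root.tag, [child.tag for child in root])  # root ['a', 'b']
```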

As long as we can find a way to keep both libraries compatible for users, I think we should both be able to move forward.
