classification
Title: xml.etree.ElementTree - XMLParser and TreeBuilder's doctype() method missing
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, christian.heimes, effbot, eli.bendersky, ezio.melotti, flox, georg.brandl, martin.panter, python-dev, r.david.murray
Priority: release blocker Keywords:

Created on 2012-02-14 04:16 by eli.bendersky, last changed 2013-10-28 13:06 by eli.bendersky. This issue is now closed.

Messages (25)
msg153328 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-02-14 04:19
The C accelerator of xml.etree.ElementTree (used by default since issue 13988) does not use or define or use the doctype() methods of the XMLParser and TreeBuilder classes, although this method is documented.

As far as I can tell, this problem exists in 3.2 as well and wasn't introduced by the changes in issue 13988.
msg153330 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-02-14 04:51
The problem is deeper. _elementtree does not expose XMLParser and TreeBuilder as types at all, just as factory functions.

XMLParser: not sure if it was meant to be subclassed. If not, it should at least be documented. In any case, the XMLParser in _elementtree does not use the doctype() method on its target.

TreeBuilder: was definitely meant to be subclassed (it's documented explicitly), so it must be exposed as a type, not as a factory function.
msg153332 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-02-14 07:33
For the doctype() issue, it might be removed, since it was already deprecated in 3.2 (in compliance with PEP 387). However the deprecation cycle is under discussion for the 3.x serie. (issue 13248)

For subclassing, you hit the same issue for all the functions exposed by _elementtree, including Element fatory.
msg153934 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-02-22 08:28
It doesn't really matter if something was *meant* to be subclassed.  If it can be in 3.x, and can't be in 3.x+1, that's a sort of backwards compatibility bug we want to avoid pretty strongly because it's gratuitous breakage.
msg154329 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-02-26 11:55
the class versus factory issue is gone to issue 14128.

The current issue is only about the doctype() method missing in the C implementation.

I propose to drop this deprecated method from the Python version, instead of implementing the deprecated method in the C version.
msg154333 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-02-26 12:02
Florent,

The deprecation should be probably raised separately on pydev. From the recent discussions on this and similar topics, I doubt that removal of these methods will be accepted.
msg154338 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-02-26 12:29
I understand the point about compatibility.
However it is slightly different here, because the method is already deprecated in Python 2.7 and 3.2, with a warning in the documentation and a DeprecationWarning at runtime.
This method was never available in the C version, and the documentation is clear about the recommended way of writing a custom doctype handler.
Arguably, it is not the most popular feature of ElementTree.

I am not opposed to adding the deprecated method in the C implementation, but it will need someone to do the patch, taking care of raising the deprecation warning correctly, and taking care of the case where XMLParser is subclassed. Is it worth the hassle?


Please not that contrary to what is stated in the first message (msg153328), the doctype() method is not defined on the default TreeBuilder (Python) class. The documentation suggests to add it on custom TreeBuilder implementations.
msg154343 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-02-26 13:13
This last feature (doctype handler on custom TreeBuilder) does not have tests...

So, it is certainly broken with the C implementation.
msg154398 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-02-26 21:49
Two other differences:
 * the C TreeBuilder has an undocumented and unused method "xml"
 * if you omit one of the TreeBuilder method (start(), end(), data(), close()) on you custom TreeBuilder implementation, the C XMLParser works fine, but the Python XMLParser raises an attribute error.

And I confirm that if you implement the "doctype()" method on a custom TreeBuilder object, the C XMLParser ignores it, while the Python version works fine.


I propose:
 - to drop the undocumented, empty "TreeBuilder().xml" method.
 - to relax the Python XMLParser, in order to accept incomplete TreeBuilder implementation (like the C version)
 - to implement the "doctype()" handler in the C XMLParser (this needs some work)
msg154408 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-02-26 22:18
On Sun, Feb 26, 2012 at 23:49, Florent Xicluna <report@bugs.python.org>wrote:

Yes, these suggestions sound reasonable to me. Moving toward two more
consistent implementations of the API, while not disabling existing
features is the way to go. Things like relaxing the Python XMLParser just
add features rather than restricting them, so they should not result in
breakage due to backward-incompatibility.

Eli
msg154935 - (view) Author: Roundup Robot (python-dev) Date: 2012-03-05 09:43
New changeset 39cc025968f1 by Florent Xicluna in branch 'default':
Issue #14007: drop unused TreeBuilder().xml.
http://hg.python.org/cpython/rev/39cc025968f1

New changeset 47016103185f by Florent Xicluna in branch 'default':
Issue #14007: accept incomplete TreeBuilder objects (missing start/end/data/close) for the Python implementation as well. Add disabled tests for the doctype() method.
http://hg.python.org/cpython/rev/47016103185f
msg161873 - (view) Author: Roundup Robot (python-dev) Date: 2012-05-29 12:46
New changeset 717632ae7b3f by Eli Bendersky in branch 'default':
Issue #14007: make TreeBuilder an actual type exposed from _elementtree, and subclassable.
http://hg.python.org/cpython/rev/717632ae7b3f
msg161875 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-05-29 13:07
I'm working on completing this issue. It will be done in stages, with each stage committed as a separate chunk of functionality.

* Make TreeBuilder an actual exported type (subclassable) from _elementtree
* TreeBuilder constructor should accept arguments as documented
* Make XMLParser an actual exported type (subclassable) from _elementtree
* Make XMLParser call doctype on a given target builder, and on itself, as is done in the Python version

The first bullet was implemented and committed --^
msg161961 - (view) Author: Roundup Robot (python-dev) Date: 2012-05-30 14:59
New changeset 20b8f0ee3d64 by Eli Bendersky in branch 'default':
Issue #14007: implemented the 'element_factory' feature of TreeBuilder in
http://hg.python.org/cpython/rev/20b8f0ee3d64
msg161987 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-05-31 09:38
* Another problem: the C implementation of XMLParser does not take 3 positional args, but only 2. Although the 'html' arg is documented as "unsupported", it should still be taken and silently ignored, similarly to the Python version
msg162048 - (view) Author: Roundup Robot (python-dev) Date: 2012-06-01 04:16
New changeset a29ae1c2b8b2 by Eli Bendersky in branch 'default':
Issue #14007: make XMLParser a real subclassable type exported from _elementtree. +cleanups
http://hg.python.org/cpython/rev/a29ae1c2b8b2
msg162058 - (view) Author: Roundup Robot (python-dev) Date: 2012-06-01 08:34
New changeset 6f9bfcc1896f by Eli Bendersky in branch 'default':
Issue #14007: implement doctype() method calling in XMLParser of _elementtree.
http://hg.python.org/cpython/rev/6f9bfcc1896f
msg162059 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-06-01 08:36
The latest commit completes this issue.
msg162179 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-02 21:31
This is causing buildbot failures on some of the buildbots:

http://www.python.org/dev/buildbot/all/builders/x86%20Gentoo%203.x/builds/2529/steps/test/logs/stdio

http://www.python.org/dev/buildbot/all/builders/x86%20Windows7%203.x/builds/5082/steps/test/logs/stdio
msg162189 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-06-03 03:11
Thanks. Fixed in changeset eb1d633fe307.

I'll watch the bots to see no problems remain.
msg171604 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-09-30 02:53
It seems like your changes have introduced a segfault: #16089
msg201519 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2013-10-28 09:18
There is an issue in Python 2.7 (and 3.2 if that matters) with the DeprecationWarning for the doctype() method being triggered internally. It is a bit different from this issue but I think a solution has already been committed for 3.3, and the commit message references this issue. Let me know if I should raise a separate report.

The warning is triggered for the Python version of the ElementTree module:

$ python2.7 -Wall
Python 2.7.5 (default, Sep  6 2013, 09:55:21) 
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree import ElementTree
>>> ElementTree.XML(b'<!DOCTYPE blaua SYSTEM "id"><elem/>')
/usr/lib/python2.7/xml/etree/ElementTree.py:1627: DeprecationWarning: This method of XMLParser is deprecated.  Define doctype() method on the TreeBuilder target.
  DeprecationWarning,
/usr/lib/python2.7/xml/etree/ElementTree.py:1627: DeprecationWarning: This method of XMLParser is deprecated.  Define doctype() method on the TreeBuilder target.
  DeprecationWarning,
<Element 'elem' at 0xd47910>
>>> 

Possible solution is the patch hunk below, taken from this commit:

http://hg.python.org/cpython/diff/47016103185f/Lib/xml/etree/ElementTree.py#l1.99

@@ -1636,7 +1627,7 @@ class XMLParser:
                     pubid = pubid[1:-1]
                 if hasattr(self.target, "doctype"):
                     self.target.doctype(name, pubid, system[1:-1])
-                elif self.doctype is not self._XMLParser__doctype:
+                elif self.doctype != self._XMLParser__doctype:
                     # warn about deprecated call
                     self._XMLParser__doctype(name, pubid, system[1:-1])
                     self.doctype(name, pubid, system[1:-1])
msg201533 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-10-28 12:53
Martin, do you see a way to work around the problem?

I'm not sure it's serious enough to warrant committing to the 2.7.x branch at this point.
msg201538 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2013-10-28 13:00
The best way to work around it for me is just to ignore the warning. It doesn’t really worry me that much, I only noticed it while porting a program to Python 3 anyway. So if you don’t want to touch the 2.7 branch I can live with that :)
msg201539 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-10-28 13:06
On Mon, Oct 28, 2013 at 6:00 AM, Martin Panter <report@bugs.python.org>wrote:

>
> Martin Panter added the comment:
>
> The best way to work around it for me is just to ignore the warning. It
> doesn’t really worry me that much, I only noticed it while porting a
> program to Python 3 anyway. So if you don’t want to touch the 2.7 branch I
> can live with that :)
>

I'd rather not. 2.7.x is in maintenance mode - we'll fix serious bugs and
may change trivial things. But in this case, since the effects are hardly
serious we should consider the danger of mucking with code in a very stable
maintenance branch. While this issue may seem simple at the surface, it's
not. That's because ET in 3.3+ is quite different from ET in 3.2-
(including 2.7-). In 3.3 ET underwent major changes, and its testing
infrastructure was greatly improved too. Back-porting across the 3.3
boundary is risky because it's hard to foresee the effects (i.e.
interactions between the C accelerator and the Python module).
History
Date User Action Args
2013-10-28 13:06:10eli.benderskysetmessages: + msg201539
2013-10-28 13:00:00martin.pantersetmessages: + msg201538
2013-10-28 12:53:47eli.benderskysetmessages: + msg201533
2013-10-28 09:18:43martin.pantersetnosy: + martin.panter
messages: + msg201519
2012-09-30 02:53:28christian.heimessetnosy: + christian.heimes
messages: + msg171604
2012-06-03 03:12:00eli.benderskysetstatus: open -> closed

messages: + msg162189
2012-06-02 21:31:51r.david.murraysetstatus: closed -> open
nosy: + r.david.murray
messages: + msg162179

2012-06-01 08:36:56eli.benderskysetstatus: open -> closed
resolution: fixed
messages: + msg162059

stage: test needed -> resolved
2012-06-01 08:34:31python-devsetmessages: + msg162058
2012-06-01 04:16:50python-devsetmessages: + msg162048
2012-05-31 09:38:09eli.benderskysetmessages: + msg161987
2012-05-30 14:59:11python-devsetmessages: + msg161961
2012-05-29 13:07:57eli.benderskysetmessages: + msg161875
2012-05-29 12:46:53python-devsetmessages: + msg161873
2012-04-27 21:59:38Arfreversetpriority: normal -> release blocker
2012-03-05 09:43:03python-devsetnosy: + python-dev
messages: + msg154935
2012-02-26 22:18:46eli.benderskysetmessages: + msg154408
2012-02-26 21:49:37floxsetmessages: + msg154398
2012-02-26 13:13:33floxsetmessages: + msg154343
stage: needs patch -> test needed
2012-02-26 12:29:31floxsetpriority: high -> normal

messages: + msg154338
2012-02-26 12:02:39eli.benderskysetmessages: + msg154333
2012-02-26 11:55:24floxsetmessages: + msg154329
2012-02-22 08:28:19georg.brandlsetnosy: + georg.brandl
messages: + msg153934
2012-02-14 10:49:41Arfreversetnosy: + Arfrever
2012-02-14 07:33:49floxsetmessages: + msg153332
2012-02-14 04:51:46eli.benderskysetmessages: + msg153330
2012-02-14 04:19:50eli.benderskysetmessages: + msg153328
2012-02-14 04:19:29eli.benderskyset -> (no value)
messages: - msg153324
2012-02-14 04:16:26eli.benderskycreate