Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xml.etree.ElementTree - XMLParser and TreeBuilder's doctype() method missing #58215

Closed
elibendersky mannequin opened this issue Feb 14, 2012 · 25 comments
Closed

xml.etree.ElementTree - XMLParser and TreeBuilder's doctype() method missing #58215

elibendersky mannequin opened this issue Feb 14, 2012 · 25 comments
Labels
release-blocker stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@elibendersky
Copy link
Mannequin

elibendersky mannequin commented Feb 14, 2012

BPO 14007
Nosy @birkenfeld, @tiran, @ezio-melotti, @bitdancer, @florentx, @vadmium

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2012-06-03.03:12:00.188>
created_at = <Date 2012-02-14.04:16:26.395>
labels = ['type-bug', 'library', 'release-blocker']
title = "xml.etree.ElementTree - XMLParser and TreeBuilder's doctype() method missing"
updated_at = <Date 2013-10-28.13:06:10.345>
user = 'https://bugs.python.org/elibendersky'

bugs.python.org fields:

activity = <Date 2013-10-28.13:06:10.345>
actor = 'eli.bendersky'
assignee = 'none'
closed = True
closed_date = <Date 2012-06-03.03:12:00.188>
closer = 'eli.bendersky'
components = ['Library (Lib)']
creation = <Date 2012-02-14.04:16:26.395>
creator = 'eli.bendersky'
dependencies = []
files = []
hgrepos = []
issue_num = 14007
keywords = []
message_count = 25.0
messages = ['153328', '153330', '153332', '153934', '154329', '154333', '154338', '154343', '154398', '154408', '154935', '161873', '161875', '161961', '161987', '162048', '162058', '162059', '162179', '162189', '171604', '201519', '201533', '201538', '201539']
nosy_count = 10.0
nosy_names = ['effbot', 'georg.brandl', 'christian.heimes', 'ezio.melotti', 'Arfrever', 'r.david.murray', 'eli.bendersky', 'flox', 'python-dev', 'martin.panter']
pr_nums = []
priority = 'release blocker'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue14007'
versions = ['Python 3.3']

@elibendersky elibendersky mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Feb 14, 2012
@elibendersky
Copy link
Mannequin Author

elibendersky mannequin commented Feb 14, 2012

The C accelerator of xml.etree.ElementTree (used by default since bpo-13988) does not use or define or use the doctype() methods of the XMLParser and TreeBuilder classes, although this method is documented.

As far as I can tell, this problem exists in 3.2 as well and wasn't introduced by the changes in bpo-13988.

@elibendersky
Copy link
Mannequin Author

elibendersky mannequin commented Feb 14, 2012

The problem is deeper. _elementtree does not expose XMLParser and TreeBuilder as types at all, just as factory functions.

XMLParser: not sure if it was meant to be subclassed. If not, it should at least be documented. In any case, the XMLParser in _elementtree does not use the doctype() method on its target.

TreeBuilder: was definitely meant to be subclassed (it's documented explicitly), so it must be exposed as a type, not as a factory function.

@florentx
Copy link
Mannequin

florentx mannequin commented Feb 14, 2012

For the doctype() issue, it might be removed, since it was already deprecated in 3.2 (in compliance with PEP-387). However the deprecation cycle is under discussion for the 3.x serie. (bpo-13248)

For subclassing, you hit the same issue for all the functions exposed by _elementtree, including Element fatory.

@birkenfeld
Copy link
Member

It doesn't really matter if something was *meant* to be subclassed. If it can be in 3.x, and can't be in 3.x+1, that's a sort of backwards compatibility bug we want to avoid pretty strongly because it's gratuitous breakage.

@florentx
Copy link
Mannequin

florentx mannequin commented Feb 26, 2012

the class versus factory issue is gone to bpo-14128.

The current issue is only about the doctype() method missing in the C implementation.

I propose to drop this deprecated method from the Python version, instead of implementing the deprecated method in the C version.

@elibendersky
Copy link
Mannequin Author

elibendersky mannequin commented Feb 26, 2012

Florent,

The deprecation should be probably raised separately on pydev. From the recent discussions on this and similar topics, I doubt that removal of these methods will be accepted.

@florentx
Copy link
Mannequin

florentx mannequin commented Feb 26, 2012

I understand the point about compatibility.
However it is slightly different here, because the method is already deprecated in Python 2.7 and 3.2, with a warning in the documentation and a DeprecationWarning at runtime.
This method was never available in the C version, and the documentation is clear about the recommended way of writing a custom doctype handler.
Arguably, it is not the most popular feature of ElementTree.

I am not opposed to adding the deprecated method in the C implementation, but it will need someone to do the patch, taking care of raising the deprecation warning correctly, and taking care of the case where XMLParser is subclassed. Is it worth the hassle?

Please not that contrary to what is stated in the first message (msg153328), the doctype() method is not defined on the default TreeBuilder (Python) class. The documentation suggests to add it on custom TreeBuilder implementations.

@florentx
Copy link
Mannequin

florentx mannequin commented Feb 26, 2012

This last feature (doctype handler on custom TreeBuilder) does not have tests...

So, it is certainly broken with the C implementation.

@florentx
Copy link
Mannequin

florentx mannequin commented Feb 26, 2012

Two other differences:

  • the C TreeBuilder has an undocumented and unused method "xml"
  • if you omit one of the TreeBuilder method (start(), end(), data(), close()) on you custom TreeBuilder implementation, the C XMLParser works fine, but the Python XMLParser raises an attribute error.

And I confirm that if you implement the "doctype()" method on a custom TreeBuilder object, the C XMLParser ignores it, while the Python version works fine.

I propose:

  • to drop the undocumented, empty "TreeBuilder().xml" method.
  • to relax the Python XMLParser, in order to accept incomplete TreeBuilder implementation (like the C version)
  • to implement the "doctype()" handler in the C XMLParser (this needs some work)

@elibendersky
Copy link
Mannequin Author

elibendersky mannequin commented Feb 26, 2012

On Sun, Feb 26, 2012 at 23:49, Florent Xicluna <report@bugs.python.org>wrote:

Yes, these suggestions sound reasonable to me. Moving toward two more
consistent implementations of the API, while not disabling existing
features is the way to go. Things like relaxing the Python XMLParser just
add features rather than restricting them, so they should not result in
breakage due to backward-incompatibility.

Eli

@python-dev
Copy link
Mannequin

python-dev mannequin commented Mar 5, 2012

New changeset 39cc025968f1 by Florent Xicluna in branch 'default':
Issue bpo-14007: drop unused TreeBuilder().xml.
http://hg.python.org/cpython/rev/39cc025968f1

New changeset 47016103185f by Florent Xicluna in branch 'default':
Issue bpo-14007: accept incomplete TreeBuilder objects (missing start/end/data/close) for the Python implementation as well. Add disabled tests for the doctype() method.
http://hg.python.org/cpython/rev/47016103185f

@Arfrever Arfrever mannequin added the release-blocker label Apr 27, 2012
@python-dev
Copy link
Mannequin

python-dev mannequin commented May 29, 2012

New changeset 717632ae7b3f by Eli Bendersky in branch 'default':
Issue bpo-14007: make TreeBuilder an actual type exposed from _elementtree, and subclassable.
http://hg.python.org/cpython/rev/717632ae7b3f

@elibendersky
Copy link
Mannequin Author

elibendersky mannequin commented May 29, 2012

I'm working on completing this issue. It will be done in stages, with each stage committed as a separate chunk of functionality.

  • Make TreeBuilder an actual exported type (subclassable) from _elementtree
  • TreeBuilder constructor should accept arguments as documented
  • Make XMLParser an actual exported type (subclassable) from _elementtree
  • Make XMLParser call doctype on a given target builder, and on itself, as is done in the Python version

The first bullet was implemented and committed --^

@python-dev
Copy link
Mannequin

python-dev mannequin commented May 30, 2012

New changeset 20b8f0ee3d64 by Eli Bendersky in branch 'default':
Issue bpo-14007: implemented the 'element_factory' feature of TreeBuilder in
http://hg.python.org/cpython/rev/20b8f0ee3d64

@elibendersky
Copy link
Mannequin Author

elibendersky mannequin commented May 31, 2012

  • Another problem: the C implementation of XMLParser does not take 3 positional args, but only 2. Although the 'html' arg is documented as "unsupported", it should still be taken and silently ignored, similarly to the Python version

@python-dev
Copy link
Mannequin

python-dev mannequin commented Jun 1, 2012

New changeset a29ae1c2b8b2 by Eli Bendersky in branch 'default':
Issue bpo-14007: make XMLParser a real subclassable type exported from _elementtree. +cleanups
http://hg.python.org/cpython/rev/a29ae1c2b8b2

@python-dev
Copy link
Mannequin

python-dev mannequin commented Jun 1, 2012

New changeset 6f9bfcc1896f by Eli Bendersky in branch 'default':
Issue bpo-14007: implement doctype() method calling in XMLParser of _elementtree.
http://hg.python.org/cpython/rev/6f9bfcc1896f

@elibendersky
Copy link
Mannequin Author

elibendersky mannequin commented Jun 1, 2012

The latest commit completes this issue.

@elibendersky elibendersky mannequin closed this as completed Jun 1, 2012
@bitdancer bitdancer reopened this Jun 2, 2012
@elibendersky
Copy link
Mannequin Author

elibendersky mannequin commented Jun 3, 2012

Thanks. Fixed in changeset eb1d633fe307.

I'll watch the bots to see no problems remain.

@elibendersky elibendersky mannequin closed this as completed Jun 3, 2012
@tiran
Copy link
Member

tiran commented Sep 30, 2012

It seems like your changes have introduced a segfault: bpo-16089

@vadmium
Copy link
Member

vadmium commented Oct 28, 2013

There is an issue in Python 2.7 (and 3.2 if that matters) with the DeprecationWarning for the doctype() method being triggered internally. It is a bit different from this issue but I think a solution has already been committed for 3.3, and the commit message references this issue. Let me know if I should raise a separate report.

The warning is triggered for the Python version of the ElementTree module:

$ python2.7 -Wall
Python 2.7.5 (default, Sep  6 2013, 09:55:21) 
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree import ElementTree
>>> ElementTree.XML(b'<!DOCTYPE blaua SYSTEM "id"><elem/>')
/usr/lib/python2.7/xml/etree/ElementTree.py:1627: DeprecationWarning: This method of XMLParser is deprecated.  Define doctype() method on the TreeBuilder target.
  DeprecationWarning,
/usr/lib/python2.7/xml/etree/ElementTree.py:1627: DeprecationWarning: This method of XMLParser is deprecated.  Define doctype() method on the TreeBuilder target.
  DeprecationWarning,
<Element 'elem' at 0xd47910>
>>> 

Possible solution is the patch hunk below, taken from this commit:

http://hg.python.org/cpython/diff/47016103185f/Lib/xml/etree/ElementTree.py#l1.99

@@ -1636,7 +1627,7 @@ class XMLParser:
                     pubid = pubid[1:-1]
                 if hasattr(self.target, "doctype"):
                     self.target.doctype(name, pubid, system[1:-1])
-                elif self.doctype is not self._XMLParser__doctype:
+                elif self.doctype != self._XMLParser__doctype:
                     # warn about deprecated call
                     self._XMLParser__doctype(name, pubid, system[1:-1])
                     self.doctype(name, pubid, system[1:-1])

@elibendersky
Copy link
Mannequin Author

elibendersky mannequin commented Oct 28, 2013

Martin, do you see a way to work around the problem?

I'm not sure it's serious enough to warrant committing to the 2.7.x branch at this point.

@vadmium
Copy link
Member

vadmium commented Oct 28, 2013

The best way to work around it for me is just to ignore the warning. It doesn’t really worry me that much, I only noticed it while porting a program to Python 3 anyway. So if you don’t want to touch the 2.7 branch I can live with that :)

@elibendersky
Copy link
Mannequin Author

elibendersky mannequin commented Oct 28, 2013

On Mon, Oct 28, 2013 at 6:00 AM, Martin Panter <report@bugs.python.org>wrote:

Martin Panter added the comment:

The best way to work around it for me is just to ignore the warning. It
doesn’t really worry me that much, I only noticed it while porting a
program to Python 3 anyway. So if you don’t want to touch the 2.7 branch I
can live with that :)

I'd rather not. 2.7.x is in maintenance mode - we'll fix serious bugs and
may change trivial things. But in this case, since the effects are hardly
serious we should consider the danger of mucking with code in a very stable
maintenance branch. While this issue may seem simple at the surface, it's
not. That's because ET in 3.3+ is quite different from ET in 3.2-
(including 2.7-). In 3.3 ET underwent major changes, and its testing
infrastructure was greatly improved too. Back-porting across the 3.3
boundary is risky because it's hard to foresee the effects (i.e.
interactions between the C accelerator and the Python module).

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-blocker stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants