classification
Title: In xml.etree.ElementTree Element can be created with empty and None tag
Type: behavior
Stage: resolved
Components: Library (Lib), XML
Versions: Python 3.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: eli.bendersky, gphemsley, py.user, rhettinger, scoder, serhiy.storchaka
Priority: normal
Keywords:

Created on 2016-09-21 11:52 by py.user, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (10)
msg277125 - (view) Author: py.user (py.user) * Date: 2016-09-21 11:52
It is possible to create and serialize an Element instance with empty string tag value:

>>> import xml.etree.ElementTree as etree
>>>
>>> root = etree.Element('')
>>> elem = etree.SubElement(root, '')
>>>
>>> root
<Element '' at 0xb744e34c>
>>> elem
<Element '' at 0xb744e374>
>>>
>>> etree.tostring(root)
b'<>< /></>'
>>> etree.dump(root)
<>< /></>
>>>


It is possible to create and serialize an Element instance with None tag value:

>>> import xml.etree.ElementTree as etree
>>>
>>> root = etree.Element(None)
>>> elem = etree.SubElement(root, None)
>>>
>>> root
<Element None at 0xb7468c34>
>>> root[0]
<Element None at 0xb746334c>
>>> len(root)
1
>>> etree.tostring(root)
b''
>>> etree.dump(root)

>>>


The same attempt with the third-party package lxml raises an exception, both for an empty string and for None:

>>> import lxml.etree
>>>
>>> lxml.etree.Element('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2809, in lxml.etree.Element (src/lxml/lxml.etree.c:61393)
  File "apihelpers.pxi", line 87, in lxml.etree._makeElement (src/lxml/lxml.etree.c:13390)
  File "apihelpers.pxi", line 1446, in lxml.etree._getNsTag (src/lxml/lxml.etree.c:25978)
  File "apihelpers.pxi", line 1481, in lxml.etree.__getNsTag (src/lxml/lxml.etree.c:26304)
ValueError: Empty tag name
>>>
>>> lxml.etree.Element(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2809, in lxml.etree.Element (src/lxml/lxml.etree.c:61393)
  File "apihelpers.pxi", line 87, in lxml.etree._makeElement (src/lxml/lxml.etree.c:13390)
  File "apihelpers.pxi", line 1446, in lxml.etree._getNsTag (src/lxml/lxml.etree.c:25978)
  File "apihelpers.pxi", line 1464, in lxml.etree.__getNsTag (src/lxml/lxml.etree.c:26114)
  File "apihelpers.pxi", line 1342, in lxml.etree._utf8 (src/lxml/lxml.etree.c:24770)
TypeError: Argument must be bytes or unicode, got 'NoneType'
>>>
msg308910 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2017-12-21 22:22
I decided to take a look at this, since it seems easy...

At first glance, this would appear to be a straightforward change--the docs state in multiple places that Element() takes a string as its tag argument.

But it turns out that a lot of internal functionality depends on passing in non-strings as the tag value.
msg308925 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-12-22 06:38
I don't think this is worth fixing.  The package is under no obligation to make early type checks for arguments.  It is typical in the Python world to let those kinds of input errors surface downstream when they are used.  In contrast, C code typically does the checks when the arguments are passed in.
msg308975 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2017-12-24 00:03
I disagree. This library is meant to be an interface onto XML syntax, and XML has pretty strict requirements on syntax. As msg277125 shows, you're liable to get very far downstream before the error becomes apparent.

In addition, I'm finding a number of internal inconsistencies, both between the docs and the code and between the Python code and the C code, that demonstrate that doing these type checks up front would be beneficial to the entire library. (Note: The C code also does not do them.)
msg308988 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-24 08:27
I concur with Raymond. Supporting non-string tags is a feature of ElementTree that is used internally (for comments, etc) and can be used in user code. And the C implementation intentionally reproduces this feature.
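
A minimal sketch of the internal use mentioned here, assuming a current CPython: the factory functions `Comment` and `ProcessingInstruction` return `Element` instances whose `tag` is the factory function itself rather than a string. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
note = ET.Comment("generated file")
pi = ET.ProcessingInstruction("xml-stylesheet", 'href="style.css"')
root.append(note)

# The tag of a comment node is the Comment function object, not a string.
comment_tag_is_function = note.tag is ET.Comment
pi_tag_is_function = pi.tag is ET.ProcessingInstruction

# The serializer special-cases that tag and writes comment syntax.
serialized = ET.tostring(root)

# User code that wants only "real" elements can filter on the tag type.
string_tagged = [e for e in root.iter() if isinstance(e.tag, str)]
```

Filtering with `isinstance(e.tag, str)` is one common way for user code to skip comments and processing instructions while walking a tree.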
msg308992 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2017-12-24 09:59
I also consider it an actual feature of ElementTree to allow arbitrary objects as its tags, even if it's not one of the most prominent. lxml cannot copy this because it is based on C libraries internally, but that shouldn't prevent ET from allowing it.

The fact that None tags disappear is also definitely a feature. It's an easy way to delete tags from trees without requiring any restructuring.
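
As a rough illustration of that point (a sketch, not taken from the tracker discussion): setting an element's `tag` to `None` makes the serializer skip its start and end tags while still writing its text, children, and tail. The sample markup is made up for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>Hello <b>bold</b> world</p>")

# "Delete" the <b> wrapper without moving its text or tail anywhere.
root.find("b").tag = None

flattened = ET.tostring(root)
```

The tree itself is unchanged in shape (`len(root)` is still 1); only the serialized output omits the tag.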

OTOH, whether an empty string should be serialised in the way the OP shows is not so clear. The output is not XML. I can't see any use case for this, but it feels like a potential source of bugs. I think it would be better to have the serialiser explicitly reject this than letting it silently generate broken output.

Not something to change in Py3.6, though.
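
Since the serializer does not reject empty tags itself, user code that wants the stricter behaviour can check before serializing. The sketch below is one possible approach; `reject_empty_tags` and `safe_tostring` are hypothetical helper names, not part of ElementTree, and the check deliberately leaves `None`, `Comment`, and `ProcessingInstruction` tags alone.

```python
import xml.etree.ElementTree as ET


def reject_empty_tags(root):
    """Raise ValueError if any element in the tree has an empty-string tag.

    Hypothetical helper: ElementTree itself performs no such check.
    """
    for elem in root.iter():
        if elem.tag == "":
            raise ValueError("empty tag name")
    return root


def safe_tostring(root):
    """Serialize only after the empty-tag check has passed."""
    return ET.tostring(reject_empty_tags(root))


good = ET.Element("a")
ET.SubElement(good, "b")
good_bytes = safe_tostring(good)

bad = ET.Element("a")
ET.SubElement(bad, "")
try:
    safe_tostring(bad)
    bad_rejected = False
except ValueError:
    bad_rejected = True
```

This only covers the empty-string case shown in the original report; it does not validate tags against the full XML name grammar.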
msg309009 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2017-12-24 18:36
To be clear, we are talking about the Element class of the ElementTree module, which is distinct from the ElementTree class of the same module.

That said, I personally question the implementation decision to represent things like treating comments as an Element with a tag of a Comment function. The XML standard is pretty clear that neither comments nor processing instructions are in fact elements, and I don't see it as a Good Thing that arbitrary objects are allowed as the value of tag, unless there is a requirement that such objects are subclasses of str.

Note also that the documentation makes no mention of tag being anything other than a string. And there is inconsistency with where bytes are supposedly allowed (according to the documentation) and where they're actually allowed (according to the code). Given this, I think it's hard to say what user code is expected to make use of.
msg309014 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2017-12-24 19:30
Issues of potential relevance to this discussion:
* Issue28237 - In xml.etree.ElementTree bytes tag or attributes raises on serialization
* Issue5166 - ElementTree and minidom don't prevent creation of not well-formed XML
msg309018 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2017-12-24 20:58
>That said, I personally question the implementation decision to
>represent things like treating comments as an Element with a tag of a
>Comment function. 

This is not going to change.
msg309019 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-12-24 22:18
Marking this as closed.

Gordon, thank you for showing an interest in this tracker item.  While fixing bugs is of interest, altering long standing intentional design decisions is not useful.  The time to do that is before a module is released, not a decade later.
History
Date                 User              Action  Args
2022-04-11 14:58:37  admin             set     github: 72423
2017-12-24 22:18:52  rhettinger        set     status: open -> closed; resolution: not a bug; messages: + msg309019; stage: resolved
2017-12-24 20:58:51  scoder            set     messages: + msg309018
2017-12-24 19:30:16  gphemsley         set     messages: + msg309014
2017-12-24 18:36:59  gphemsley         set     messages: + msg309009
2017-12-24 09:59:52  scoder            set     messages: + msg308992; versions: + Python 3.7, - Python 3.6
2017-12-24 08:27:31  serhiy.storchaka  set     nosy: + serhiy.storchaka, eli.bendersky, scoder; messages: + msg308988
2017-12-24 00:03:07  gphemsley         set     messages: + msg308975
2017-12-22 06:38:30  rhettinger        set     nosy: + rhettinger; messages: + msg308925
2017-12-21 22:22:19  gphemsley         set     nosy: + gphemsley; messages: + msg308910
2016-09-21 11:52:15  py.user           create