classification
Title: ElementTree incorrectly refuses to write attributes without namespaces when default_namespace is used
Type: behavior
Stage: patch review
Components: XML
Versions: Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Garrett Birkel, Rafael Ascensao, SpecLad, eli.bendersky, gene_wood, martin.panter, mthuurne, rhettinger, scoder, silverbacknet, wiml
Priority: normal
Keywords: patch

Created on 2013-01-31 02:35 by silverbacknet, last changed 2019-10-07 14:34 by mthuurne.

Files
File name         Uploaded                Description
bug17088_1.patch  wiml, 2013-12-05 08:29  Patch and test case
bug17088_2.patch  wiml, 2013-12-13 23:14  Improved patch and test case
Pull Requests
URL       Status  Linked
PR 11050  open    mthuurne, 2018-12-09 13:34
Messages (14)
msg181005 - (view) Author: Silverback Networks (silverbacknet) Date: 2013-01-31 02:35
ET reads a file that uses a default namespace (xmlns="whatever") correctly but won't write it back out.

The error given is:
ValueError: cannot use non-qualified names with default_namespace option

The XML reference is reasonably clear on this:
http://www.w3.org/TR/REC-xml-names/#defaulting
"Default namespace declarations do not apply directly to attribute names;"
"The namespace name for an unprefixed attribute name always has no value."

Therefore, it is not an error to write non-qualified _attribute_ names with a default namespace; they're just considered un-namespaced anyway. The trivial case where a file is read in with a default namespace and written out with the same one should make it obvious:

from xml.etree.ElementTree import *
register_namespace('svg', 'http://www.w3.org/2000/svg')
svg = ElementTree(XML("""
<svg width="12cm" height="4cm" viewBox="0 0 1200 400" xmlns="http://www.w3.org/2000/svg" version="1.1">
<rect x="1" y="1" width="1198" height="398" fill="none" stroke="blue" stroke-width="2" />
</svg>
"""))
svg.write('simple_new.svg',encoding='UTF-8',default_namespace='svg')

Yet this will fail with the error above. By leaving off default_namespace, every element is pointlessly prefixed by 'svg:' in the resulting file, but it does work.
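
The parsing side already follows this rule, which can be checked directly. A minimal sketch (the shortened SVG snippet and the SVG_NS name are only for illustration): element tags come back qualified with the default namespace, while unprefixed attribute names come back bare.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"  # illustrative constant

root = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg" width="12cm" version="1.1">'
    '<rect x="1" y="1" fill="none"/>'
    '</svg>'
)

# Element names pick up the default namespace...
print(root.tag)
print(root[0].tag)

# ...but unprefixed attribute names stay un-namespaced.
print(sorted(root.attrib))
```

So the tree that write() is asked to serialise legitimately mixes qualified element names with non-qualified attribute names.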
msg205281 - (view) Author: Wim (wiml) Date: 2013-12-05 08:29
I have run into this a few times although it is only recently that I've convinced myself I understood the XML namespace spec well enough to know what the right behavior was. (I came to the same interpretation as silverbacknet.)

I have attached a patch which I believe fixes (and tests) the problem.
msg205951 - (view) Author: Wim (wiml) Date: 2013-12-12 10:45
FWIW: I noticed that my patch has a bug due to sharing the cache dict between element names and attribute names, although I think this is unlikely to crop up very often in practice. I'll submit a better patch if/when I get the time to put one together.
msg206156 - (view) Author: Wim (wiml) Date: 2013-12-13 23:14
Here's an improved patch (and improved testcase).

It's a little more intrusive than the last patch because when a default namespace is being used, two distinct qname caches must be made.
msg206186 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2013-12-14 15:06
Note that the option is called "default_namespace", not "default_namespace_prefix". Could you try passing the namespace URI instead?
msg206217 - (view) Author: Wim (wiml) Date: 2013-12-15 07:14
Yes, the problem occurs regardless of whether the default_namespace parameter is the correct SVG namespace URI --- it's the fact of requesting a default namespace at all that exposes the bug.
msg209424 - (view) Author: Wim (wiml) Date: 2014-01-27 09:44
Ping
msg216061 - (view) Author: Gene Wood (gene_wood) Date: 2014-04-14 03:17
One workaround for this is described here: http://stackoverflow.com/a/4999510/168874

It involves prefixing all of the elements with the namespace, like this:

    from xml.etree import ElementTree as ET
    
    # build a tree structure
    root = ET.Element("{http://www.company.com}STUFF")
    body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF")
    body.text = "STUFF EVERYWHERE!"
    
    # wrap it in an ElementTree instance, and save as XML
    tree = ET.ElementTree(root)
    
    tree.write("page.xml",
               xml_declaration=True,encoding='utf-8',
               method="xml",default_namespace='http://www.company.com')
msg216067 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2014-04-14 06:00
@gene_wood: that's unrelated. This ticket is about attributes being rejected incorrectly.

Fixing the example of the OP:

>>> from xml.etree.ElementTree import *
>>> svg = ElementTree(XML("""
... <svg width="12cm" height="4cm" viewBox="0 0 1200 400" xmlns="http://www.w3.org/2000/svg" version="1.1">
... <rect x="1" y="1" width="1198" height="398" fill="none" stroke="blue" stroke-width="2" />
... </svg>
... """))
>>> tostring(svg.getroot())   # formatting is mine
b'<svg:svg xmlns:svg="http://www.w3.org/2000/svg" height="4cm" version="1.1" viewBox="0 0 1200 400" width="12cm">\n
      <svg:rect fill="none" height="398" stroke="blue" stroke-width="2" width="1198" x="1" y="1" />\n
  </svg:svg>'
>>> svg.write('simple_new.svg',encoding='UTF-8',default_namespace='http://www.w3.org/2000/svg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 826, in write
    qnames, namespaces = _namespaces(self._root, default_namespace)
  File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 942, in _namespaces
    add_qname(key)
  File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 920, in add_qname
    "cannot use non-qualified names with "
ValueError: cannot use non-qualified names with default_namespace option
>>> svg.write('simple_new.svg',encoding='UTF-8')
>>> 

So, it works without namespace defaulting and fails with an incorrect error when a default namespace is provided. Clearly a bug.

Regarding the proposed patch: it looks like the right thing to do in general, but it has a relatively high code impact. I would prefer a patch with lower churn. One thing that could be tried is to use only one tag cache dict and extend the key from the plain tag to (tag, is_attribute). That might have a performance impact on the already slow serialiser, though. In any case, both approaches are quite wasteful, because they duplicate the entire namespace-prefix mapping just because there might be a single namespace that behaves differently for attributes. An alternative could be to split the *value* of the mapping in two: (element_prefix, attribute_prefix). This would keep the overhead at serialisation low, with only slightly more work when building the mapping. At first sight, I like that idea better.
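
A rough sketch of the split-value idea, to make it concrete. This is not ElementTree's actual code and not the patch: make_qname_resolver, prefixes_for and resolve are hypothetical names, prefix generation is simplified to "nsN", and writing the xmlns declarations is left out.

```python
def make_qname_resolver(default_namespace=None):
    """Hypothetical helper: returns (resolve, prefixes)."""
    # Maps namespace URI -> (element_prefix, attribute_prefix).
    prefixes = {}

    def prefixes_for(uri):
        if uri not in prefixes:
            generated = "ns%d" % len(prefixes)
            if uri == default_namespace:
                # Elements can rely on the default namespace, but
                # attributes in that namespace still need a real prefix.
                prefixes[uri] = ("", generated)
            else:
                prefixes[uri] = (generated, generated)
        return prefixes[uri]

    def resolve(name, *, is_attribute=False):
        if name[:1] != "{":
            # Only unqualified *element* names clash with a default
            # namespace; unqualified attributes are always fine.
            if default_namespace and not is_attribute:
                raise ValueError(
                    "cannot use non-qualified element names with "
                    "default_namespace option")
            return name
        uri, local = name[1:].rsplit("}", 1)
        element_prefix, attribute_prefix = prefixes_for(uri)
        prefix = attribute_prefix if is_attribute else element_prefix
        return "%s:%s" % (prefix, local) if prefix else local

    return resolve, prefixes


resolve, prefixes = make_qname_resolver("http://www.w3.org/2000/svg")
print(resolve("{http://www.w3.org/2000/svg}rect"))
print(resolve("width", is_attribute=True))
print(resolve("{http://www.w3.org/2000/svg}href", is_attribute=True))
```

The mapping is built once per URI, and the serialiser only has to pick one of two tuple slots per name.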

This code returns a list in one case and a set-like view in another (Py3):

+    if default_namespace:
+        prefixes_list = [ (default_namespace, "") ]
+        prefixes_list.extend(namespaces.items())
+    else:
+        prefixes_list = namespaces.items()

I can't see the need for this change. Why can't the default namespace be stored in the namespaces dict right from the start, as it was before?
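
To illustrate the type difference (a small sketch with made-up variable names, not the patch itself): one branch yields a list, the other a dict view, whereas putting the default namespace into the dict first gives a single type with the same contents.

```python
namespaces = {"http://www.w3.org/1999/xlink": "xlink"}
default_namespace = "http://www.w3.org/2000/svg"

# Branch taken when a default namespace is given: a real list.
with_default = [(default_namespace, "")]
with_default.extend(namespaces.items())

# Branch taken otherwise: a dict view, which cannot be indexed and
# reflects later changes to the dict.
without_default = namespaces.items()

print(type(with_default).__name__, type(without_default).__name__)

# Storing the default namespace in the dict up front keeps one type.
namespaces_with_default = {default_namespace: ""}
namespaces_with_default.update(namespaces)
uniform = namespaces_with_default.items()
print(list(uniform) == with_default)
```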

As a minor nitpick, this lambda based sort key:

    key=lambda x: x[1]):  # sort on prefix

is better expressed using operator.itemgetter(1).
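
For comparison, a short sketch with an illustrative namespaces dict; both keys give the same ordering:

```python
from operator import itemgetter

# Illustrative uri -> prefix mapping.
namespaces = {
    "http://www.w3.org/1999/xlink": "xlink",
    "http://www.w3.org/2000/svg": "",
}

by_lambda = sorted(namespaces.items(), key=lambda x: x[1])  # sort on prefix
by_itemgetter = sorted(namespaces.items(), key=itemgetter(1))

print(by_lambda == by_itemgetter)
```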

I'd also rename the "defaultable" flag to "is_attribute" and pass it as keyword argument (bare boolean parameters are unreadable in function calls).

Given the impact of this change, I'd also suggest not applying it to Py2.x anymore.
msg261785 - (view) Author: Garrett Birkel (Garrett Birkel) Date: 2016-03-14 22:24
Just hit up against this namespace bug. Has this patch been abandoned??
msg304120 - (view) Author: Rafael Ascensao (Rafael Ascensao) Date: 2017-10-11 09:02
what's the status on this?
msg331390 - (view) Author: Maarten ter Huurne (mthuurne) * Date: 2018-12-08 18:38
I was working on what I thought would be an elegant solution to this problem: for non-qualified attributes, add the element's namespace before accessing the cache and strip the namespace prefix after accessing the cache if it's equal to the element's prefix.

However, this approach doesn't work: even though non-qualified attributes are processed as if they were in the element's namespace, they are considered to have no namespace. This means <ns:x a="1" ns:a="2"/> is considered valid XML, even though it effectively defines the same attribute twice.

https://www.w3.org/TR/REC-xml-names/#uniqAttrs
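
ElementTree's parser agrees with that reading. A minimal check, using a made-up namespace URI (urn:example) for the ns prefix:

```python
import xml.etree.ElementTree as ET

# "urn:example" is an arbitrary namespace chosen for this example.
elem = ET.fromstring('<ns:x xmlns:ns="urn:example" a="1" ns:a="2"/>')

# Two distinct attributes: one without a namespace, one qualified.
print(elem.attrib)
```

The two names end up under different keys, so folding non-qualified attributes into the element's namespace for caching would make them collide.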

In my opinion the spec made a silly choice here, but that's probably not something that can be fixed anymore.

I haven't decided yet whether I'll make another attempt at fixing this issue. In any case, I hope this tale of caution benefits someone.
msg331432 - (view) Author: Maarten ter Huurne (mthuurne) * Date: 2018-12-09 14:05
I think I have a good solution now, see the pull request for details. It does touch a lot of code, but I split all the changes into small consistent units, so it should be easier to verify whether they are correct.
msg354101 - (view) Author: Maarten ter Huurne (mthuurne) * Date: 2019-10-07 14:34
Can I please get a review of the pull request?
History
Date                 User             Action  Args
2019-10-07 14:34:58  mthuurne         set     messages: + msg354101
2018-12-09 14:05:14  mthuurne         set     messages: + msg331432
2018-12-09 13:34:07  mthuurne         set     stage: patch review; pull_requests: + pull_request10286
2018-12-08 20:25:13  rhettinger       set     nosy: + rhettinger
2018-12-08 18:38:17  mthuurne         set     nosy: + mthuurne; messages: + msg331390
2018-11-23 20:08:36  SpecLad          set     nosy: + SpecLad
2017-10-11 09:02:24  Rafael Ascensao  set     nosy: + Rafael Ascensao; messages: + msg304120
2016-03-14 22:24:40  Garrett Birkel   set     nosy: + Garrett Birkel; messages: + msg261785
2015-01-25 11:16:00  martin.panter    set     nosy: + martin.panter
2014-04-14 06:00:43  scoder           set     messages: + msg216067
2014-04-14 03:17:19  gene_wood        set     nosy: + gene_wood; messages: + msg216061
2014-01-27 09:44:09  wiml             set     messages: + msg209424
2013-12-15 07:14:11  wiml             set     messages: + msg206217
2013-12-14 15:06:23  scoder           set     messages: + msg206186
2013-12-14 14:51:37  scoder           set     nosy: + scoder, eli.bendersky
2013-12-13 23:14:49  wiml             set     files: + bug17088_2.patch; messages: + msg206156
2013-12-12 10:45:35  wiml             set     messages: + msg205951
2013-12-05 08:29:34  wiml             set     files: + bug17088_1.patch; nosy: + wiml; messages: + msg205281; keywords: + patch
2013-01-31 02:35:07  silverbacknet    create