classification
Title: ElementTree incorrectly refuses to write attributes without namespaces when default_namespace is used
Type: behavior Stage:
Components: XML Versions: Python 3.5, Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Garrett Birkel, Rafael Ascensao, eli.bendersky, gene_wood, martin.panter, scoder, silverbacknet, wiml
Priority: normal Keywords: patch

Created on 2013-01-31 02:35 by silverbacknet, last changed 2017-10-11 09:02 by Rafael Ascensao.

Files
File name Uploaded Description
bug17088_1.patch wiml, 2013-12-05 08:29 Patch and test case
bug17088_2.patch wiml, 2013-12-13 23:14 Improved patch and test case
Messages (11)
msg181005 - Author: Silverback Networks (silverbacknet) Date: 2013-01-31 02:35
ET reads a file with a default namespace (xmlns="whatever") correctly, but won't write it back out.

The error given is:
ValueError: cannot use non-qualified names with default_namespace option

The XML reference is reasonably clear on this:
http://www.w3.org/TR/REC-xml-names/#defaulting
"Default namespace declarations do not apply directly to attribute names;"
"The namespace name for an unprefixed attribute name always has no value."

Therefore, it is not an error to write non-qualified _attribute_ names with a default namespace; they're just considered un-namespaced anyway. The trivial case where a file is read in with a default namespace and written out with the same one should make it obvious:

from xml.etree.ElementTree import *
register_namespace('svg', 'http://www.w3.org/2000/svg')
svg = ElementTree(XML("""
<svg width="12cm" height="4cm" viewBox="0 0 1200 400" xmlns="http://www.w3.org/2000/svg" version="1.1">
<rect x="1" y="1" width="1198" height="398" fill="none" stroke="blue" stroke-width="2" />
</svg>
"""))
svg.write('simple_new.svg',encoding='UTF-8',default_namespace='svg')

Yet this fails with the error above. Leaving off default_namespace makes the write succeed, but every element in the resulting file is then needlessly prefixed with 'svg:'.
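ElementTree's own parser already follows the namespace-defaulting rule quoted above. The short Python 3 sketch below parses a trimmed version of the reporter's SVG and shows that element tags come back in '{uri}local' form, while unprefixed attributes stay plain, un-namespaced names. Serialising them as plain names is therefore exactly what a default namespace requires.

```python
import xml.etree.ElementTree as ET

# A trimmed version of the SVG from the report, with a default namespace.
root = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg" width="12cm">'
    '<rect x="1" y="1"/>'
    '</svg>'
)

# Element names inherit the default namespace...
print(root.tag)                # {http://www.w3.org/2000/svg}svg
print(root[0].tag)             # {http://www.w3.org/2000/svg}rect

# ...but unprefixed attribute names have no namespace at all.
print(sorted(root.attrib))     # ['width']
print(sorted(root[0].attrib))  # ['x', 'y']
```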
msg205281 - Author: Wim (wiml) Date: 2013-12-05 08:29
I have run into this a few times although it is only recently that I've convinced myself I understood the XML namespace spec well enough to know what the right behavior was. (I came to the same interpretation as silverbacknet.)

I have attached a patch which I believe fixes (and tests) the problem.
msg205951 - Author: Wim (wiml) Date: 2013-12-12 10:45
FWIW: I noticed that my patch has a bug due to sharing the cache dict between element names and attribute names, although I think this is unlikely to crop up very often in practice. I'll submit a better patch if/when I get the time to put one together.
msg206156 - Author: Wim (wiml) Date: 2013-12-13 23:14
Here's an improved patch (and improved testcase).

It's a little more intrusive than the last patch because when a default namespace is being used, two distinct qname caches must be made.
msg206186 - Author: Stefan Behnel (scoder) * Date: 2013-12-14 15:06
Note that the option is called "default_namespace", not "default_namespace_prefix". Could you try passing the namespace URI instead?
msg206217 - Author: Wim (wiml) Date: 2013-12-15 07:14
Yes, the problem occurs regardless of whether the default_namespace parameter is the correct SVG namespace URI: it's the fact of requesting a default namespace at all that exposes the bug.
msg209424 - Author: Wim (wiml) Date: 2014-01-27 09:44
Ping
msg216061 - Author: Gene Wood (gene_wood) Date: 2014-04-14 03:17
One workaround for this is described here: http://stackoverflow.com/a/4999510/168874

It involves qualifying every element name with the namespace, like this:

    from xml.etree import ElementTree as ET
    
    # build a tree structure
    root = ET.Element("{http://www.company.com}STUFF")
    body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF")
    body.text = "STUFF EVERYWHERE!"
    
    # wrap it in an ElementTree instance, and save as XML
    tree = ET.ElementTree(root)
    
    tree.write("page.xml",
               xml_declaration=True, encoding='utf-8',
               method="xml", default_namespace='http://www.company.com')
msg216067 - Author: Stefan Behnel (scoder) * Date: 2014-04-14 06:00
@gene_wood: that's unrelated. This ticket is about attributes being rejected incorrectly.

Fixing the example of the OP:

>>> from xml.etree.ElementTree import *
>>> svg = ElementTree(XML("""
... <svg width="12cm" height="4cm" viewBox="0 0 1200 400" xmlns="http://www.w3.org/2000/svg" version="1.1">
... <rect x="1" y="1" width="1198" height="398" fill="none" stroke="blue" stroke-width="2" />
... </svg>
... """))
>>> tostring(svg.getroot())   # formatting is mine
b'<svg:svg xmlns:svg="http://www.w3.org/2000/svg" height="4cm" version="1.1" viewBox="0 0 1200 400" width="12cm">\n
      <svg:rect fill="none" height="398" stroke="blue" stroke-width="2" width="1198" x="1" y="1" />\n
  </svg:svg>'
>>> svg.write('simple_new.svg',encoding='UTF-8',default_namespace='http://www.w3.org/2000/svg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 826, in write
    qnames, namespaces = _namespaces(self._root, default_namespace)
  File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 942, in _namespaces
    add_qname(key)
  File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 920, in add_qname
    "cannot use non-qualified names with "
ValueError: cannot use non-qualified names with default_namespace option
>>> svg.write('simple_new.svg',encoding='UTF-8')
>>> 

So, it works without namespace defaulting and fails with an incorrect error when a default namespace is provided. Clearly a bug.
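Until a fix lands, one commonly used workaround (not proposed in this thread, and assuming Python 3) is to skip default_namespace entirely and instead map the SVG URI to the empty prefix with register_namespace(). Elements are then written unprefixed and plain attributes pass through untouched. Two caveats: register_namespace() changes module-global state, and any attribute that is itself in the SVG namespace would be written unprefixed and so silently lose its namespace.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Map the SVG URI to the empty prefix in ElementTree's module-level registry.
# This is global: it affects every later serialisation in the process.
ET.register_namespace("", SVG_NS)

root = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg" width="12cm" height="4cm">'
    '<rect x="1" y="1" fill="none"/>'
    '</svg>'
)

# No default_namespace argument, so unqualified attributes are not rejected;
# the root gets an xmlns="..." declaration and elements carry no "svg:" prefix.
out = ET.tostring(root, encoding="unicode")
print(out)
```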

Regarding the proposed patch: it looks like the right thing to do in general, but it has a relatively high code impact. I would prefer a patch with lower churn.

One thing that could be tried is to use only one tag cache dict and extend the key from the plain tag to (tag, is_attribute). This might have a performance impact on the already slow serialiser, though. In any case, both approaches are quite wasteful, because they duplicate the entire namespace-prefix mapping just because there might be a single namespace that behaves differently for attributes. An alternative could be to split the *value* of the mapping in two: (element_prefix, attribute_prefix). This would keep the overhead at serialisation low, with only slightly more work when building the mapping. At first sight, I like that idea better.
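The (element_prefix, attribute_prefix) idea can be sketched in a few lines of Python 3. This is a hypothetical illustration, not code from either patch or from ElementTree: build_prefix_map, serialize_name and the "nsN" prefix scheme are assumptions, and the qname cache is left out. The key point is that only the default namespace gets different prefixes for the two roles, since an element may use it unprefixed but a namespaced attribute never can.

```python
# Hypothetical sketch: build_prefix_map, serialize_name and the "nsN" prefixes
# are illustrative assumptions, not ElementTree internals or the patch code.

def build_prefix_map(uris, default_namespace=None):
    """Map each namespace URI to an (element_prefix, attribute_prefix) pair."""
    mapping = {}
    for uri in uris:
        prefix = "ns%d" % len(mapping)
        if uri == default_namespace:
            # Elements can use the default namespace unprefixed, but a
            # namespaced attribute still needs an explicit prefix.
            mapping[uri] = ("", prefix)
        else:
            mapping[uri] = (prefix, prefix)
    return mapping


def serialize_name(name, mapping, default_namespace=None, *, is_attribute):
    """Render an ElementTree name ('{uri}local' or 'local') for output."""
    if name[:1] == "{":
        uri, local = name[1:].rsplit("}", 1)
        element_prefix, attribute_prefix = mapping[uri]
        prefix = attribute_prefix if is_attribute else element_prefix
        return "%s:%s" % (prefix, local) if prefix else local
    if is_attribute or not default_namespace:
        # Unprefixed attributes never have a namespace, and without a default
        # namespace neither do unprefixed elements: both are written as-is.
        return name
    # An unprefixed element would be read back in the default namespace.
    raise ValueError("cannot use non-qualified element names "
                     "with default_namespace option")


SVG = "http://www.w3.org/2000/svg"
XLINK = "http://www.w3.org/1999/xlink"
m = build_prefix_map([SVG, XLINK], default_namespace=SVG)
print(serialize_name("{%s}rect" % SVG, m, SVG, is_attribute=False))   # rect
print(serialize_name("width", m, SVG, is_attribute=True))             # width
print(serialize_name("{%s}href" % XLINK, m, SVG, is_attribute=True))  # ns1:href
```

Because the split lives in the mapping values, a single name-to-text cache keyed on (name, is_attribute) could sit on top without duplicating the prefix mapping.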

This code returns a list in one case and a set-like view in another (Py3):

+    if default_namespace:
+        prefixes_list = [ (default_namespace, "") ]
+        prefixes_list.extend(namespaces.items())
+    else:
+        prefixes_list = namespaces.items()

I can't see the need for this change. Why can't the default namespace be stored in the namespaces dict right from the start, as it was before?

As a minor nitpick, this lambda based sort key:

    key=lambda x: x[1]):  # sort on prefix

is better expressed using operator.itemgetter(1).
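Keeping the default namespace in the namespaces dict with the empty prefix, as suggested above, also keeps the declaration loop simple: sorting on the prefix with operator.itemgetter(1) puts the default declaration first. A small Python sketch with illustrative URIs (the xlink entry is an assumption added for contrast):

```python
from operator import itemgetter

default_namespace = "http://www.w3.org/2000/svg"

# URI -> prefix; the default namespace is stored up front with prefix "".
namespaces = {default_namespace: ""}
namespaces["http://www.w3.org/1999/xlink"] = "xlink"

declarations = []
for uri, prefix in sorted(namespaces.items(), key=itemgetter(1)):  # sort on prefix
    declarations.append('xmlns%s="%s"' % (":" + prefix if prefix else "", uri))
print(" ".join(declarations))
# xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
```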

I'd also rename the "defaultable" flag to "is_attribute" and pass it as keyword argument (bare boolean parameters are unreadable in function calls).

Given the impact of this change, I'd also suggest not applying it to Py2.x anymore.
msg261785 - Author: Garrett Birkel (Garrett Birkel) Date: 2016-03-14 22:24
Just hit up against this namespace bug. Has this patch been abandoned?
msg304120 - Author: Rafael Ascensao (Rafael Ascensao) Date: 2017-10-11 09:02
What's the status on this?
History
Date                 User             Action  Args
2017-10-11 09:02:24  Rafael Ascensao  set     nosy: + Rafael Ascensao; messages: + msg304120
2016-03-14 22:24:40  Garrett Birkel   set     nosy: + Garrett Birkel; messages: + msg261785
2015-01-25 11:16:00  martin.panter    set     nosy: + martin.panter
2014-04-14 06:00:43  scoder           set     messages: + msg216067
2014-04-14 03:17:19  gene_wood        set     nosy: + gene_wood; messages: + msg216061
2014-01-27 09:44:09  wiml             set     messages: + msg209424
2013-12-15 07:14:11  wiml             set     messages: + msg206217
2013-12-14 15:06:23  scoder           set     messages: + msg206186
2013-12-14 14:51:37  scoder           set     nosy: + scoder, eli.bendersky
2013-12-13 23:14:49  wiml             set     files: + bug17088_2.patch; messages: + msg206156
2013-12-12 10:45:35  wiml             set     messages: + msg205951
2013-12-05 08:29:34  wiml             set     files: + bug17088_1.patch; nosy: + wiml; messages: + msg205281; keywords: + patch
2013-01-31 02:35:07  silverbacknet    create