classification
Title: ET: add custom namespaces to serialization methods
Type: enhancement
Stage: needs patch
Components: XML
Versions: Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Nekmo, effbot, eli.bendersky, flox, jcea, r.david.murray, scoder
Priority: normal
Keywords: easy, patch

Created on 2011-11-09 22:17 by Nekmo, last changed 2014-10-04 01:07 by r.david.murray.

Files
File name | Uploaded
issue13378_non_global_namespaces.diff | flox, 2011-11-11 00:23
issue13378_non_global_namespaces_v2.diff | flox, 2011-11-16 00:46
issue13378_non_global_namespaces_v3.diff | flox, 2011-12-09 22:11
Messages (18)
msg147378 - (view) Author: Nekmo (Nekmo) Date: 2011-11-09 22:17
Currently, the namespace mapping is global, which can cause failures when multiple instances are used or in multithreaded code. The variable is xml.etree.ElementTree._namespace_map. I ask that it be moved onto the xml.etree._Element instance.
msg147379 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-11-09 22:28
Tagging this as targeting 3.3.

Nekmo, could you possibly post some code showing the problem?
msg147380 - (view) Author: Nekmo (Nekmo) Date: 2011-11-09 23:22
In my case, I have several clients, and they define the namespaces. I want to return the same namespace prefix that they gave me. For example, client "A" gives me this:

<house:iq xmlns:house="http://localhost/house" />

To set the prefix for the namespace, I register it in the namespace map:

>>> import xml.etree.ElementTree as etree
>>> etree.register_namespace('house', 'http://localhost/house')
>>> etree._namespace_map
{'http://localhost/house': 'house',
 'http://purl.org/dc/elements/1.1/': 'dc',
 'http://schemas.xmlsoap.org/wsdl/': 'wsdl',
 'http://www.w3.org/1999/02/22-rdf-syntax-ns#': 'rdf',
 'http://www.w3.org/1999/xhtml': 'html',
 'http://www.w3.org/2001/XMLSchema': 'xs',
 'http://www.w3.org/2001/XMLSchema-instance': 'xsi',
 'http://www.w3.org/XML/1998/namespace': 'xml'}

This keeps the namespace prefix:
>>> etree.tostring(etree.Element('{http://localhost/house}iq'))
b'<house:iq xmlns:house="http://localhost/house" />'

But if a client "B" uses a different prefix for the same namespace and runs in parallel, problems can occur:

<home:iq xmlns:home="http://localhost/house" />

>>> import xml.etree.ElementTree as etree
>>> etree.register_namespace('home', 'http://localhost/house')
>>> etree._namespace_map
{'http://localhost/house': 'home',
 'http://purl.org/dc/elements/1.1/': 'dc',
 'http://schemas.xmlsoap.org/wsdl/': 'wsdl',
 'http://www.w3.org/1999/02/22-rdf-syntax-ns#': 'rdf',
 'http://www.w3.org/1999/xhtml': 'html',
 'http://www.w3.org/2001/XMLSchema': 'xs',
 'http://www.w3.org/2001/XMLSchema-instance': 'xsi',
 'http://www.w3.org/XML/1998/namespace': 'xml'}

Therefore, I ask that _namespace_map live on the etree._Element instance rather than being global.
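
The two sessions above can be replayed in a single process to show the "last registration wins" behaviour. This is a minimal sketch that uses only the public register_namespace and tostring calls; it assumes no other code has registered a prefix for this URI.

```python
import xml.etree.ElementTree as ET

URI = "http://localhost/house"
elem = ET.Element("{%s}iq" % URI)

# Client "A" registers its preferred prefix for the URI.
ET.register_namespace("house", URI)
as_client_a = ET.tostring(elem, encoding="unicode")

# Client "B" registers a different prefix for the same URI. The registry is
# module-level, so this replaces A's entry for every later serialization.
ET.register_namespace("home", URI)
as_client_b = ET.tostring(elem, encoding="unicode")

print(as_client_a)
print(as_client_b)
```

With threads, the second registration can land between another thread's registration and its tostring call, which is the failure described here.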
msg147415 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2011-11-11 00:23
This patch proposes an implementation of the feature.


>>> from xml.etree import ElementTree as ET
>>> ET.tostring(ET.Element('{http://localhost/house}iq'), encoding="unicode", namespaces={'http://localhost/house': 'home'})
'<home:iq xmlns:home="http://localhost/house" />'
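
Note that the namespaces= argument above comes from the attached patch and is not available without it. As a stopgap, something like the following sketch might work. It is not the patch's implementation: the helper name tostring_with_prefixes and the lock are hypothetical, and it relies on the private _namespace_map from the original report, which could change between versions.

```python
import threading
import xml.etree.ElementTree as ET

# Hypothetical module-level lock guarding the global prefix registry.
_ns_lock = threading.Lock()


def tostring_with_prefixes(elem, namespaces):
    """Serialize elem using a temporary {uri: prefix} mapping.

    Assumption: ET._namespace_map is the private dict the serializer
    consults. Entries are swapped in under a lock and restored afterwards.
    """
    with _ns_lock:
        saved = dict(ET._namespace_map)
        try:
            for uri, prefix in namespaces.items():
                ET.register_namespace(prefix, uri)
            return ET.tostring(elem, encoding="unicode")
        finally:
            # Restore the registry exactly as it was before the call.
            ET._namespace_map.clear()
            ET._namespace_map.update(saved)


elem = ET.Element("{http://localhost/house}iq")
print(tostring_with_prefixes(elem, {"http://localhost/house": "home"}))
```

The lock only protects callers that go through this helper; any other code calling register_namespace or tostring directly can still interfere.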
msg147419 - (view) Author: Stefan Behnel (scoder) * Date: 2011-11-11 07:06
Florent, thanks for the notification.

Nekmo, note that you are misusing this feature. The _namespace_map is meant to provide "well known namespace prefixes" only, so that common namespaces end up using the "expected" prefix. This is also the reason why it maps namespaces to prefixes and not the other way round. It is not meant to temporarily assign arbitrary prefixes to namespaces. That is the reason for it being a global option.

That being said, lxml.etree's Element factory takes an "nsmap" parameter that implements the feature you want. It's documented here:

http://lxml.de/tutorial.html#namespaces

Note that it maps prefixes to namespaces and not the other way round. This is because there is a corresponding "nsmap" property on Elements that provides the currently defined prefixes in the context of an Element. ElementTree itself does not (and cannot) support this property because it drops the prefixes during parsing. However, I would still request that an implementation of the parameter to the Element() factory should be compatible for both libraries.

Also look for "nsmap" in the compatibility docs (appears in two sections):

http://lxml.de/compatibility.html
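
To illustrate the point about ElementTree dropping prefixes during parsing, here is a small stdlib-only sketch. It also shows one way a caller could record a client's prefixes while parsing, using the "start-ns" events of iterparse; the variable names are only illustrative.

```python
import io
import xml.etree.ElementTree as ET

doc = b'<home:iq xmlns:home="http://localhost/house" />'

# After parsing, only the namespace URI survives, folded into the tag name.
root = ET.fromstring(doc)
print(root.tag)
print(root.attrib)  # the xmlns declaration is not kept as an attribute

# "start-ns" events report each (prefix, uri) declaration as it is seen,
# so the caller can keep its own uri -> prefix record per document.
prefixes = {}
for event, (prefix, uri) in ET.iterparse(io.BytesIO(doc), events=("start-ns",)):
    prefixes[uri] = prefix
print(prefixes)
```

A mapping collected this way has the same URI-to-prefix orientation as _namespace_map, which is the opposite of lxml's nsmap.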
msg147422 - (view) Author: Stefan Behnel (scoder) * Date: 2011-11-11 08:38
Reading the proposed patch, I must agree that it makes more sense in ElementTree to support this as a serialiser feature. ET's tree model doesn't have a notion of prefixes, whereas it's native to lxml.etree.

Two major advantages of putting this into the serialiser are: 1) cET doesn't have to be modified, and 2) it does not require additional memory to store the nsmap reference on each Element. The latter by itself is a very valuable property, given that cET aims specifically at a low memory overhead.

I see a couple of drawbacks:

1) it only supports the case that namespaces are globally defined. The implementation cannot handle the case that local namespaces should only be defined in subtrees, or that prefixes are being reused. This is no real restriction because globally defined namespaces are usually just fine. It's more of an inconvenience in some cases, such as multi-namespace languages like SOAP or WSDL+XSD, where namespaces are commonly declared on the subtree where they start being used.

2) lxml.etree cannot support this because it keeps the prefixes in the tree nodes and uses them on serialisation. This cannot easily be overridden because the serialiser is part of libxml2.
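
The first drawback can be seen with the existing serialiser already. In this small sketch (the element names and the URI are made up for illustration), a namespace declared on a subtree in the input is declared on the root element in the output.

```python
import xml.etree.ElementTree as ET

# The namespace is declared only on the inner element in the input.
doc = '<envelope><b:body xmlns:b="urn:example:body" /></envelope>'
root = ET.fromstring(doc)

out = ET.tostring(root, encoding="unicode")
print(out)

# Split the output into the root start tag and everything after it.
root_start_tag, _, rest = out.partition(">")
print("declared on root:", "xmlns:" in root_start_tag)
print("declared in subtree:", "xmlns:" in rest)
```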

I didn't see in the patch how (or if?) the prefix redefinition case is handled. Given that prefixes are always defined globally, it would be nice if this only resulted in an error if two namespaces that are really used in the document map to the same prefix, not always when the namespace dict is redundant by itself.
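
A rough sketch of the check described above, assuming a URI-to-prefix dict as in the patch's example. The function names are hypothetical and this is not taken from the patch; it only reports a collision when both namespaces actually occur in the tree.

```python
import xml.etree.ElementTree as ET


def used_namespaces(root):
    """Return the set of namespace URIs found in tag and attribute names."""
    uris = set()
    for elem in root.iter():
        for name in (elem.tag, *elem.attrib):
            # Comments and processing instructions have non-string tags.
            if isinstance(name, str) and name.startswith("{"):
                uris.add(name[1:].partition("}")[0])
    return uris


def check_prefix_collisions(root, namespaces):
    """Raise ValueError if two URIs used in the tree share one prefix.

    namespaces maps URI -> prefix. Entries for unused URIs are ignored.
    """
    owner = {}  # prefix -> URI that claimed it first
    for uri in sorted(used_namespaces(root)):
        prefix = namespaces.get(uri)
        if prefix is None:
            continue
        if prefix in owner:
            raise ValueError(
                "prefix %r is mapped to both %r and %r" % (prefix, owner[prefix], uri)
            )
        owner[prefix] = uri


root = ET.Element("{urn:a}root")
ET.SubElement(root, "{urn:b}child")

# urn:c shares a prefix with urn:a but is never used, so this passes.
check_prefix_collisions(root, {"urn:a": "x", "urn:b": "y", "urn:c": "x"})

# urn:a and urn:b are both used and both ask for "x", so this raises.
try:
    check_prefix_collisions(root, {"urn:a": "x", "urn:b": "x"})
except ValueError as exc:
    print(exc)
```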

Also note that it's good to be explicit about the keyword arguments that a function accepts. It aids when help(tostring) tells you directly what you can pass in, instead of just printing "**kw".
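
As a small illustration of that last point, compare what introspection reports for the two styles. Both wrapper functions are hypothetical and only forward to the existing tostring.

```python
import inspect
import xml.etree.ElementTree as ET


def serialize_kw(element, **kw):
    """Opaque signature: the accepted options are not visible to help()."""
    return ET.tostring(element, **kw)


def serialize_explicit(element, *, encoding="us-ascii", method="xml"):
    """Explicit signature: the accepted options show up in help()."""
    return ET.tostring(element, encoding=encoding, method=method)


print(inspect.signature(serialize_kw))
print(inspect.signature(serialize_explicit))
```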
msg147743 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2011-11-16 00:46
Thank you Stefan for the comments.
I've added the prefix collision detection, and removed the **kw argument.
(+ tests)
msg149133 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2011-12-09 22:11
Updated with documentation.
Thank you for the review.

I know this does not cover different namespaces in subtrees.
But this use case seems very specific. The user could find other means to achieve it.
msg149143 - (view) Author: Stefan Behnel (scoder) * Date: 2011-12-10 07:04
Given that this is a major new feature for the serialiser in ElementTree, I think it's worth asking Fredrik for any comments.
msg149187 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2011-12-10 20:47
Of course it's better to have someone else review the patch.
However in this case, I'm not sure it is a major feature.

BTW, I noticed that effbot is currently marked as *inactive* maintainer
http://docs.python.org/devguide/experts.html#stdlib

If it is not an oversight, it means that this issue might wait "an extended period" before receiving a response.
msg164984 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-07-08 09:44
Do we merge the patch for 3.3?
I'm +1 on this (patch submitted 8 months ago, backward compatible and reviewed).
msg164991 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-07-08 10:15
Can this be honestly classified as a bugfix though? If it's a feature it will have to be postponed to 3.4.
msg164993 - (view) Author: Stefan Behnel (scoder) * Date: 2012-07-08 10:24
Looks like a new feature to me.
msg164996 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-07-08 10:27
Well, it fixes the behavior of ElementTree in some multi-threaded cases, provided you pass the namespace map as an argument of the serializer call.

The fix implements an optional argument for this use case.
As a side effect, it makes it easier to work with custom namespaces.

If the consensus is to wait for next version, I'm fine with that.
msg165002 - (view) Author: Stefan Behnel (scoder) * Date: 2012-07-08 10:56
Florent, what you describe is exactly the definition of a new feature.
Users even have to change their code in order to make use of it.
msg165496 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-07-15 03:39
I'm changing the issue name to reflect the direction it's taken. Florent, once 3.3 is branched, could you please refresh the patch vs. head for 3.4 (don't forget the "what's new") and I'll review it for commit.
msg165497 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-07-15 03:42
I'd also expand the doc of register_namespace to note what it should and shouldn't be used for (once this feature is added).
msg228422 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-10-04 01:07
This patch no longer applies to the tip of default.  Whoever updates it should also address Eli's comment about expanding the register_namespace doc.  I'm adding the 'easy' tag because Florent already did the hard work, and at this point it is just a patch update and doc change.
History
Date | User | Action | Args
2014-10-04 01:07:03 | r.david.murray | set | versions: + Python 3.5, - Python 3.4; nosy: + r.david.murray; messages: + msg228422; keywords: + easy; stage: commit review -> needs patch
2012-07-15 03:42:41 | eli.bendersky | set | messages: + msg165497
2012-07-15 03:39:49 | eli.bendersky | set | messages: + msg165496; title: Change the variable "nsmap" from global to instance (xml.etree.ElementTree) -> ET: add custom namespaces to serialization methods
2012-07-08 10:56:28 | scoder | set | messages: + msg165002
2012-07-08 10:27:49 | flox | set | messages: + msg164996
2012-07-08 10:24:06 | scoder | set | messages: + msg164993; versions: + Python 3.4, - Python 3.3
2012-07-08 10:15:39 | eli.bendersky | set | messages: + msg164991
2012-07-08 09:44:33 | flox | set | nosy: + eli.bendersky; messages: + msg164984
2011-12-10 20:47:17 | flox | set | messages: + msg149187
2011-12-10 07:04:25 | scoder | set | messages: + msg149143
2011-12-10 07:03:06 | scoder | set | nosy: + effbot
2011-12-09 22:11:45 | flox | set | files: + issue13378_non_global_namespaces_v3.diff; messages: + msg149133; stage: patch review -> commit review
2011-11-16 00:46:44 | flox | set | files: + issue13378_non_global_namespaces_v2.diff; messages: + msg147743
2011-11-11 08:38:02 | scoder | set | messages: + msg147422
2011-11-11 07:06:18 | scoder | set | messages: + msg147419
2011-11-11 00:37:30 | flox | set | nosy: + scoder
2011-11-11 00:23:33 | flox | set | files: + issue13378_non_global_namespaces.diff; keywords: + patch; messages: + msg147415; stage: patch review
2011-11-09 23:22:49 | Nekmo | set | messages: + msg147380
2011-11-09 22:32:24 | flox | set | nosy: + flox
2011-11-09 22:28:35 | jcea | set | nosy: + jcea; messages: + msg147379; versions: + Python 3.3, - Python 3.2
2011-11-09 22:17:50 | Nekmo | create |