Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ET: add custom namespaces to serialization methods #57587

Open
Nekmo mannequin opened this issue Nov 9, 2011 · 20 comments
Open

ET: add custom namespaces to serialization methods #57587

Nekmo mannequin opened this issue Nov 9, 2011 · 20 comments
Labels
topic-XML type-feature A feature request or enhancement

Comments

@Nekmo
Copy link
Mannequin

Nekmo mannequin commented Nov 9, 2011

BPO 13378
Nosy @jcea, @scoder, @vstinner, @bitdancer, @florentx, @Nekmo
Files
  • issue13378_non_global_namespaces.diff
  • issue13378_non_global_namespaces_v2.diff
  • issue13378_non_global_namespaces_v3.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2011-11-09.22:17:50.512>
    labels = ['expert-XML', 'type-feature']
    title = 'ET: add custom namespaces to serialization methods'
    updated_at = <Date 2019-07-29.11:38:53.572>
    user = 'https://github.com/Nekmo'

    bugs.python.org fields:

    activity = <Date 2019-07-29.11:38:53.572>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['XML']
    creation = <Date 2011-11-09.22:17:50.512>
    creator = 'Nekmo'
    dependencies = []
    files = ['23655', '23702', '23895']
    hgrepos = []
    issue_num = 13378
    keywords = ['patch']
    message_count = 19.0
    messages = ['147378', '147379', '147380', '147415', '147419', '147422', '147743', '149133', '149143', '149187', '164984', '164991', '164993', '164996', '165002', '165496', '165497', '228422', '348625']
    nosy_count = 8.0
    nosy_names = ['effbot', 'jcea', 'scoder', 'vstinner', 'r.david.murray', 'eli.bendersky', 'flox', 'Nekmo']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue13378'
    versions = ['Python 3.5']

    @Nekmo
    Copy link
    Mannequin Author

    Nekmo mannequin commented Nov 9, 2011

    Currently, the mapping of namespaces is global and can cause failures if multiple instances are used or in multithreading. The variable is in xml.etree.ElementTree._namespace_map. I ask it to be switched to xml.etree._Element instance.

    @Nekmo Nekmo mannequin added topic-XML type-feature A feature request or enhancement labels Nov 9, 2011
    @jcea
    Copy link
    Member

    jcea commented Nov 9, 2011

    Tagging this as targeting 3.3.

    Nekmo, could you possibly poste some code showing the problem?

    @Nekmo
    Copy link
    Mannequin Author

    Nekmo mannequin commented Nov 9, 2011

    In my case, I have several clients, and they define the namespaces. I am interested in return the same namespace that they gave me, for example, the client "A" gives me this:

    <house:iq xmlns:house="http://localhost/house" />

    To name the namespace, I set it at nsmap:

    >>> import xml.etree.ElementTree as etree
    >>> etree.register_namespace('house', 'http://localhost/house')
    >>> etree._namespace_map
    {'http://localhost/house': 'house',
     'http://purl.org/dc/elements/1.1/': 'dc',
     'http://schemas.xmlsoap.org/wsdl/': 'wsdl',
     'http://www.w3.org/1999/02/22-rdf-syntax-ns#': 'rdf',
     'http://www.w3.org/1999/xhtml': 'html',
     'http://www.w3.org/2001/XMLSchema': 'xs',
     'http://www.w3.org/2001/XMLSchema-instance': 'xsi',
     'http://www.w3.org/XML/1998/namespace': 'xml'}
    
    Thus, keeping the name of the namespace:
    >>> etree.tostring(etree.Element('{http://localhost/house}iq'))
    b'<house:iq xmlns:house="http://localhost/house" />'

    But if I have a client "B", which uses a different name, and run in parallel, problems can occur:

    <home:iq xmlns:home="http://localhost/house" />

    >>> import xml.etree.ElementTree as etree
    >>> etree.register_namespace('home', 'http://localhost/house')
    >>> etree._namespace_map
    {'http://localhost/house': 'home',
     'http://purl.org/dc/elements/1.1/': 'dc',
     'http://schemas.xmlsoap.org/wsdl/': 'wsdl',
     'http://www.w3.org/1999/02/22-rdf-syntax-ns#': 'rdf',
     'http://www.w3.org/1999/xhtml': 'html',
     'http://www.w3.org/2001/XMLSchema': 'xs',
     'http://www.w3.org/2001/XMLSchema-instance': 'xsi',
     'http://www.w3.org/XML/1998/namespace': 'xml'}

    Therefore, I ask that _namespace_map is within etree._Element instance, and not global

    @florentx
    Copy link
    Mannequin

    florentx mannequin commented Nov 11, 2011

    This patch proposes an implementation of the feature.

    >>> from xml.etree import ElementTree as ET
    >>> ET.tostring(ET.Element('{http://localhost/house}iq'), encoding="unicode", namespaces={'http://localhost/house': 'home'})
    '<home:iq xmlns:home="http://localhost/house" />'

    @scoder
    Copy link
    Contributor

    scoder commented Nov 11, 2011

    Florent, thanks for the notification.

    Nekmo, note that you are misusing this feature. The _namespace_map is meant to provide "well known namespace prefixes" only, so that common namespaces end up using the "expected" prefix. This is also the reason why it maps namespaces to prefixes and not the other way round. It is not meant to temporarily assign arbitrary prefix to namespaces. That is the reason for it being a global option.

    That being said, lxml.etree's Element factory takes an "nsmap" parameter that implements the feature you want. It's documented here:

    http://lxml.de/tutorial.html#namespaces

    Note that it maps prefixes to namespaces and not the other way round. This is because there is a corresponding "nsmap" property on Elements that provides the currently defined prefixes in the context of an Element. ElementTree itself does not (and cannot) support this property because it drops the prefixes during parsing. However, I would still request that an implementation of the parameter to the Element() factory should be compatible for both libraries.

    Also look for "nsmap" in the compatibility docs (appears in two sections):

    http://lxml.de/compatibility.html

    @scoder
    Copy link
    Contributor

    scoder commented Nov 11, 2011

    Reading the proposed patch, I must agree that it makes more sense in ElementTree to support this as a serialiser feature. ET's tree model doesn't have a notion of prefixes, whereas it's native to lxml.etree.

    Two major advantages of putting this into the serialiser are: 1) cET doesn't have to be modified, and 2) it does not require additional memory to store the nsmap reference on each Element. The latter by itself is a very valuable property, given that cET aims specifically at a low memory overhead.

    I see a couple of drawbacks:

    1. it only supports the case that namespaces are globally defined. The implementation cannot handle the case that local namespaces should only be defined in subtrees, or that prefixes are being reused. This is no real restriction because globally defined namespaces are usually just fine. It's more of an inconvenience in some cases, such as multi-namespace languages like SOAP or WSDL+XSD, where namespaces are commonly declared on the subtree where they start being used.

    2. lxml.etree cannot support this because it keeps the prefixes in the tree nodes and uses them on serialisation. This cannot easily be overridden because the serialiser is part of libxml2.

    I didn't see in the patch how (or if?) the prefix redefinition case is handled. Given that prefixes are always defined globally, it would be nice if this only resulted in an error if two namespaces that are really used in the document map to the same prefix, not always when the namespace dict is redundant by itself.

    Also note that it's good to be explicit about the keyword arguments that a function accepts. It aids when help(tostring) tells you directly what you can pass in, instead of just printing "**kw".

    @florentx
    Copy link
    Mannequin

    florentx mannequin commented Nov 16, 2011

    Thank you Stefan for the comments.
    I've added the prefix collision detection, and removed the **kw argument.
    (+ tests)

    @florentx
    Copy link
    Mannequin

    florentx mannequin commented Dec 9, 2011

    Updated with documentation.
    Thank you for the review.

    I know this does not cover different namespaces in subtree.
    But this use case seems very specific. The user could find other means to achieve it.

    @scoder
    Copy link
    Contributor

    scoder commented Dec 10, 2011

    Given that this is a major new feature for the serialiser in ElementTree, I think it's worth asking Fredrik for any comments.

    @florentx
    Copy link
    Mannequin

    florentx mannequin commented Dec 10, 2011

    Of course it's better to have someone else to review the patch.
    However in this case, I'm not sure it is a major feature.

    BTW, I noticed that effbot is currently marked as *inactive* maintainer
    http://docs.python.org/devguide/experts.html#stdlib

    If it is not an oversight, it means that this issue might wait "an extended period" before receiving a response.

    @florentx
    Copy link
    Mannequin

    florentx mannequin commented Jul 8, 2012

    Do we merge the patch for 3.3?
    I'm +1 on this (patch submitted 8 months ago, backward compatible and reviewed).

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jul 8, 2012

    Can this be honestly classified as a bugfix though? If it's a feature it will have to be postponed to 3.4

    @scoder
    Copy link
    Contributor

    scoder commented Jul 8, 2012

    Looks like a new feature to me.

    @florentx
    Copy link
    Mannequin

    florentx mannequin commented Jul 8, 2012

    Well, it fixes the behavior of ElementTree in some multi-threaded cases, provided you pass the namespace map as an argument of the serializer call.

    The fix implements an optional argument for this use case.
    As a side effect, it makes it easier to work with custom namespaces.

    If the consensus is to wait for next version, I'm fine with that.

    @scoder
    Copy link
    Contributor

    scoder commented Jul 8, 2012

    Florent, what you describe is exactly the definition of a new feature.
    Users even have to change their code in order to make use of it.

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jul 15, 2012

    I'm changing the issue name to reflect the direction it's taken. Florent, once 3.3 is branched, could you please refresh the patch vs. head for 3.4 (don't forget the "what's new") and I'll review it for commit.

    @elibendersky elibendersky mannequin changed the title Change the variable "nsmap" from global to instance (xml.etree.ElementTree) ET: add custom namespaces to serialization methods Jul 15, 2012
    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jul 15, 2012

    I'd also expand the doc of register_namespace to note what it should and shouldn't be used for (once this feature is added).

    @bitdancer
    Copy link
    Member

    This patch no longer applies to the tip of default. Whoever updates it should also address Eli's comment about expanding the register_namespace doc. I'm adding the 'easy' tag because Florent already did the hard work, and at this point it is just a patch update and doc change.

    @bitdancer bitdancer added the easy label Oct 4, 2014
    @vstinner
    Copy link
    Member

    This issue is 8 years old and has already 3 patches attached, it is not newcomer friendly: I remove the "easy" keyword.

    @vstinner vstinner removed the easy label Jul 29, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    tkarabela added a commit to tkarabela/cpython that referenced this issue Nov 30, 2022
    Co-authored-by: Florent Xicluna <florent.xicluna@gmail.com>
    @tkarabela
    Copy link

    Hi, I'm also interested in this feature, so I've updated the patch issue13378_non_global_namespaces_v3.diff to work with current main, see here: tkarabela@89086fb

    I've encountered one test failure in a test that did not exist at the time of writing the patch, here:

    def test_tostring_default_namespace_different_namespace(self):
    elem = ET.XML('<body xmlns="http://effbot.org/ns"><tag/></body>')
    self.assertEqual(
    ET.tostring(elem, encoding='unicode', default_namespace='foobar'),
    '<ns1:body xmlns="foobar" xmlns:ns1="http://effbot.org/ns"><ns1:tag /></ns1:body>'
    )

    With the patch, the output is:

    <ns0:body xmlns:ns0="http://effbot.org/ns"><ns0:tag /></ns0:body>

    which is equivalent, so I've updated this test as well to expect this new output.

    @florentx Since this is your work, would you like to take this updated patch and open a pull request for it yourself? :) Otherwise I can make the pull request.

    Docs for xml.etree.ElementTree.register_namespace should also be updated, that is not part of this patch.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    topic-XML type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants