Message 216774 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	scoder
Recipients	brycenesbitt, eli.bendersky, martin.panter, pocek, rhettinger, scoder
Date	2014-04-18.05:08:05
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1397797686.38.0.0542851960459.issue18304@psf.upfronthosting.co.za>
In-reply-to

Content
You can already use iterparse for this. it = ET.iterparse('somefile.xml') for _, el in it: el.tag = el.tag.split('}', 1)[1] # strip all namespaces root = it.root As I said, this would be a little friendlier with support in the QName class, but it's not really complex code. Could be added to the docs as a recipe, with a visible warning that this can easily lead to incorrect data processing and therefore should not be used in systems where the input is not entirely under control. Note that it's unclear what the "right way to do it" is, though. Is it better to 1) alter the data by stripping all namespaces off, or 2) let the tree API itself provide a namespace agnostic mode? Depends on the use case, but the more generic way 2) should be fairly involved in terms of implementation complexity, for just a minor use case. 1) would be ok in most cases where this "feature" is useful, I guess, and can be done as shown above. In fact, the advantage of doing it explicitly with iterparse() is that instead of stripping all namespaces, only the expected namespaces can be discarded. And errors can be raised when finding unnamespaced elements, for example. This allows for a safety guard that prevents the code from completely misinterpreting input. There is a reason why namespace were added to XML at some point.

You can already use iterparse for this.

    it = ET.iterparse('somefile.xml')
    for _, el in it:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
    root = it.root

As I said, this would be a little friendlier with support in the QName class, but it's not really complex code. Could be added to the docs as a recipe, with a visible warning that this can easily lead to incorrect data processing and therefore should not be used in systems where the input is not entirely under control.

Note that it's unclear what the "right way to do it" is, though. Is it better to 1) alter the data by stripping all namespaces off, or 2) let the tree API itself provide a namespace agnostic mode? Depends on the use case, but the more generic way 2) should be fairly involved in terms of implementation complexity, for just a minor use case. 1) would be ok in most cases where this "feature" is useful, I guess, and can be done as shown above.

In fact, the advantage of doing it explicitly with iterparse() is that instead of stripping all namespaces, only the expected namespaces can be discarded. And errors can be raised when finding unnamespaced elements, for example. This allows for a safety guard that prevents the code from completely misinterpreting input. There is a reason why namespace were added to XML at some point.

History
Date	User	Action	Args
2014-04-18 05:08:06	scoder	set	recipients: + scoder, rhettinger, eli.bendersky, martin.panter, brycenesbitt, pocek
2014-04-18 05:08:06	scoder	set	messageid: <1397797686.38.0.0542851960459.issue18304@psf.upfronthosting.co.za>
2014-04-18 05:08:06	scoder	link	issue18304 messages
2014-04-18 05:08:05	scoder	create