classification
Title: Element.findall(path, dict) doesn't insert null namespace
Type: enhancement Stage: patch review
Components: Library (Lib), XML Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ben.wainwright, eli.bendersky, rhettinger, scoder, steve.dower, xtreak
Priority: normal Keywords: patch

Created on 2017-05-26 12:57 by ben.wainwright, last changed 2019-04-18 17:05 by scoder.

Pull Requests
URL Status Linked Edit
PR 1823 merged python-dev, 2017-05-26 19:17
PR 12830 merged scoder, 2019-04-14 18:39
PR 12860 merged scoder, 2019-04-16 21:16
Messages (14)
msg294549 - (view) Author: Ben Wainwright (ben.wainwright) Date: 2017-05-26 12:57
The findall method for ElementTree.Element handles namespace prefixes by tokenising the path and inserting the full namespace in braces based on entries in a dictionary.

Unfortunately, this does not work for a namespace without a prefix, so if you have files containing namespaces with and without prefixes, you still need to manually add the namespace url for the unprefixed path.

The function xpath_tokenizer checks to see if tokens contain a colon and only adds in the namespace url in that instance.

This could be changed to add the url if their is a colon, or if there is not, and the empty string key is present in the namespaces dictionary.
msg294553 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-05-26 15:18
I agree that this is a recurring irritant.
msg294558 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2017-05-26 17:03
Agreed that this should be added. I think the key should be None, though, not the empty string. I attached a quick patch for lxml's corresponding file. It's mostly the same for ET.
msg294566 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2017-05-26 19:16
Patch replaced by pull request.
https://github.com/python/cpython/pull/1823
msg340191 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-14 08:09
New changeset e9927e1820caea01e576141d9a623ea394d43dad by Stefan Behnel in branch 'master':
bpo-30485: support a default prefix mapping in ElementPath by passing None as prefix (#1823)
https://github.com/python/cpython/commit/e9927e1820caea01e576141d9a623ea394d43dad
msg340192 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-14 08:13
I've merged the PR. It matches the implementation that has been released in lxml almost two years ago.
msg340222 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-04-14 18:10
I am not sure but this commit seems to have broken Azure CI on master for Windows. The build checks were green when the PR while merging. I can see the ValueError added in e9927e1820caea01e576141d9a623ea394d43dad raised in the below CI log and it's occurring consistently

https://dev.azure.com/Python/cpython/_build/results?buildId=40862&view=logs&jobId=0fcf9c9b-89fc-526f-8708-363e467e119e&taskId=ae411532-3d9e-5cb2-bb36-07007cc5bb48&lineStart=37&lineEnd=38&colStart=1&colEnd=1
msg340223 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-14 18:41
Interesting. Thanks for investigating this. It looks like the script "appxmanifest.py" uses an empty string as prefix for a lookup:

  File "D:\a\1\s\PC\layout\support\appxmanifest.py", line 407, in get_appxmanifest
    node = xml.find("m:Identity", NS)

I don't know where that script comes from, but it suggests that strictly rejecting this kind of invalid library usage might not be the right thing to do for now. It seems to be a case where users pass an arbitrary prefix-namespace dict into .find() that they happen to have lying around, expecting unused mappings to be ignored. I pushed a PR that removes the exception for now.
msg340224 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-04-14 18:50
The relevant line is at https://github.com/python/cpython/blob/cd466559c4a312b3c1223a774ad4df19fc4f0407/PC/layout/support/appxmanifest.py#L407 . I guess it's something related to build artifacts for Windows and Steve can have a better answer over the file's usage and this specific line of change.
msg340226 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-14 19:07
The script seems to generally assume that "" is a good representation for "no prefix", i.e. the default namespace, although that is IMHO more correctly represented as None. It's not very likely that this is the only script out there that makes that assumption.

In fact, this might not be an entirely stupid assumption, given that None doesn't sort together with strings, for example. And sorting prefixes is not an unusual thing to do.

That makes it a case of "practicality beats purity", I guess …

I'll change the implementation to allow empty strings.
msg340227 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-14 19:12
New changeset 3c5a858ec6a4e5851903762770fe526a46d3c351 by Stefan Behnel in branch 'master':
bpo-30485: Re-allow empty strings in ElementPath namespace mappings since they might actually be harmless and unused (and thus went undetected previously). (#12830)
https://github.com/python/cpython/commit/3c5a858ec6a4e5851903762770fe526a46d3c351
msg340238 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-04-15 00:13
I don't recall where I got the empty string from, but it's certainly what I've used there for a while. Maybe it's required in register_namespace() to set the default namespace?
msg340365 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-16 21:19
I submitted a PR that changes the API back to an empty string. While lxml uses None here, an all-strings mapping is simply more convenient. I will start supporting both in lxml from the next release.

Comments welcome.
msg340509 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-18 17:05
New changeset e8113f51a8bdf33188ee30a1c038a298329e7bfa by Stefan Behnel in branch 'master':
bpo-30485: Change the prefix for defining the default namespace in ElementPath from None to '' since there is existing code that uses that and it's more convenient to have an all-string-keys dict (e.g. when sorting items etc.). (#12860)
https://github.com/python/cpython/commit/e8113f51a8bdf33188ee30a1c038a298329e7bfa
History
Date User Action Args
2019-04-18 17:05:26scodersetmessages: + msg340509
2019-04-16 21:19:56scodersetmessages: + msg340365
2019-04-16 21:16:01scodersetstage: needs patch -> patch review
pull_requests: + pull_request12784
2019-04-15 00:13:47steve.dowersetmessages: + msg340238
2019-04-14 19:12:39scodersetmessages: + msg340227
2019-04-14 19:07:17scodersetstatus: closed -> open
resolution: fixed ->
messages: + msg340226

stage: resolved -> needs patch
2019-04-14 18:50:20xtreaksetnosy: + steve.dower
messages: + msg340224
2019-04-14 18:41:20scodersetmessages: + msg340223
2019-04-14 18:39:34scodersetpull_requests: + pull_request12755
2019-04-14 18:10:38xtreaksetnosy: + xtreak
messages: + msg340222
2019-04-14 08:13:47scodersetstatus: open -> closed
versions: + Python 3.8, - Python 3.7
messages: + msg340192

resolution: fixed
stage: resolved
2019-04-14 08:09:14scodersetmessages: + msg340191
2017-05-26 19:17:43python-devsetpull_requests: + pull_request1910
2017-05-26 19:16:42scodersetmessages: + msg294566
2017-05-26 18:49:29scodersetfiles: - lxml_elpath_empty_prefix.patch
2017-05-26 17:03:52scodersetfiles: + lxml_elpath_empty_prefix.patch
keywords: + patch
messages: + msg294558
2017-05-26 15:18:17rhettingersetversions: + Python 3.7, - Python 3.6
nosy: + rhettinger, scoder, eli.bendersky

messages: + msg294553

type: behavior -> enhancement
2017-05-26 12:57:08ben.wainwrightcreate