Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Element.findall(path, dict) doesn't insert null namespace #74670

Closed
benwainwright mannequin opened this issue May 26, 2017 · 14 comments
Closed

Element.findall(path, dict) doesn't insert null namespace #74670

benwainwright mannequin opened this issue May 26, 2017 · 14 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir topic-XML type-feature A feature request or enhancement

Comments

@benwainwright
Copy link
Mannequin

benwainwright mannequin commented May 26, 2017

BPO 30485
Nosy @rhettinger, @scoder, @zooba, @tirkarthi
PRs
  • bpo-30485: support a default prefix mapping in ElementPath #1823
  • bpo-30485: Re-allow empty strings in ElementPath namespace mappings #12830
  • bpo-30485: Change the prefix for defining the default namespace in ElementPath from None to '' #12860
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-05-03.06:44:21.911>
    created_at = <Date 2017-05-26.12:57:08.696>
    labels = ['expert-XML', '3.8', 'type-feature', 'library']
    title = "Element.findall(path, dict) doesn't insert null namespace"
    updated_at = <Date 2019-05-03.06:44:21.910>
    user = 'https://bugs.python.org/benwainwright'

    bugs.python.org fields:

    activity = <Date 2019-05-03.06:44:21.910>
    actor = 'scoder'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-05-03.06:44:21.911>
    closer = 'scoder'
    components = ['Library (Lib)', 'XML']
    creation = <Date 2017-05-26.12:57:08.696>
    creator = 'ben.wainwright'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30485
    keywords = ['patch']
    message_count = 14.0
    messages = ['294549', '294553', '294558', '294566', '340191', '340192', '340222', '340223', '340224', '340226', '340227', '340238', '340365', '340509']
    nosy_count = 6.0
    nosy_names = ['rhettinger', 'scoder', 'eli.bendersky', 'steve.dower', 'ben.wainwright', 'xtreak']
    pr_nums = ['1823', '12830', '12860']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue30485'
    versions = ['Python 3.8']

    @benwainwright
    Copy link
    Mannequin Author

    benwainwright mannequin commented May 26, 2017

    The findall method for ElementTree.Element handles namespace prefixes by tokenising the path and inserting the full namespace in braces based on entries in a dictionary.

    Unfortunately, this does not work for a namespace without a prefix, so if you have files containing namespaces with and without prefixes, you still need to manually add the namespace url for the unprefixed path.

    The function xpath_tokenizer checks to see if tokens contain a colon and only adds in the namespace url in that instance.

    This could be changed to add the url if their is a colon, or if there is not, and the empty string key is present in the namespaces dictionary.

    @benwainwright benwainwright mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir topic-XML labels May 26, 2017
    @rhettinger
    Copy link
    Contributor

    I agree that this is a recurring irritant.

    @rhettinger rhettinger added 3.7 (EOL) end of life type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels May 26, 2017
    @scoder
    Copy link
    Contributor

    scoder commented May 26, 2017

    Agreed that this should be added. I think the key should be None, though, not the empty string. I attached a quick patch for lxml's corresponding file. It's mostly the same for ET.

    @scoder
    Copy link
    Contributor

    scoder commented May 26, 2017

    Patch replaced by pull request.
    #1823

    @scoder
    Copy link
    Contributor

    scoder commented Apr 14, 2019

    New changeset e9927e1 by Stefan Behnel in branch 'master':
    bpo-30485: support a default prefix mapping in ElementPath by passing None as prefix (bpo-1823)
    e9927e1

    @scoder
    Copy link
    Contributor

    scoder commented Apr 14, 2019

    I've merged the PR. It matches the implementation that has been released in lxml almost two years ago.

    @scoder scoder added 3.8 only security fixes and removed 3.7 (EOL) end of life labels Apr 14, 2019
    @scoder scoder closed this as completed Apr 14, 2019
    @tirkarthi
    Copy link
    Member

    I am not sure but this commit seems to have broken Azure CI on master for Windows. The build checks were green when the PR while merging. I can see the ValueError added in e9927e1 raised in the below CI log and it's occurring consistently

    https://dev.azure.com/Python/cpython/_build/results?buildId=40862&view=logs&jobId=0fcf9c9b-89fc-526f-8708-363e467e119e&taskId=ae411532-3d9e-5cb2-bb36-07007cc5bb48&lineStart=37&lineEnd=38&colStart=1&colEnd=1

    @scoder
    Copy link
    Contributor

    scoder commented Apr 14, 2019

    Interesting. Thanks for investigating this. It looks like the script "appxmanifest.py" uses an empty string as prefix for a lookup:

    File "D:\a\1\s\PC\layout\support\appxmanifest.py", line 407, in get_appxmanifest
    node = xml.find("m:Identity", NS)

    I don't know where that script comes from, but it suggests that strictly rejecting this kind of invalid library usage might not be the right thing to do for now. It seems to be a case where users pass an arbitrary prefix-namespace dict into .find() that they happen to have lying around, expecting unused mappings to be ignored. I pushed a PR that removes the exception for now.

    @tirkarthi
    Copy link
    Member

    The relevant line is at

    node = xml.find("m:Identity", NS)
    . I guess it's something related to build artifacts for Windows and Steve can have a better answer over the file's usage and this specific line of change.

    @scoder
    Copy link
    Contributor

    scoder commented Apr 14, 2019

    The script seems to generally assume that "" is a good representation for "no prefix", i.e. the default namespace, although that is IMHO more correctly represented as None. It's not very likely that this is the only script out there that makes that assumption.

    In fact, this might not be an entirely stupid assumption, given that None doesn't sort together with strings, for example. And sorting prefixes is not an unusual thing to do.

    That makes it a case of "practicality beats purity", I guess …

    I'll change the implementation to allow empty strings.

    @scoder scoder reopened this Apr 14, 2019
    @scoder
    Copy link
    Contributor

    scoder commented Apr 14, 2019

    New changeset 3c5a858 by Stefan Behnel in branch 'master':
    bpo-30485: Re-allow empty strings in ElementPath namespace mappings since they might actually be harmless and unused (and thus went undetected previously). (bpo-12830)
    3c5a858

    @zooba
    Copy link
    Member

    zooba commented Apr 15, 2019

    I don't recall where I got the empty string from, but it's certainly what I've used there for a while. Maybe it's required in register_namespace() to set the default namespace?

    @scoder
    Copy link
    Contributor

    scoder commented Apr 16, 2019

    I submitted a PR that changes the API back to an empty string. While lxml uses None here, an all-strings mapping is simply more convenient. I will start supporting both in lxml from the next release.

    Comments welcome.

    @scoder
    Copy link
    Contributor

    scoder commented Apr 18, 2019

    New changeset e8113f5 by Stefan Behnel in branch 'master':
    bpo-30485: Change the prefix for defining the default namespace in ElementPath from None to '' since there is existing code that uses that and it's more convenient to have an all-string-keys dict (e.g. when sorting items etc.). (bpo-12860)
    e8113f5

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir topic-XML type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants