classification
Title: ElementPath 1.3 expressions
Type: behavior Stage: resolved
Components: XML Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eli.bendersky, flox, patrick.vrijlandt, python-dev, scoder
Priority: low Keywords:

Created on 2011-06-13 10:11 by patrick.vrijlandt, last changed 2013-01-24 21:56 by scoder. This issue is now closed.

Messages (8)
msg138230 - (view) Author: patrick vrijlandt (patrick.vrijlandt) Date: 2011-06-13 10:11
From http://effbot.org/zone/element-xpath.htm:

[position]	(New in 1.3) Selects all elements that are located at the given position. The position can be either an integer (1 is the first position), the expression “last()” (for the last position), or a position relative to last() (e.g. “last()-1” for the second to last position). This predicate must be preceeded by a tag name.

Testing shows, that [0] is accepted, and seems to be [last()]. 
However [-1] selects no elements. I think these expressions should raise an exception. (Or the feature should be 0-based and behave like python list indices)
msg180390 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-22 14:17
The official documentation of XML ET is at http://docs.python.org/dev/library/xml.etree.elementtree.html

The arguments to XPath are clearly described, and the implementation behaves correctly. We will continue supporting XPath syntax there, rather than Python syntax, because this is explicitly a XPath feature.
msg180418 - (view) Author: patrick vrijlandt (patrick.vrijlandt) Date: 2013-01-22 18:43
Dear Eli,

According to the XPath spec, the 'position' as can be used in xpath expressions, should be positive. However, the current implementation (example below from 3.3.0) accepts some values that should not be ok.

Therefore, I do not agree that it behaves correctly. Garbage positions should return [] or an exception, not a value. And if you accept one value before the first position, you should accept them all.

DATA = '''<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
'''

import xml.etree.ElementTree as ET

root = ET.XML(DATA)
print(root)
for XP in (['./country'] +
           ['./country[%d]' % i for i in range(-1, 5)] +
           ['./country[last()%+d]' % i for i in range(-3, 5)]):
    print('{:20}'.format(XP), [elem.get('name') for elem in root.findall(XP)])

##  OUTPUT:
##    <Element 'data' at 0x03CD9BD0>
##    ./country            ['Liechtenstein', 'Singapore', 'Panama']
##    ./country[-1]        []
##    ./country[0]         ['Panama']
##    ./country[1]         ['Liechtenstein']
##    ./country[2]         ['Singapore']
##    ./country[3]         ['Panama']
##    ./country[4]         []
##    ./country[last()-3]  []
##    ./country[last()-2]  ['Liechtenstein']
##    ./country[last()-1]  ['Singapore']
##    ./country[last()+0]  ['Panama']
##    ./country[last()+1]  ['Liechtenstein']
##    ./country[last()+2]  ['Singapore']
##    ./country[last()+3]  ['Panama']
##    ./country[last()+4]  []
msg180524 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-24 14:29
New changeset 56a4561600ad by Eli Bendersky in branch 'default':
Issue #12323: Strengthen error checking of the position XPath selectors
http://hg.python.org/cpython/rev/56a4561600ad
msg180525 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-24 14:31
Agreed. I strengthened the error checking when parsing the path, so now hopefully many non-sensical positions will be rejected.

Note that this is only for the default branch (the future Python 3.4), because I don't think this is important enough to warrant breaking existing pre-3.4 code. Only true bug fixes should go to released branches.
msg180548 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2013-01-24 21:17
I agree that [0] should be treated as a visible error as it's easy to get wrong. It's certainly too late to change this to 0-based indexing and I think it's ok to keep it 1-based for XPath compatibility (or at least similarity) as that's what people will expect it to mimic. I came up with a similar fix for lxml, BTW.

Regarding the behaviour of negative indices, it's clearly broken and unexpected and "-1" is definitely not a valid XML tag name (as which it's currently interpreted). That's a change for Py3.4+ only though. Either disable negative indices completely and raise an exception at parse time or read "-X" as "last()-X". I would also be ok with the latter as it feels natural to support this in a Python context. But that's a new feature then, not a bug fix. And the fact that "last()-X" already exists tends to speak against it. It's more of an "if the intention is clear, why raise an exception" kind of thing.
msg180549 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-24 21:36
> I agree that [0] should be treated as a visible error as it's easy to get
> wrong. It's certainly too late to change this to 0-based indexing and I
> think it's ok to keep it 1-based for XPath compatibility (or at least
> similarity) as that's what people will expect it to mimic. I came up with a
> similar fix for lxml, BTW.
>
> Regarding the behaviour of negative indices, it's clearly broken and
> unexpected and "-1" is definitely not a valid XML tag name (as which it's
> currently interpreted). That's a change for Py3.4+ only though. Either
> disable negative indices completely and raise an exception at parse time or
> read "-X" as "last()-X". I would also be ok with the latter as it feels
> natural to support this in a Python context. But that's a new feature then,
> not a bug fix. And the fact that "last()-X" already exists tends to speak
> against it. It's more of an "if the intention is clear, why raise an
> exception" kind of thing.
>

Stefan,

IIUC, my recent commit (mirrored to the issue) is in accord with this
comment. Please correct me if I'm wrong.
msg180550 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2013-01-24 21:56
Yes, I think it makes sense to be rigid now and *maybe* add a new feature later.
History
Date User Action Args
2013-01-24 21:56:17scodersetmessages: + msg180550
2013-01-24 21:36:16eli.benderskysetmessages: + msg180549
2013-01-24 21:17:27scodersetnosy: + scoder
messages: + msg180548
2013-01-24 14:31:10eli.benderskysetresolution: rejected -> fixed
stage: needs patch -> resolved
messages: + msg180525
versions: - Python 3.2, Python 3.3
2013-01-24 14:29:41python-devsetnosy: + python-dev
messages: + msg180524
2013-01-22 18:43:58patrick.vrijlandtsetmessages: + msg180418
2013-01-22 14:17:13eli.benderskysetpriority: normal -> low
status: open -> closed
resolution: rejected
messages: + msg180390
2013-01-22 12:53:18ezio.melottisetstage: needs patch
versions: + Python 3.4
2012-07-21 14:09:02floxsetnosy: + eli.bendersky

versions: + Python 3.3
2011-06-13 12:58:24pitrousetnosy: + flox
2011-06-13 10:11:32patrick.vrijlandtcreate