Support recursive wildcards in pathlib.PurePath.match() #73435

Closed
JonWalsh mannequin opened this issue Jan 12, 2017 · 17 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes docs Documentation in the Doc dir topic-pathlib type-feature A feature request or enhancement

Comments

@JonWalsh
Mannequin

JonWalsh mannequin commented Jan 12, 2017

BPO 29249
Nosy @pitrou, @RonnyPfannschmidt, @serhiy-storchaka, @wimglenn, @JulienPalard, @virtuald, @andunai, @barneygale, @eumiro

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2017-01-12.10:19:54.004>
labels = ['3.8', 'type-feature', '3.7', '3.9', 'docs']
title = 'Pathlib glob ** bug'
updated_at = <Date 2021-06-04.19:42:17.687>
user = 'https://bugs.python.org/JonWalsh'

bugs.python.org fields:

activity = <Date 2021-06-04.19:42:17.687>
actor = 'christian.heimes'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation']
creation = <Date 2017-01-12.10:19:54.004>
creator = 'Jon Walsh'
dependencies = []
files = []
hgrepos = []
issue_num = 29249
keywords = []
message_count = 11.0
messages = ['285297', '285305', '285308', '285311', '285323', '307854', '307866', '325735', '361147', '382535', '386795']
nosy_count = 12.0
nosy_names = ['pitrou', 'docs@python', 'Ronny.Pfannschmidt', 'serhiy.storchaka', 'wim.glenn', 'mdk', 'virtuald', 'Jon Walsh', 'Andrew Dunai', 'Isaac Muse', 'barneygale', 'eumiro']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue29249'
versions = ['Python 3.5', 'Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9']

Linked PRs

@JonWalsh
Mannequin Author

JonWalsh mannequin commented Jan 12, 2017

>>> from pathlib import Path
>>> Path("a/b/c/d/e.txt").match('a/*/**/*')
False

@JonWalsh JonWalsh mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Jan 12, 2017
@andunai
Mannequin

andunai mannequin commented Jan 12, 2017

Isn't this intended? According to https://docs.python.org/2/library/glob.html and the wiki, a typical UNIX glob pattern does not have the recursive matching operator (**).

@tiran
Member

tiran commented Jan 12, 2017

The ticket is not about glob but about pathlib.

Pathlib supports ** directory globbing, but it's only documented as prefix globbing, https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob

@serhiy-storchaka
Member

** is supported not just as a prefix. Path('./Lib').glob('**/*.py') emits the same paths as Path('.').glob('Lib/**/*.py'). But ** is supported only in glob(), not in match(). The support of ** in match() is not documented. It would be worth documenting explicitly that it is not supported.
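
To make the comparison above concrete, here is a minimal sketch that builds a throwaway directory tree and runs both glob() calls. The directory layout and the helper name `compare_globs` are made up for illustration; the match() result at the end reflects the behaviour described in this comment (released Python versions where match() has no recursive wildcard).

```python
import tempfile
from pathlib import Path, PurePosixPath


def compare_globs():
    """Glob a small tree with `**` as a prefix and with `**` in the middle."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical layout: Lib/top.py and Lib/pkg/sub/mod.py
        (root / "Lib" / "pkg" / "sub").mkdir(parents=True)
        (root / "Lib" / "top.py").touch()
        (root / "Lib" / "pkg" / "sub" / "mod.py").touch()

        as_prefix = sorted((root / "Lib").glob("**/*.py"))
        in_middle = sorted(root.glob("Lib/**/*.py"))
        # Report paths relative to the root so the result is easy to read.
        return (
            [p.relative_to(root).as_posix() for p in as_prefix],
            [p.relative_to(root).as_posix() for p in in_middle],
        )


prefix_hits, middle_hits = compare_globs()
print(prefix_hits == middle_hits)

# match() compares segment by segment from the right, and `**` is not
# recursive there, so the same pattern does not match this path.
match_result = PurePosixPath("Lib/pkg/sub/mod.py").match("Lib/**/*.py")
print(match_result)
```

Both glob() calls should list the same two files, while match() rejects a path that the second glob() pattern would have produced.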

@serhiy-storchaka serhiy-storchaka added docs Documentation in the Doc dir 3.7 (EOL) end of life and removed stdlib Python modules in the Lib dir labels Jan 12, 2017
@serhiy-storchaka serhiy-storchaka added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Jan 12, 2017
@JonWalsh
Mannequin Author

JonWalsh mannequin commented Jan 12, 2017

Seems a bit strange to not have glob() and match() working the same though. Is there any reason for that?

@virtuald
Mannequin

virtuald mannequin commented Dec 8, 2017

I just ran into this also. It seems like a very strange omission that match and glob don't support the same patterns (and I'm surprised that they don't share more code).

@virtuald
Mannequin

virtuald mannequin commented Dec 8, 2017

Because of backwards compatibility (despite a statement saying it's not guaranteed for pathlib), I think the best approach would be to create a 'globmatch' function for PurePath instead of modifying the match function, and document that the match function does a different kind of matching.

This isn't a patch for cpython per se (ironically, I don't have time for that this month...), but here's an MIT-licensed gist that patches pathlib2 and adds a globmatch function to it, plus associated tests extracted from pathlib2 and my own ** related tests. Works for me, feel free to do with it as you wish.

https://gist.github.com/virtuald/dd0373bf3f26ec0730adf1da0fb929bb

@RonnyPfannschmidt
Mannequin

RonnyPfannschmidt mannequin commented Sep 19, 2018

bpo-34731 was a duplicate of this

pytest was affected: as we port more bits to pathlib, we hit this as well.

bruno kindly implemented a local workaround in https://github.com/pytest-dev/pytest/pull/3980/files#diff-63fc5ed688925b327a5af20405bf4b09R19

@gpshead gpshead added 3.8 only security fixes 3.9 only security fixes labels Jan 31, 2020
@IsaacMuse
Mannequin

IsaacMuse mannequin commented Feb 1, 2020

I think the idea of adding a globmatch function is a decent idea.

That is what I did in a library I wrote to get more out of glob than what Python offered out of the box: https://facelessuser.github.io/wcmatch/pathlib/#purepathglobmatch.

Specifically, the difference is that globmatch is a pure match of a path: it doesn't apply the implied ** at the beginning of a pattern the way match does. It doesn't enable ** by default; such features are controlled by flags:

>>> pathlib.Path("a/b/c/d/e.txt").match('a/*/**/*', flags=pathlib.GLOBSTAR)
True

This isn't to promote my library, but more to say, as a user, I found such functionality worth adding. I think it would be generally nice to have such functionality in some form in Python by default. Maybe something called globmatch that offers that could be worthwhile.
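
The "implied ** at the beginning of a pattern" mentioned in this comment can be seen with the standard library alone. This is a small sketch using `PurePosixPath` so the results do not depend on the host OS; the variable names are illustrative only.

```python
from pathlib import PurePosixPath

path = PurePosixPath("a/b/c.py")

# A relative pattern is lined up against the tail of the path.
tail_hit = path.match("b/*.py")      # True
# The same shape anchored at the head fails: match() compares from the right,
# so "a" is compared with "b" and "*.py" with "c.py".
head_miss = path.match("a/*.py")     # False

# An absolute pattern has to account for the whole path.
abs_miss = PurePosixPath("/a/b/c.py").match("/a/*.py")    # False
abs_hit = PurePosixPath("/a/b/c.py").match("/a/b/*.py")   # True

print(tail_hit, head_miss, abs_miss, abs_hit)
```

A "pure match" in the sense used above would instead require a relative pattern to cover the path from its first segment.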

@eumiro
Mannequin

eumiro mannequin commented Dec 4, 2020

Today when porting some random project from os.path to pathlib I encountered a homemade filename matching method that I wanted to port to pathlib.Path.match. Unfortunately

>>> pathlib.Path('x').match('**/x')
False

although if I have a file called x in the current directory, both

>>> pathlib.Path('.').glob('**/x')

and zsh's

$ ls **/x

can find it. It would be really nice to have analogous .glob and .match methods.
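
A self-contained way to reproduce this mismatch, assuming nothing beyond the standard library, is to create the file in a temporary directory instead of the current one. The helper name `glob_vs_match` is hypothetical.

```python
import tempfile
from pathlib import Path, PurePosixPath


def glob_vs_match():
    """Compare glob('**/x') on a real directory with match('**/x') on a pure path."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "x").touch()
        # glob() lets `**` stand for zero directories, so the top-level file is found.
        found = [p.name for p in root.glob("**/x")]
    # match() needs one path segment per pattern segment, so a one-segment
    # path cannot satisfy a two-segment pattern.
    matched = PurePosixPath("x").match("**/x")
    return found, matched


found, matched = glob_vs_match()
print(found, matched)
```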

@JulienPalard
Member

I'm +1 on adding ** to match.

My first bet would be to add it to match, not adding a new method, nor a flag, as it should not break compatibility:

It would only break if someone has a ** in a match AND does not expect it to be recursive (as it would continue to match the previous files, it may just match more).

Would this break something I did not foresee?

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@barneygale barneygale changed the title Pathlib glob ** bug Support recursive wildcards in pathlib.PurePath.match() Jan 27, 2023
barneygale added a commit to barneygale/cpython that referenced this issue Jan 28, 2023
…tch()

Add a new *recursive* argument to `pathlib.PurePath.match()`, defaulting
to `False`. If set to true, `match()` handles the `**` wildcard as in
`Path.glob()`, i.e. it matches any number of path segments.

We now compile a `re.Pattern` object for the entire pattern. This is made
more difficult by `fnmatch` not treating directory separators as special
when evaluating wildcards (`*`, `?`, etc), and so we arrange the path parts
onto separate *lines* in a string, and ensure we don't set `re.DOTALL`.
@barneygale
Contributor

I have a PR that implements this (pending another fix), but it doesn't support newlines in filenames or patterns. Is that any use, or should I try to work up a version that supports embedded newlines too? #101398
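
The approach described in the commit message above (one compiled `re.Pattern`, path parts placed on separate lines, no `re.DOTALL`) can be illustrated with a much-simplified sketch. This is not the code from the PR: the helper names `compile_lines_pattern` and `lines_match` are invented here, only `*`, `?` and `**` are handled, and, like the first version of the patch, it cannot cope with newlines in file names or patterns.

```python
import re


def compile_lines_pattern(pattern):
    """Translate a '/'-separated glob into a regex over newline-terminated parts."""
    chunks = []
    for part in pattern.split("/"):
        if part == "**":
            # Zero or more whole lines, i.e. zero or more path segments.
            chunks.append(r"(?:.*\n)*")
        else:
            body = "".join(
                ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
                for ch in part
            )
            # Without re.DOTALL, "." never matches "\n", so "*" and "?"
            # cannot leak across a segment boundary.
            chunks.append(body + r"\n")
    return re.compile("".join(chunks))


def lines_match(path, pattern):
    """Match the whole path, one segment per line."""
    text = "".join(segment + "\n" for segment in path.split("/"))
    return compile_lines_pattern(pattern).fullmatch(text) is not None


print(lines_match("a/b/c/d/e.txt", "a/*/**/*"))  # True
print(lines_match("a/b/c", "a/*"))               # False

# With re.DOTALL the single "*" would swallow "b\nc" and wrongly match.
leaky = re.compile(compile_lines_pattern("a/*").pattern, re.DOTALL)
print(leaky.fullmatch("a\nb\nc\n") is not None)  # True
```

The last line shows why the flag matters: once `.` is allowed to match a newline, a single-segment wildcard starts behaving like a recursive one.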

@barneygale
Contributor

I've fixed support for newlines in my patch. I believe these two PRs will resolve this issue:

Would a core dev be willing to review, please? Thanks!

barneygale added a commit to barneygale/cpython that referenced this issue Feb 17, 2023
@barneygale
Contributor

First PR has landed. This one is now ready:

It also makes match() about 10x faster in some cases!

barneygale added a commit to barneygale/cpython that referenced this issue Feb 20, 2023
barneygale added a commit to barneygale/cpython that referenced this issue Apr 3, 2023
barneygale added a commit to barneygale/cpython that referenced this issue Apr 9, 2023
barneygale added a commit to barneygale/cpython that referenced this issue Apr 29, 2023
barneygale added a commit to barneygale/cpython that referenced this issue May 2, 2023
barneygale added a commit to barneygale/cpython that referenced this issue May 6, 2023
barneygale added a commit to barneygale/cpython that referenced this issue May 18, 2023
barneygale added a commit to barneygale/cpython that referenced this issue May 23, 2023
barneygale added a commit to barneygale/cpython that referenced this issue May 27, 2023
barneygale added a commit to barneygale/cpython that referenced this issue May 29, 2023
barneygale added a commit to barneygale/cpython that referenced this issue May 29, 2023
barneygale added a commit that referenced this issue May 30, 2023
…#101398)

`PurePath.match()` now handles the `**` wildcard as in `Path.glob()`, i.e. it matches any number of path segments.

We now compile a `re.Pattern` object for the entire pattern. This is made more difficult by `fnmatch` not treating directory separators as special when evaluating wildcards (`*`, `?`, etc), and so we arrange the path parts onto separate *lines* in a string, and ensure we don't set `re.DOTALL`.

Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
@barneygale
Contributor

Thanks for your patience all. This is now implemented in 49f90ba / #101398, and will become available in Python 3.13!

@barneygale
Contributor

Re-opening. I'm beginning to think that the implied **/ prefix limits the usefulness of this method, as it makes it impossible to match relative paths from the left hand side, and makes it subtly incompatible with glob().

I'm going to put up a PR that reverts match() back to 3.12 behaviour, and instead adds a globmatch() method.

@barneygale barneygale reopened this Jan 20, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Jan 20, 2024
In 49f90ba we added support for the recursive wildcard `**` in
`pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix
matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a
problem: for relative patterns only, `match()` implicitly inserts a `**`
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
`PurePath('foo/bar').match('bar/**')` is true!

This commit reverts the changes to `match()`, and instead adds a new
`globmatch()` method that:

- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern

As a result, `globmatch()`'s pattern language exactly matches that of
`glob()`.
barneygale added a commit that referenced this issue Jan 26, 2024
In 49f90ba we added support for the recursive wildcard `**` in
`pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix
matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a
problem: for relative patterns only, `match()` implicitly inserts a `**`
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
`PurePath('foo/bar').match('bar/**')` is true!

This commit reverts the changes to `match()`, and instead adds a new
`full_match()` method that:

- Allows empty patterns
- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern
@barneygale
Contributor

Re-resolving. This is now implemented as PurePath.full_match()
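
For readers who want to try the resolution, the following sketch exercises `PurePath.full_match()` on the patterns raised earlier in this thread. The wrapper `full_match_or_none` is a hypothetical helper added only so the snippet also runs on interpreters older than 3.13, where the method does not exist and the wrapper returns `None`.

```python
from pathlib import PurePosixPath


def full_match_or_none(path, pattern):
    """Call PurePath.full_match() if available; return None on older Pythons."""
    pure = PurePosixPath(path)
    if not hasattr(pure, "full_match"):
        return None
    return pure.full_match(pattern)


cases = [
    ("a/b/c/d/e.txt", "a/*/**/*"),  # the original report
    ("x", "**/x"),                  # `**` standing for zero segments
    ("foo/bar", "foo/**"),          # matching from the left
    ("foo/bar", "bar/**"),          # must not match from the right
]

results = {case: full_match_or_none(*case) for case in cases}
for (path, pattern), outcome in results.items():
    print(f"{path!r} full_match {pattern!r}: {outcome}")
```

On Python 3.13 or later the first three cases are expected to report True and the last one False, since a relative pattern has to cover the entire path rather than just its tail.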
