classification
Title: sys.audit() is called multiple times for glob.glob()
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: miss-islington, serhiy.storchaka, steve.dower, xtreak
Priority: normal Keywords: patch

Created on 2019-09-13 09:07 by serhiy.storchaka, last changed 2020-02-06 11:08 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 18360 merged serhiy.storchaka, 2020-02-05 09:26
PR 18373 merged miss-islington, 2020-02-06 08:26
Messages (9)
msg352248 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-13 09:07
sys.audit() for "glob.glob" is called in a recursive function _iglob(). So in process of executing glob.glob() it can be called multiple times, with the original pattern and with patterns for parent directories, while there are metacharacters in them.

I do not think it was intentional.
msg352252 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-09-13 09:28
Provided it's called with different arguments each time (which it is), there isn't a problem here. Audit hooks are supposed to be informative, not definitive (that is, you almost always need to take the surrounding context into consideration, which is why they are better for logging actions rather than blocking actions).

Though it might be unintentional that glob.glob() recursively handles each path segment when it could more efficiently work forward resolving each wildcard segment as it goes. Is that what you mean?
msg352264 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-13 09:44
Using recursion is rather an implementation detail, because splitpath() splits a path from the end (Path.glob() does it from other side). Also, it may be a tiny bit more efficient if the pattern contains a long literal prefix.
msg352269 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-13 09:48
Ah, and when added sys.audit() for glob.glob(), should not it be added for Path.glob() too? And for os.walk()?
msg352297 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-09-13 11:44
> when added sys.audit() for glob.glob(), should not it be added for Path.glob() too? And for os.walk()?

Sure, those would make sense.

They all go via scandir() which has its own event too (and will do for each directory), but being able to see that it came from a glob or walk call could be useful information.

Maybe moving the sys.audit call in glob.glob further down in the function would avoid some of the recursion? I don't want to duplicate where it's raised too much, as that provides opportunities to bypass it, but this one is only slightly informative - os.scandir is the real one that you'd worry about seeing.
msg352303 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-13 12:07
I think it would be better to call sys.audit() in glob.iglob(), before calling the top-level _iglob(). It will give you a general context, and at every recursion level sys.audit() will be called for os.scandir().
msg352304 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-13 12:11
It is harder to avoid repeating calls in os.walk(), because it is recursive itself. But you can introduce a recursive helper _walk() and make os.walk() just calling sys.audit() and _walk().
msg361474 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-02-06 08:26
New changeset 54b4f14712b9350f11c983f1c8ac47a3716958a7 by Serhiy Storchaka in branch 'master':
bpo-38149: Call sys.audit() only once per call for glob.glob(). (GH-18360)
https://github.com/python/cpython/commit/54b4f14712b9350f11c983f1c8ac47a3716958a7
msg361475 - (view) Author: miss-islington (miss-islington) Date: 2020-02-06 08:45
New changeset 708f472dd92f4f46c27ace710492da65da4a3319 by Miss Islington (bot) in branch '3.8':
bpo-38149: Call sys.audit() only once per call for glob.glob(). (GH-18360)
https://github.com/python/cpython/commit/708f472dd92f4f46c27ace710492da65da4a3319
History
Date User Action Args
2020-02-06 11:08:20serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020-02-06 08:45:23miss-islingtonsetnosy: + miss-islington
messages: + msg361475
2020-02-06 08:26:58miss-islingtonsetpull_requests: + pull_request17749
2020-02-06 08:26:41serhiy.storchakasetmessages: + msg361474
2020-02-05 09:26:32serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request17735
2019-09-13 12:11:00serhiy.storchakasetmessages: + msg352304
2019-09-13 12:07:55serhiy.storchakasetmessages: + msg352303
2019-09-13 11:44:29steve.dowersetmessages: + msg352297
2019-09-13 09:48:04serhiy.storchakasetmessages: + msg352269
2019-09-13 09:44:53serhiy.storchakasetmessages: + msg352264
2019-09-13 09:28:23steve.dowersetnosy: + xtreak
2019-09-13 09:28:11steve.dowersetnosy: - xtreak
messages: + msg352252
2019-09-13 09:14:49xtreaksetnosy: + xtreak
2019-09-13 09:07:51serhiy.storchakacreate