classification
Title: Escape the literal part of the path for glob()
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: petr.viktorin, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2020-06-19 21:16 by serhiy.storchaka, last changed 2020-07-02 07:06 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked
PR 20994 merged serhiy.storchaka, 2020-06-19 21:21
PR 21275 merged serhiy.storchaka, 2020-07-02 06:23
PR 21277 merged serhiy.storchaka, 2020-07-02 06:28
Messages (6)
msg371903 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-06-19 21:16
It is common to use glob() as

    glob.glob(os.path.join(basedir, pattern))

But it does not work correctly if the base directory contains special globbing characters ('*', '?', '['). This is uncommon, so in most cases the code works. But if you move the sources to a directory containing special characters, build them and run the tests, some tests fail:

test test_tokenize failed -- Traceback (most recent call last):
  File "/home/serhiy/py/[cpython]/Lib/test/test_tokenize.py", line 1615, in test_random_files
    testfiles.remove(os.path.join(tempdir, "test_unicode_identifiers.py"))
ValueError: list.remove(x): x not in list

test test_multiprocessing_fork failed -- Traceback (most recent call last):
  File "/home/serhiy/py/[cpython]/Lib/test/_test_multiprocessing.py", line 4272, in test_import
    modules = self.get_module_names()
  File "/home/serhiy/py/[cpython]/Lib/test/_test_multiprocessing.py", line 4267, in get_module_names
    modules.remove('multiprocessing.__init__')
ValueError: list.remove(x): x not in list

test test_bz2 failed -- Traceback (most recent call last):
  File "/home/serhiy/py/[cpython]/Lib/test/test_bz2.py", line 740, in testDecompressorChunksMaxsize
    self.assertFalse(bzd.needs_input)
AssertionError: True is not false

test test_multiprocessing_forkserver failed -- Traceback (most recent call last):
  File "/home/serhiy/py/[cpython]/Lib/test/_test_multiprocessing.py", line 4272, in test_import
    modules = self.get_module_names()
  File "/home/serhiy/py/[cpython]/Lib/test/_test_multiprocessing.py", line 4267, in get_module_names
    modules.remove('multiprocessing.__init__')
ValueError: list.remove(x): x not in list

test test_multiprocessing_spawn failed -- Traceback (most recent call last):
  File "/home/serhiy/py/[cpython]/Lib/test/_test_multiprocessing.py", line 4272, in test_import
    modules = self.get_module_names()
  File "/home/serhiy/py/[cpython]/Lib/test/_test_multiprocessing.py", line 4267, in get_module_names
    modules.remove('multiprocessing.__init__')
ValueError: list.remove(x): x not in list
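All of these failures share one cause: glob() treats every component of the joined path as a pattern, including the literal base directory. The following is a minimal sketch that reproduces this in a throwaway directory (the file name test_example.py is illustrative, not taken from the test suite). fnmatch, which glob uses for matching, reads "[cpython]" as a character class matching one character from "cpython", so the real directory named "[cpython]" is never matched.

```python
import fnmatch
import glob
import os
import tempfile

# "[cpython]" as a pattern is a one-character class, not a literal name.
print(fnmatch.fnmatch("c", "[cpython]"))          # True
print(fnmatch.fnmatch("[cpython]", "[cpython]"))  # False

def naive_glob(basedir, pattern):
    # The idiom from the report: basedir is joined in WITHOUT escaping.
    return glob.glob(os.path.join(basedir, pattern))

with tempfile.TemporaryDirectory() as tmp:
    basedir = os.path.join(tmp, "[cpython]")
    os.mkdir(basedir)
    open(os.path.join(basedir, "test_example.py"), "w").close()

    # The file exists, but the base directory itself is not matched.
    found = naive_glob(basedir, "*.py")
    print(found)  # []
```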

The proposed PR wraps the base directory in glob.escape():

    glob.glob(os.path.join(glob.escape(basedir), pattern))
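A minimal sketch of the fixed idiom wrapped in a helper; the name glob_under is hypothetical and not a stdlib function. glob.escape() wraps each '*', '?' and '[' in brackets so it matches literally, while the pattern part stays untouched and keeps its wildcard meaning.

```python
import glob
import os
import tempfile

def glob_under(basedir, pattern):
    """Hypothetical helper: match basedir literally, pattern as a glob."""
    return glob.glob(os.path.join(glob.escape(basedir), pattern))

# '[' becomes '[[]'; a lone ']' needs no escaping.
escaped = glob.escape("/home/serhiy/py/[cpython]/Lib")
print(escaped)  # /home/serhiy/py/[[]cpython]/Lib

with tempfile.TemporaryDirectory() as tmp:
    basedir = os.path.join(tmp, "[cpython]")
    os.mkdir(basedir)
    target = os.path.join(basedir, "test_example.py")
    open(target, "w").close()

    found = glob_under(basedir, "*.py")
    print(found == [target])  # True
```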
msg371923 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-06-20 08:10
New changeset 935586845815f5b4c7814794413f6a812d4bd45f by Serhiy Storchaka in branch 'master':
bpo-41043: Escape literal part of the path for glob(). (GH-20994)
https://github.com/python/cpython/commit/935586845815f5b4c7814794413f6a812d4bd45f
msg372793 - Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2020-07-01 20:55
Would it be worth it to add a "base" keyword argument to glob.glob?
msg372805 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-07-02 06:19
It may not be exactly what you meant, but see issue38144. This issue was actually opened after I looked at how that feature could be used in the stdlib and found that most uses of glob() are vulnerable.
msg372808 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-07-02 07:05
New changeset ecfecc2d6ce88ae71c783f0465a508c6a1b2f2b6 by Serhiy Storchaka in branch '3.9':
[3.9] bpo-41043: Escape literal part of the path for glob(). (GH-20994). (GH-21275)
https://github.com/python/cpython/commit/ecfecc2d6ce88ae71c783f0465a508c6a1b2f2b6
msg372809 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-07-02 07:05
New changeset e73896241e55f452656fd8070eb79f344091bca0 by Serhiy Storchaka in branch '3.8':
[3.8] bpo-41043: Escape literal part of the path for glob(). (GH-20994). (GH-21277)
https://github.com/python/cpython/commit/e73896241e55f452656fd8070eb79f344091bca0
History
Date User Action Args
2020-07-02 07:06:05 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2020-07-02 07:05:38 serhiy.storchaka set messages: + msg372809
2020-07-02 07:05:23 serhiy.storchaka set messages: + msg372808
2020-07-02 06:28:58 serhiy.storchaka set pull_requests: + pull_request20426
2020-07-02 06:23:08 serhiy.storchaka set pull_requests: + pull_request20424
2020-07-02 06:19:59 serhiy.storchaka set messages: + msg372805
2020-07-01 20:55:12 petr.viktorin set nosy: + petr.viktorin; messages: + msg372793
2020-06-20 08:10:51 serhiy.storchaka set messages: + msg371923
2020-06-19 21:21:35 serhiy.storchaka set keywords: + patch; stage: patch review; pull_requests: + pull_request20169
2020-06-19 21:16:43 serhiy.storchaka create