Title: unittest cannot load module whose name starts with Unicode
Components: Library (Lib) Versions: Python 3.6, Python 3.5
Created on 2015-05-22 06:49 by sih4sing5hong5, last changed 2022-04-11 14:58 by admin.

test_dir.tar.gz sih4sing5hong5, 2015-06-23 04:46 failure example
VALID_MODULE_NAME.patch sih4sing5hong5, 2015-06-23 14:27 review
VALID_MODULE_NAME2.patch sih4sing5hong5, 2015-06-24 04:29 review
PR 1338 closed louielu, 2017-04-28 08:51
PR 13149 open a.badger, 2019-05-07 03:50
msg245662 - (view) Author: sih4sing5hong5 (sih4sing5hong5) * Date: 2015-06-23 03:50
Because VALID_MODULE_NAME is r'[_a-z]\w*\.py$' in unittest/

Use r'[^\W\d]\w*\.py$' instead.
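
As a rough illustration of the difference between the two patterns, the sketch below matches both against a few sample filenames. The sample names, the helper `matches`, and the `re.IGNORECASE` flag on the old pattern are assumptions made for this example, not taken from the report.

```python
import re

# Pattern quoted in the report; the re.IGNORECASE flag is an assumption here.
OLD_PATTERN = re.compile(r'[_a-z]\w*\.py$', re.IGNORECASE)
# Replacement proposed in the report.
NEW_PATTERN = re.compile(r'[^\W\d]\w*\.py$')

# Hypothetical filenames used only for illustration.
SAMPLES = ['test_x.py', 'test試驗.py', '試驗.py', '1abc.py']


def matches(pattern, filename):
    """Return True if the pattern matches from the start of the filename."""
    return pattern.match(filename) is not None


for filename in SAMPLES:
    print(filename, matches(OLD_PATTERN, filename), matches(NEW_PATTERN, filename))
```

Only the name that starts with a non-ASCII letter is treated differently: the old pattern rejects it and the proposed one accepts it. Both reject a name that starts with a digit.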
msg245663 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2015-06-23 04:04
Are the module names valid in import statements?

It would help if you could attach a small tar/zip file with an example failure.
msg245667 - (view) Author: sih4sing5hong5 (sih4sing5hong5) * Date: 2015-06-23 04:46
There is an attached file for examples.

I ran
cd test_dir
python -m unittest -v

and got
"Ran 1 test in 0.000s"
msg245668 - (view) Author: sih4sing5hong5 (sih4sing5hong5) * Date: 2015-06-23 05:36
By the way, I ran with Python 3.4.0.
msg245675 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-06-23 10:00
r'[^\W\d]\w*' doesn't match all valid Python identifiers. It would be more correct to write the check as:

    root, ext = os.path.splitext(basename)
    if not (ext == '.py' and root.isidentifier()):
        # valid Python identifiers only
        return None, False
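
A minimal standalone sketch of the same check, wrapped in a hypothetical helper named `is_module_file` (the name and the sample filenames are assumptions for illustration, not part of unittest):

```python
import os


def is_module_file(basename):
    """Hypothetical helper: True if basename looks like '<identifier>.py'."""
    root, ext = os.path.splitext(basename)
    return ext == '.py' and root.isidentifier()


# Hypothetical filenames used only for illustration.
for basename in ['test_x.py', '試驗.py', '1abc.py', 'test-x.py', 'notes.txt']:
    print(basename, is_module_file(basename))
```

Note that `str.isidentifier()` also returns True for reserved words such as `class`, so a stricter check could additionally consult `keyword.iskeyword()`.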
msg245677 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-06-23 10:57
Yes, I bet that regex is left over from python2, where we didn't have isidentifier.
msg245692 - (view) Author: sih4sing5hong5 (sih4sing5hong5) * Date: 2015-06-23 14:27
Thank you.
I updated my patch in `VALID_MODULE_NAME.patch`.
msg245718 - (view) Author: sih4sing5hong5 (sih4sing5hong5) * Date: 2015-06-24 04:29
Updated the patch by adding `except AttributeError:`.
msg248928 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2015-08-21 00:11
Thank you very much for writing your patch in backwards compatible style - it will make backporting to unittest2 much easier.
msg248930 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2015-08-21 00:16
I'm torn on whether this needs a test or not. It would be hard to regress, but testing this properly really wants hypothesis with a valid-python-identifier-strategy.

I think on balance we do need one.

So - we need a test in test_discover that mocks the presence of a file with a name containing e.g. \u2603.
msg261822 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2016-03-15 19:27
sih4sing5hong5  - I think we do need a test in fact - it can be done using mocks, but right now I think the patch has a bug - it looks for isidentifier on $, but not on just $thing (which we need to do to handle packages, vs modules).
msg292519 - (view) Author: Louie Lu (louielu) * Date: 2017-04-28 08:53
Add PR:

rbcollins: I need help reviewing the patch. I think that neither `$thing` nor `$` can be used as a name in Python (or for a UNIX dir), and although `\u2603` (☃) can be used in a filename such as `☃.py`, it is not a valid Python identifier either.
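
The claim about these names can be checked with a short sketch. The helper `describe` is hypothetical and exists only for this example; it reports whether a string is a valid identifier together with the Unicode general category of its first character.

```python
import unicodedata


def describe(name):
    """Hypothetical helper: (is a valid identifier, category of first char)."""
    return name.isidentifier(), unicodedata.category(name[0])


for name in ['☃', '$thing', '試驗']:
    print(repr(name), describe(name))
```

The snowman is a symbol rather than a letter, so it cannot start an identifier, while the CJK characters are letters and can.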
msg341572 - (view) Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-06 17:36
The original PR refers to a branch that no longer exists, but the behaviour documented still applies to master. There were some changes to the test loader, but none that fixed this issue.
msg341703 - (view) Author: Toshio Kuratomi (a.badger) * Date: 2019-05-07 10:35
I've opened a new PR at with the commit from and some additional changes to address the review comments given by serhiy.storchaka and rbcollins
msg341952 - (view) Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-09 01:14
thanks, will wait for a review from Serhiy, Rbcollins or ezio
msg342015 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-10 00:04
What is the current error on test_dir.tar.gz? I'm not sure which problem this is trying to solve.

Why does PR 13149 use str.isidentifier() method? unittest doesn't allow arbitrary Unicode in filenames?
msg342078 - (view) Author: Toshio Kuratomi (a.badger) * Date: 2019-05-10 16:59
From the description, I think the bug is that filenames that *begin* with non-ASCII characters are not searched for tests. Looking at the test_dir.tar.gz contents, this is the test case that I'd use:


$ python3 -m unittest discover -vv -p '*.py'
test_走 (tests試驗.Test試驗.試驗) ... ok
test_走 (tests試驗.test試驗.試驗) ... ok

Ran 2 tests in 0.000s


$ /srv/python/cpython/python -m unittest discover -vv -p '*.py'
test_走 (tests試驗.Test試驗.試驗) ... ok
test_走 (tests試驗.test試驗.試驗) ... ok
test_走 (tests試驗.試驗.試驗) ... ok

Ran 3 tests in 0.000s


isidentifier() is used because filenames to be discovered must be importable and thus valid identifiers:
