
classification
Title: unittest cannot load module whose name starts with Unicode
Type: behavior
Stage: patch review
Components: Library (Lib)
Versions: Python 3.6, Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: a.badger, anthonypjshaw, ezio.melotti, louielu, r.david.murray, rbcollins, serhiy.storchaka, sih4sing5hong5, vstinner
Priority: normal
Keywords: easy, patch

Created on 2015-05-22 06:49 by sih4sing5hong5, last changed 2022-04-11 14:58 by admin.

Files
File name  Uploaded  Description
test_dir.tar.gz  sih4sing5hong5, 2015-06-23 04:46  failure example
VALID_MODULE_NAME.patch  sih4sing5hong5, 2015-06-23 14:27
VALID_MODULE_NAME2.patch  sih4sing5hong5, 2015-06-24 04:29
Pull Requests
URL Status Linked
PR 1338 closed louielu, 2017-04-28 08:51
PR 13149 open a.badger, 2019-05-07 03:50
Messages (17)
msg245662 - (view) Author: sih4sing5hong5 (sih4sing5hong5) * Date: 2015-06-23 03:50
This happens because VALID_MODULE_NAME in unittest/loader.py is r'[_a-z]\w*\.py$', which requires the file name to start with an ASCII letter or an underscore.

I suggest using r'[^\W\d]\w*\.py$' instead.
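
To illustrate the difference between the two patterns, here is a minimal sketch that applies both to a few file names. The sample names and the `re.IGNORECASE` flag are assumptions made for the illustration; the message above only quotes the patterns themselves.

```python
import re

# Pattern quoted above. The IGNORECASE flag is an assumption for this sketch.
OLD_PATTERN = re.compile(r'[_a-z]\w*\.py$', re.IGNORECASE)
# Pattern proposed above: a word character that is not a digit, then word characters.
PROPOSED_PATTERN = re.compile(r'[^\W\d]\w*\.py$')

# Hypothetical file names chosen for the illustration.
names = ['test_sample.py', 'test試驗.py', '試驗.py', '1bad.py']

# Map each name to (matches old pattern, matches proposed pattern).
results = {
    name: (bool(OLD_PATTERN.match(name)), bool(PROPOSED_PATTERN.match(name)))
    for name in names
}

for name, (old_ok, new_ok) in results.items():
    print(f'{name}: old={old_ok} proposed={new_ok}')
```

Only `試驗.py` is treated differently: the old pattern rejects it because its first character is not an ASCII letter or underscore, while the proposed pattern accepts it.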
msg245663 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2015-06-23 04:04
Are the module names valid in import statements?

It would help if you could attach a small tar/zip file with an example failure.
msg245667 - (view) Author: sih4sing5hong5 (sih4sing5hong5) * Date: 2015-06-23 04:46
There is an attached file for examples.

I ran
{{{
cd test_dir
python -m unittest -v
}}}

and got
"Ran 1 test in 0.000s"
msg245668 - (view) Author: sih4sing5hong5 (sih4sing5hong5) * Date: 2015-06-23 05:36
By the way, I ran with Python 3.4.0.
msg245675 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-06-23 10:00
r'[^\W\d]\w*' doesn't match all valid Python identifiers. It would be more correct to write the check as:

    root, ext = os.path.splitext(basename)
    if not (ext == '.py' and root.isidentifier()):
        # valid Python identifiers only
        return None, False
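
The check suggested above can be tried in isolation. The following is a small self-contained sketch; the function name `module_name_from_file` is hypothetical and not part of unittest.

```python
import os.path


def module_name_from_file(basename):
    """Return the module name for a file name, or None if it is not importable.

    Hypothetical helper that wraps the splitext/isidentifier check.
    """
    root, ext = os.path.splitext(basename)
    if not (ext == '.py' and root.isidentifier()):
        # Not a .py file, or the stem is not a valid Python identifier.
        return None
    return root


for candidate in ['試驗.py', 'test_x.py', '1bad.py', 'my-test.py', 'notes.txt']:
    print(candidate, '->', module_name_from_file(candidate))
```

Because `str.isidentifier()` follows the language's own definition of an identifier, this avoids maintaining a regular expression that approximates it.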
msg245677 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-06-23 10:57
Yes, I bet that regex is left over from Python 2, where we didn't have str.isidentifier().
msg245692 - (view) Author: sih4sing5hong5 (sih4sing5hong5) * Date: 2015-06-23 14:27
Thank you.
I updated my patch in `VALID_MODULE_NAME.patch`.
msg245718 - (view) Author: sih4sing5hong5 (sih4sing5hong5) * Date: 2015-06-24 04:29
I updated the patch by adding `except AttributeError:`.
msg248928 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2015-08-21 00:11
Thank you very much for writing your patch in a backwards-compatible style - it will make backporting to unittest2 much easier.
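
The patch itself is not reproduced in this thread. As a rough sketch of what a backwards-compatible check might look like, the code below tries `str.isidentifier()` first and falls back to an ASCII-only regular expression when the method is missing. The helper name `is_identifier` and the fallback pattern are assumptions, not the contents of the attached patch.

```python
import re

# Assumed fallback for string types without isidentifier() (for example on Python 2).
_ASCII_IDENTIFIER = re.compile(r'[_a-zA-Z]\w*$')


def is_identifier(name):
    """Hypothetical helper: True if ``name`` looks like a valid identifier."""
    try:
        return name.isidentifier()
    except AttributeError:
        # Older string types: approximate the check with a regular expression.
        return _ASCII_IDENTIFIER.match(name) is not None


print(is_identifier('試驗'), is_identifier('1bad'))
```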
msg248930 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2015-08-21 00:16
I'm torn on whether this needs a test or not. It would be hard to regress, but testing this properly really wants hypothesis with a valid-python-identifier-strategy.

I think on balance we do need one.

So - we need a test in test_discover that mocks the presence of a file with a name containing e.g. \u2603.
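
A mock-based test along these lines could look like the sketch below. It does not exercise unittest's real loader: `discoverable_module_names` is a hypothetical stand-in for the file-name filtering step, and the directory path and listing are made up. Note that `\u2603` is not a valid identifier character, so this sketch expects that file to be skipped.

```python
import os
import unittest
from unittest import mock


def discoverable_module_names(directory):
    """Hypothetical stand-in for the loader's file-name filter."""
    names = []
    for entry in sorted(os.listdir(directory)):
        root, ext = os.path.splitext(entry)
        if ext == '.py' and root.isidentifier():
            names.append(root)
    return names


class TestNonAsciiDiscovery(unittest.TestCase):
    def test_names_starting_with_non_ascii(self):
        # Fake directory listing; no real files are touched.
        fake_listing = ['test_ok.py', '試驗.py', '\u2603.py', 'README.txt']
        with mock.patch('os.listdir', return_value=fake_listing):
            found = discoverable_module_names('/no/such/dir')
        # The snowman file and the text file are skipped.
        self.assertEqual(found, ['test_ok', '試驗'])
```

The test can be run with `python -m unittest` after saving the block to a module.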
msg261822 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2016-03-15 19:27
sih4sing5hong5 - I think we do in fact need a test. It can be done using mocks. Right now, though, I think the patch has a bug: it checks isidentifier on $thing.py, but not on just $thing, which we need to do to handle packages as well as modules.
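
One way to picture the distinction between modules and packages is the sketch below, which validates a directory entry differently depending on whether it is a directory (package) or a file (module). The helper name `importable_name` and its `is_dir` parameter are hypothetical; this is not the code from the patch.

```python
import os.path


def importable_name(entry, is_dir):
    """Hypothetical helper: importable name for a directory entry, or None."""
    if is_dir:
        # A package is a directory, so the whole name must be an identifier.
        return entry if entry.isidentifier() else None
    # A module is a file, so only the part before '.py' must be an identifier.
    root, ext = os.path.splitext(entry)
    if ext == '.py' and root.isidentifier():
        return root
    return None


print(importable_name('tests試驗', is_dir=True))
print(importable_name('試驗.py', is_dir=False))
print(importable_name('my-pkg', is_dir=True))
```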
msg292519 - (view) Author: Louie Lu (louielu) * Date: 2017-04-28 08:53
Added PR: https://github.com/python/cpython/pull/1338/


rbcollins: I need help reviewing the patch. I think that neither `$thing` nor `$thing.py` can be used as a name in Python (or as a UNIX directory name). Also, although a file such as `☃.py` can be created, `\u2603` (☃) is not a valid Python identifier either.
msg341572 - (view) Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-06 17:36
The original PR refers to a branch that no longer exists, but the behaviour documented still applies to master. There were some changes to the test loader, but none that fixed this issue.
msg341703 - (view) Author: Toshio Kuratomi (a.badger) * Date: 2019-05-07 10:35
I've opened a new PR at https://github.com/python/cpython/pull/13149 with the commit from https://github.com/python/cpython/pull/1338 and some additional changes to address the review comments given by serhiy.storchaka and rbcollins
msg341952 - (view) Author: anthony shaw (anthonypjshaw) * (Python triager) Date: 2019-05-09 01:14
Thanks, I will wait for a review from Serhiy, rbcollins or Ezio.
msg342015 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-10 00:04
What is the current error on test_dir.tar.gz? I'm not sure which problem this is trying to solve.

Why does PR 13149 use the str.isidentifier() method? Does unittest not allow arbitrary Unicode in filenames?
msg342078 - (view) Author: Toshio Kuratomi (a.badger) * Date: 2019-05-10 16:59
From the description, I think the bug is that files whose names *begin* with a non-ASCII character are not searched for tests. Looking at the contents of test_dir.tar.gz, this is the test case that I'd use:

Broken:

$ python3 -m unittest discover -vv -p '*.py'
test_走 (tests試驗.Test試驗.試驗) ... ok
test_走 (tests試驗.test試驗.試驗) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK

Corrected:
$ /srv/python/cpython/python -m unittest discover -vv -p '*.py'
test_走 (tests試驗.Test試驗.試驗) ... ok
test_走 (tests試驗.test試驗.試驗) ... ok
test_走 (tests試驗.試驗.試驗) ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.000s

OK


isidentifier() is used because the files to be discovered must be importable, so their names must be valid identifiers: https://docs.python.org/3/library/unittest.html#test-discovery
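
The difference between the two runs above can be approximated without unittest itself. The sketch below filters three file names with the old pattern and with an isidentifier()-based check. The file names are inferred from the module names in the output above, and the `re.IGNORECASE` flag, the use of `fnmatchcase`, and both helper functions are assumptions made for the illustration.

```python
import os.path
import re
from fnmatch import fnmatchcase

# Old pattern from this issue. The IGNORECASE flag is an assumption.
OLD_PATTERN = re.compile(r'[_a-z]\w*\.py$', re.IGNORECASE)

# File names inferred from the module names in the output above.
FILES = ['Test試驗.py', 'test試驗.py', '試驗.py']


def accepted_old(name, pattern='*.py'):
    """Hypothetical filter: discovery pattern plus the old regex."""
    return fnmatchcase(name, pattern) and OLD_PATTERN.match(name) is not None


def accepted_new(name, pattern='*.py'):
    """Hypothetical filter: discovery pattern plus an isidentifier() check."""
    root, ext = os.path.splitext(name)
    return fnmatchcase(name, pattern) and ext == '.py' and root.isidentifier()


old = [name for name in FILES if accepted_old(name)]
new = [name for name in FILES if accepted_new(name)]
print(len(old), len(new))
```

Under these assumptions the old filter keeps two files and the new one keeps all three, which lines up with the two and three tests reported above.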
History
Date  User  Action  Args
2022-04-11 14:58:17  admin  set  github: 68451
2019-05-10 16:59:09  a.badger  set  messages: + msg342078
2019-05-10 00:04:52  vstinner  set  messages: + msg342015
2019-05-09 01:14:29  anthonypjshaw  set  messages: + msg341952
2019-05-07 10:35:54  a.badger  set  nosy: + a.badger; messages: + msg341703
2019-05-07 03:50:27  a.badger  set  stage: test needed -> patch review; pull_requests: + pull_request13064
2019-05-06 17:36:26  anthonypjshaw  set  nosy: + anthonypjshaw; messages: + msg341572
2017-04-28 08:54:00  louielu  set  nosy: + louielu; messages: + msg292519
2017-04-28 08:51:04  louielu  set  pull_requests: + pull_request1449
2016-03-15 19:27:26  rbcollins  set  messages: + msg261822
2016-01-01 21:59:29  ezio.melotti  set  keywords: + easy; stage: patch review -> test needed; components: + Library (Lib); versions: - Python 3.4
2015-08-21 00:16:58  rbcollins  set  messages: + msg248930; stage: patch review
2015-08-21 00:11:43  rbcollins  set  messages: + msg248928
2015-06-24 04:29:34  sih4sing5hong5  set  files: + VALID_MODULE_NAME2.patch; messages: + msg245718
2015-06-23 14:27:42  sih4sing5hong5  set  files: + VALID_MODULE_NAME.patch; messages: + msg245692
2015-06-23 14:26:04  sih4sing5hong5  set  files: - VALID_MODULE_NAME.patch
2015-06-23 10:57:35  r.david.murray  set  nosy: + r.david.murray; messages: + msg245677
2015-06-23 10:00:04  serhiy.storchaka  set  nosy: + serhiy.storchaka; messages: + msg245675; versions: + Python 3.6
2015-06-23 05:36:52  sih4sing5hong5  set  messages: + msg245668
2015-06-23 04:46:53  sih4sing5hong5  set  files: + test_dir.tar.gz; messages: + msg245667
2015-06-23 04:04:53  rbcollins  set  messages: + msg245663
2015-06-23 03:50:51  sih4sing5hong5  set  files: + VALID_MODULE_NAME.patch; keywords: + patch; messages: + msg245662; title: Why VALID_MODULE_NAME in unittest/loader.py is r'[_a-z]\w*\.py$' not r'\w+\.py$' ? -> unittest cannot load module whose name starts with Unicode
2015-05-22 07:07:35  ned.deily  set  nosy: + rbcollins; components: - Unicode; versions: + Python 3.5, - Python 3.2, Python 3.3
2015-05-22 06:49:00  sih4sing5hong5  create