classification
Title: Modules with decomposable characters in module name not found on macOS
Type: behavior Stage:
Components: Interpreter Core, macOS Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Norbert, brett.cannon, ned.deily, ronaldoussoren, vstinner
Priority: normal Keywords:

Created on 2020-03-03 00:25 by Norbert, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description
Modules.zip Norbert, 2020-03-03 00:25
Messages (5)
msg363224 - Author: (Norbert) Date: 2020-03-03 00:25
Modules whose names contain characters that are in precomposed form but can be decomposed in Normalization Form D can’t be found on macOS.


To reproduce:

1. Download and unzip the attached file Modules.zip. This produces a directory Modules with four Python source files.

2. In Terminal, go to the directory that contains Modules.

3. Run "python3 -m Modules.Import".


Expected behavior:

The following lines should be generated:
Maerchen
Märchen


Actual behavior:

The first line, “Maerchen”, is generated, but then an error occurs:
Traceback (most recent call last):
 File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 193, in _run_module_as_main
   return _run_code(code, main_globals, None,
 File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 86, in _run_code
   exec(code, run_globals)
 File "/Users/business/tmp/pyimports/Modules/Import.py", line 5, in <module>
   from Modules.Märchen import hello2
ModuleNotFoundError: No module named 'Modules.Märchen'


Evaluation:

In the source file Modules/Import.py, the name of the module “Märchen” is written with the precomposed character U+00E4. The file name Märchen.py uses the decomposed character sequence U+0061 U+0308 instead. File names on macOS commonly use a variant of Normalization Form D: the older file system HFS+ enforces this, and while APFS doesn’t, the Finder still generates file names in this form. U+00E4 and U+0061 U+0308 are canonically equivalent, so they should be treated as equal in module loading.
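
The canonical equivalence described above can be checked with the standard unicodedata module. This is only a minimal illustration; the variable names are made up for the example and do not come from the attached Modules.zip.

```python
import unicodedata

precomposed = "M\u00e4rchen"   # "ä" as the single code point U+00E4
decomposed = "Ma\u0308rchen"   # "a" (U+0061) followed by U+0308 COMBINING DIAERESIS

# The strings render identically but differ code point by code point.
print(precomposed == decomposed)                                  # False

# Normalizing both sides to the same form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)    # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)    # True
```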


Tested configuration:

CPython 3.8.2
macOS 10.14.6
msg363614 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-03-07 19:51
This seems more like an import issue than a uniquely macOS issue. Also, a quick search found Issue10952, which appears to be similar.
msg363774 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2020-03-09 21:50
The import system makes no attempt at normalizing Unicode strings for path comparisons. One would probably have to update FileFinder (https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py#L1392) somehow (assuming the appropriate codec support is available to importlib during start-up).
msg363804 - Author: (Norbert) Date: 2020-03-10 05:19
Yes, if the Python runtime caches file names and determines based on the cache whether a file exists, then it needs to normalize both the file names in the cache and the name of the file it’s looking for. As far as I know, both HFS+ and APFS do this themselves when asked for a file by name, but if you ask for a list of available files, they don’t know what you’re comparing against.

I don’t think codecs would be involved here; I’d use unicodedata.normalize with either NFC or NFD. It doesn’t matter which one, as long as both sides of the comparison use the same form.
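
To make the idea concrete, here is a rough sketch of a name cache that normalizes both the cached directory entries and the name being looked up. It is not how FileFinder is actually implemented; the helpers nfc, build_cache and find, and the hard-coded listing (standing in for something like os.listdir), are hypothetical and exist only for this example.

```python
import unicodedata

def nfc(name):
    """Normalize a file name to NFC (NFD would work equally well)."""
    return unicodedata.normalize("NFC", name)

def build_cache(listing):
    """Map each normalized name to the name the file system reported."""
    return {nfc(entry): entry for entry in listing}

def find(cache, wanted):
    """Return the on-disk name matching `wanted`, or None if absent."""
    return cache.get(nfc(wanted))

# Hypothetical directory listing with the decomposed name, as on macOS.
listing = ["Import.py", "Ma\u0308rchen.py"]
cache = build_cache(listing)

# A plain membership test with the precomposed name misses the file.
print("M\u00e4rchen.py" in set(listing))                        # False

# The normalized lookup finds it and returns the on-disk spelling.
print(find(cache, "M\u00e4rchen.py") == "Ma\u0308rchen.py")      # True
```

Keeping the on-disk spelling as the cache value means the path that is eventually opened is exactly the one the file system reported.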
msg363836 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2020-03-10 18:09
Regardless of which module is proposed to solve this, there is still a bootstrapping issue to consider.
History
Date User Action Args
2022-04-11 14:59:27 admin set github: 84013
2020-03-10 18:09:08 brett.cannon set messages: + msg363836
2020-03-10 05:19:30 Norbert set messages: + msg363804
2020-03-09 21:50:41 brett.cannon set messages: + msg363774
2020-03-07 19:51:12 ned.deily set nosy: + brett.cannon, vstinner; messages: + msg363614
2020-03-07 10:25:32 terry.reedy set nosy: + ronaldoussoren, ned.deily; components: + macOS
2020-03-03 00:25:16 Norbert create