classification
Title: Py_Initialize() throws error 'unable to load the file system encoding' when calling Py_SetPath with a path to a directory
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: paul.moore, rvq, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2019-04-18 14:57 by rvq, last changed 2019-04-23 15:39 by steve.dower. This issue is now closed.

Files
File name Uploaded Description
Capture.PNG rvq, 2019-04-18 14:57 Screenshot of the directory containing the Python I built
Messages (5)
msg340494 - Author: RimacV (rvq) Date: 2019-04-18 14:57
I compiled CPython 3.7.3 from source on Windows with Visual Studio 2017, together with some packages such as numpy. When I start the Python interpreter, I am able to import and use numpy. However, when I run the same script via the C-API, I get a ModuleNotFoundError.

So the first thing I did was to check whether numpy is in my site-packages directory, and indeed there is a folder named numpy-1.16.2-py3.7-win-amd64.egg. (This makes sense, because the Python interpreter can find numpy.)

The next thing I did was get some information about the sys.path variable created when running the script via the C-API. 

##### sys.path content ####
C:\Work\build\product\python37.zip
C:\Work\build\product\DLLs
C:\Work\build\product\lib
C:\PROGRAM FILES (X86)\MICROSOFT VISUAL STUDIO\2017\PROFESSIONAL\COMMON7\IDE\EXTENSIONS\TESTPLATFORM
C:\Users\rvq\AppData\Roaming\Python\Python37\site-packages

Examining the content of sys.path, I noticed two things.

1. C:\Work\build\product\python37.zip points into the correct directory, 'C:\Work\build\product\', but there was no zip file there. All my files and directories were unpacked. So I zipped the files into an archive named python37.zip, and this resolved the import error.

2. C:\Users\rvq\AppData\Roaming\Python\Python37\site-packages is wrong. It should be C:\Work\build\product\Lib\site-packages, but I don't know how this wrong path is created.


The next thing I tried was to call Py_SetPath(L"C:/Work/build/product/Lib/site-packages") before calling Py_Initialize(). This led to the following error:

Fatal Python Error 'unable to load the file system encoding' 
ModuleNotFoundError: No module named 'encodings'


I created a minimal C++ project with exactly these two calls and started to debug CPython.

#include <Python.h>

int main()
{
  // Only site-packages is on the search path, so the stdlib cannot be found.
  Py_SetPath(L"C:/Work/build/product/Lib/site-packages");
  Py_Initialize();
}

I tracked the call of Py_Initialize() down to the call of 

static int
zipimport_zipimporter___init___impl(ZipImporter *self, PyObject *path)

inside of zipimport.c

The comment above this function states the following: 

Create a new zipimporter instance.
'archivepath' must be a path-like object to a zipfile, or to a specific path
inside a zipfile. For example, it can be '/tmp/myimport.zip', or
'/tmp/myimport.zip/mydirectory', if mydirectory is a valid directory inside
the archive.
'ZipImportError' is raised if 'archivepath' doesn't point to a valid Zip
archive.
The 'archive' attribute of the zipimporter object contains the name of the
zipfile targeted.


So it seems to me that the C-API expects the path set with Py_SetPath to be a path to a zip file. Is this expected behaviour, or is it a bug?
If it is not a bug, is there a way to change this so that it can also detect directories?

PS: The ModuleNotFoundError did not occur for me when using Python 3.5.2+, which was the version I used in my project before. I also checked whether the PYTHONHOME or PYTHONPATH environment variables were set, but I did not see either of them on my system.
msg340501 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-04-18 15:30
This is probably a documentation failure more than anything else. We're in the middle of redesigning initialization though, so it's good timing to contribute this feedback.

The short answer is that you need to make sure Python can find the Lib/encodings directory, typically by putting the standard library in sys.path. Py_SetPath clears all inferred paths, so you need to specify all the places Python should look. (The rules for where Python looks automatically are complicated and vary by platform, which is something I'm keen to fix.)

Paths that don't exist are okay, and that's the zip file. You can choose to put the stdlib into a zip, and it will be found automatically if you name it the default path, but you can also leave it unzipped and reference the directory.
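
As a rough illustration of "specify all the places Python should look", here is a minimal sketch that builds the single string Py_SetPath takes. It assumes the directory layout from the original report. join_search_path and example_search_path are hypothetical helper names, not part of the C-API, and which directories a given build actually needs is an assumption that may differ from one setup to another. Py_SetPath expects the entries separated by ';' on Windows and ':' on Unix and macOS.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the C-API): joins directories into the
// single delimiter-separated string that Py_SetPath() takes.
// The delimiter is ';' on Windows and ':' on Unix and macOS.
std::wstring join_search_path(const std::vector<std::wstring>& dirs,
                              wchar_t delimiter = L';')
{
    std::wstring joined;
    for (std::size_t i = 0; i < dirs.size(); ++i) {
        if (i > 0) {
            joined += delimiter;  // separator between entries, none at the ends
        }
        joined += dirs[i];
    }
    return joined;
}

// Assumed layout, taken from the paths quoted in the report above.
std::wstring example_search_path()
{
    return join_search_path({
        L"C:/Work/build/product/Lib",                // stdlib, including Lib/encodings
        L"C:/Work/build/product/DLLs",               // extension modules
        L"C:/Work/build/product/Lib/site-packages",  // third-party packages
    });
}
```

The result would then be passed as Py_SetPath(example_search_path().c_str()) before Py_Initialize(). The stdlib directory is listed first so that the encodings package can be found during startup.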

A full walk through on embedding is more than I'm prepared to type on my phone. Hopefully that's enough to get you going for now.
msg340514 - Author: RimacV (rvq) Date: 2019-04-18 21:54
Thanks for your quick response! I will try your suggestion on Tuesday and will then let you know whether it worked as expected.
msg340732 - Author: RimacV (rvq) Date: 2019-04-23 15:18
As you said, I just had to set all paths when using Py_SetPath.
From my side, this issue can be closed.

Thank you! :-)
msg340735 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-04-23 15:39
Glad to hear it.

Hopefully as we redesign the embedding initialization APIs this kind of problem will become easier to solve (it's certainly one of my concerns with the current API).
History
Date User Action Args
2019-04-23 15:39:17 steve.dower set status: open -> closed; resolution: not a bug; messages: + msg340735; stage: resolved
2019-04-23 15:18:53 rvq set messages: + msg340732; components: + Library (Lib), - Windows
2019-04-18 21:54:27 rvq set messages: + msg340514
2019-04-18 15:30:37 steve.dower set nosy: + vstinner; messages: + msg340501
2019-04-18 15:20:25 matrixise set nosy: + paul.moore, tim.golden, zach.ware, steve.dower; components: + Windows, - Library (Lib)
2019-04-18 14:57:16 rvq create