classification
Title: importlib.readers.MultiplexedPath
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.10
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: FFY00, daveraja, eric.smith, jaraco
Priority: normal Keywords:

Created on 2021-10-11 06:06 by daveraja, last changed 2021-10-19 03:19 by daveraja. This issue is now closed.

Files
File name Uploaded Description Edit
navigate.py daveraja, 2021-10-11 10:03
Messages (10)
msg403621 - (view) Author: David Rajaratnam (daveraja) Date: 2021-10-11 06:06
I'm trying to use `importlib.resources.files()`. However, I cannot work out how to properly use the `importlib.readers.MultiplexedPath()` object that is returned.

As I expect and want, the returned object is referring to a directory, but I cannot seem to simply access the value of that path. 

For a normal `pathlib.Path` object you can get an OS-specific path by simply converting it to its string representation (e.g., `str(pathlib.Path('/somepath')) == '/somepath'`). However, for the MultiplexedPath object the __str__() value is the same as the __repr__() (e.g., "MultiplexedPath('/somepath')").

It seems that this is a bug since I would expect MultiplexedPath to behave the same as pathlib.Path in this regard. In the meantime is there a way to actually access this data without stripping the prefix and suffix of this string?
msg403630 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-10-11 08:04
Can you provide a short code snippet that we can run that demonstrates the problem?

Looking at the code, and not knowing much about it, maybe iterating over the paths with .iterdir() is what you want?
msg403645 - (view) Author: David Rajaratnam (daveraja) Date: 2021-10-11 10:03
Thanks for the quick response. I think the attached file shows the issue. 

In the directory where you download and run this file create a sub-directory 'data'. Then running the file creates the output (note: I've truncated the path name):

> Traverse data: MultiplexedPath('<<abspath-deleted>>/data') (<class 'importlib.readers.MultiplexedPath'>)

I think the idea behind MultiplexedPath() is that it merges together multiple base/root directories, so even though in this case it is a single path, that wouldn't necessarily be the case in general. So it makes sense that for some MultiplexedPath object X, str(X) isn't itself a proper directory path. However, there seems to be no method/property to access these root paths.

Note: Traversable.iterdir() iterates over the files/sub-directories in the root(s), so it doesn't return the root path(s) themselves.

A further note. If you add a file `data/__init__.py` then data is now a package and running the code this time returns a PosixPath object (on a posix system): 

> Traverse data: <<abspath-deleted>>/data (<class 'pathlib.PosixPath'>)
> X: <<abspath-deleted>>/data/__init__.py (<class 'pathlib.PosixPath'>)
> X: <<abspath-deleted>>/data/__pycache__ (<class 'pathlib.PosixPath'>)
msg404042 - (view) Author: Filipe Laíns (FFY00) * (Python triager) Date: 2021-10-15 19:01
The Traversable protocol does not guarantee you access to the file-system path. pathlib.Path happens to give you that information, but other traversables are not required to.

The main reasoning for this is that traversables do not need to exist on the file-system, we can be reading from a zip, database, etc.

str(path) does give you the file-system path on pathlib.Path, but there is no guarantee about the value of __str__ on other traversables.
My recommendation here would be to use os.fspath[1] instead if you want to try getting the file-system path from traversables.

I don't really know what you are trying to accomplish, but I would recommend that you try designing your code directly on top of the Traversable interface, which should make it work on anything we return in importlib.resources.files.
If you actually need a file-system path, to pass to an external program or something like that, you can use the importlib.resources.as_file[2] helper.

bpo-44200 proposes documenting that traversables should implement __fspath__ if they represent a file-system path, which could help a bit with your issue.

[1] https://docs.python.org/3/library/os.html#os.fspath
[2] https://docs.python.org/3/library/importlib.html#importlib.resources.as_file
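
A minimal sketch of the os.fspath suggestion above. It is not code from this issue: `try_fspath` and `NotOnDisk` are hypothetical names introduced only for illustration. os.fspath() returns the path of a path-like object and raises TypeError for anything else, so a caller can detect a traversable that has no file-system path and fall back to something else.

```python
import os
import pathlib


def try_fspath(traversable):
    """Return the file-system path as a string, or None if there is none.

    Hypothetical helper: os.fspath() raises TypeError for objects that
    do not implement __fspath__ (and are not str/bytes).
    """
    try:
        return os.fspath(traversable)
    except TypeError:
        return None


class NotOnDisk:
    """Hypothetical stand-in for a traversable with no file-system path."""

    def __repr__(self):
        return "NotOnDisk('data')"


on_disk = try_fspath(pathlib.Path("data"))  # path-like, so a str comes back
not_on_disk = try_fspath(NotOnDisk())       # not path-like, so None
print(on_disk, not_on_disk)
```

Whether a given traversable implements __fspath__ depends on its type, so the None branch should be treated as a normal outcome rather than an error.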
msg404043 - (view) Author: Filipe Laíns (FFY00) * (Python triager) Date: 2021-10-15 19:04
Just to clarify, as I realize I did point this out in my reply, Traversable[1][2] is the protocol that objects returned by importlib.resources.files implement.

[1] https://docs.python.org/3/library/importlib.html#importlib.abc.Traversable
[2] https://github.com/python/cpython/blob/00ffc4513df7b89a168e88da4d1e3ac367f7682f/Lib/importlib/abc.py#L355
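
To illustrate what writing code directly on top of the Traversable interface could look like, here is a small sketch (not code from this issue). The hypothetical helper `walk` relies only on iterdir(), is_dir(), name and read_text(), all of which are part of the Traversable protocol. It is exercised on a temporary directory here simply because pathlib.Path provides the same methods; the file names are made up for the demonstration.

```python
import pathlib
import tempfile


def walk(root, prefix=""):
    """Yield (relative_name, text) for every file below a traversable.

    Uses only Traversable methods, so it never needs a file-system path.
    """
    for entry in sorted(root.iterdir(), key=lambda e: e.name):
        rel = prefix + entry.name
        if entry.is_dir():
            # Recurse into sub-directories, extending the relative name.
            yield from walk(entry, rel + "/")
        else:
            yield rel, entry.read_text()


with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    (base / "a.txt").write_text("A")
    (base / "sub").mkdir()
    (base / "sub" / "b.txt").write_text("B")
    found = list(walk(base))

print(found)
```

The same function should accept whatever importlib.resources.files() returns, since it never converts the object to a string or a path.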
msg404044 - (view) Author: Filipe Laíns (FFY00) * (Python triager) Date: 2021-10-15 19:09
*realize I did *not* point this out

:facepalm: sorry!
msg404154 - (view) Author: David Rajaratnam (daveraja) Date: 2021-10-18 05:03
Hi Filipe,

Thanks very much for the pointers and for the clarifications. I'll look at using importlib.resources.as_file(). I think this is the API that I stupidly seemed to have missed!

However, it is also very possible that I am misunderstanding the correct usage of the importlib.resources library, so here is a summary of my use-case:

I am working with a specialised language interpreter that can be embedded in python. The interpreter API requires a file system path to load files and the language itself has its own "include" statements for loading files. So in my case it has to be a file system path and not some other resource (eg. zip file or database).

However, I have struggled to understand the correct way to treat these files when they are installed as part of a python package. It seems to me that python's setuptools is too limited to cover the range of options that I would want. AFAIK setuptools allows only two options for installing non-python files: "data_files" and "package_data". "data_files" doesn't seem to be the right place because I couldn't find a foolproof way to programmatically find out where these files are installed. So it seems to be focused more on supplementary data (high-level docs, examples, etc) rather than data files that are necessary for the operation of the application.

On the other hand, "package_data" forces these non-python files to be embedded within the python package structure. This is a bit ugly since it's not really a natural fit; for example, the language has its own command-line tools that I use during development.

So what I've tried to do is, for development, separate the python code from my other interpreter's code, but then for installation have setup.py map the specialised language files into the python package structure. I'm not overly happy with how I've done it (although it does seem to work), so I would be very happy if someone can point to a better way.

Dave
msg404156 - (view) Author: David Rajaratnam (daveraja) Date: 2021-10-18 05:45
I'm closing the bug report. Clearly not a bug. It looks like importlib.resources.as_file() is exactly what I want. It returns a context and can potentially create a temporary file system directory structure with all files I want underneath. Not sure how I missed this before and was struggling to work out what to do with a MultiplexedPath object.

If you do have comments on a better way of separating python and non-python code (see my previous use-case explanation) I'm interested to hear it.

Regards,
Dave
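
A sketch of the as_file() usage described above, for a single file resource. Everything about the package is an assumption made for the demonstration: `demo_pkg` and `rules.txt` are hypothetical names, and the package is created in a temporary directory so the example is self-contained. as_file() is a context manager that yields a real file-system path, which is what an external interpreter API would need.

```python
import importlib
import importlib.resources
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway package (hypothetical name) holding one data file.
    pkg_dir = pathlib.Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "rules.txt").write_text("hello\n")

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        resource = importlib.resources.files("demo_pkg").joinpath("rules.txt")
        # Inside the with-block, fs_path is a path that exists on disk.
        with importlib.resources.as_file(resource) as fs_path:
            content = pathlib.Path(fs_path).read_text()
    finally:
        # Undo the import-system changes made for the demonstration.
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg", None)

print(content)
```

The path should only be used inside the with-block, because for resources that are not already on disk it may point to a temporary file that is removed on exit.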
msg404198 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021-10-18 16:51
Thanks Dave for closing. I would recommend in the future, if you have packaging questions, to bring them to the packaging-problems repo as indicated at https://packaging.python.org/support/#how-to-get-support.

Glad to hear that `as_file` promises to do what you need. Do be aware that it doesn't yet support a directory of files (only individual files), a known deficiency (https://github.com/python/importlib_resources/issues/228).

I don't have any good advice on separating Python and non-Python code in your package. You're right that the current interfaces for supporting package resources are specifically designed around resources in a Python package (aka package_data).

I agree that there may not be a robust way to locate "data_files". It sounds like you have a use-case that's not well served by the current implementation. I'd recommend filing a report with a detailed minimal example of your use-case and what you'd like to see (in packaging-problems; maybe search first in case someone has already reported it). One thing you'll want to answer is where you expect these files to be installed if not in the python package.

Thanks and good luck!
msg404266 - (view) Author: David Rajaratnam (daveraja) Date: 2021-10-19 03:19
Hi Jason,

Thanks for the extra pointers. My initial intention in explaining my use-case was to find out whether treating an externally embedded interpreter's files as `importlib.resources` is the correct use of this library. However, you're right that my explanation turned into a python packaging support question. I'm sorry about that.

Thanks for the clarification about the limitations of `as_file()`. I guess that means that at the moment it doesn't fully support my use-case, but hopefully may do so at some point in the future.

Regards,
Dave
History
Date User Action Args
2021-10-19 03:19:24 daveraja set messages: + msg404266
2021-10-18 16:51:03 jaraco set messages: + msg404198
2021-10-18 05:45:40 daveraja set status: open -> closed; resolution: not a bug; messages: + msg404156; stage: resolved
2021-10-18 05:03:38 daveraja set messages: + msg404154
2021-10-15 19:09:55 FFY00 set messages: + msg404044
2021-10-15 19:04:40 FFY00 set messages: + msg404043
2021-10-15 19:01:09 FFY00 set nosy: + jaraco, FFY00; messages: + msg404042
2021-10-11 10:03:23 daveraja set files: + navigate.py; messages: + msg403645
2021-10-11 08:04:09 eric.smith set nosy: + eric.smith; messages: + msg403630
2021-10-11 06:06:02 daveraja create