This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: imp.find_module reacts badly to iterator
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: Phillip.M.Feldman@gmail.com, brett.cannon, eric.snow
Priority: normal Keywords:

Created on 2018-08-16 21:51 by Phillip.M.Feldman@gmail.com, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (5)
msg323623 - Author: Phillip M. Feldman (Phillip.M.Feldman@gmail.com) Date: 2018-08-16 21:51
`imp.find_module` goes down in flames if one tries to pass an iterator rather than a list of folders.  Firstly, the message that it produces is somewhat misleading:

   RuntimeError: sys.path must be a list of directory names

Secondly, it would be helpful if one could pass an iterator. I'm thinking in particular of the situation where one wants to import something from a large folder tree, and the module in question is likely to be found early in the search process, so that it is more efficient to explore the folder tree incrementally.
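Below is a minimal Python 3 sketch of the incremental search described above. It assumes the lookup uses `importlib.machinery.PathFinder.find_spec` and that the search path is passed one directory at a time as a one-element list, so the list-only contract is respected while the generator is still consumed lazily. The helper names `iter_dirs_breadth_first` and `find_spec_lazily` are hypothetical. The breadth-first order is an illustrative choice for trees where the module is expected near the top.

```python
import os
from collections import deque
from importlib.machinery import PathFinder


def iter_dirs_breadth_first(root):
    """Lazily yield `root`, then its subdirectories level by level.

    Hypothetical helper: a directory's children are listed only when the
    consumer asks for the next directory, so an early hit stops the walk.
    """
    queue = deque([os.path.abspath(root)])
    while queue:
        current = queue.popleft()
        yield current
        try:
            with os.scandir(current) as entries:
                children = sorted(
                    entry.path
                    for entry in entries
                    if entry.is_dir(follow_symlinks=False)
                )
        except OSError:
            continue  # unreadable directory: skip its subtree
        queue.extend(children)


def find_spec_lazily(name, directories):
    """Return the first ModuleSpec for top-level module `name`, or None.

    `directories` may be any iterable, including a generator; it is consumed
    one entry at a time and iteration stops at the first match.
    """
    for directory in directories:
        # PathFinder expects a list, so hand it a one-element list per step.
        spec = PathFinder.find_spec(name, [directory])
        # Skip namespace-package specs (origin is None on Python 3.7+).
        if spec is not None and spec.origin is not None:
            return spec
    return None
```

For example, `find_spec_lazily("mymodule", iter_dirs_breadth_first("some/tree"))` stops scanning as soon as `mymodule.py` turns up. The module name and tree path are placeholders. One side effect: every directory probed this way gets an entry in `sys.path_importer_cache`.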
msg323660 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-08-17 16:05
There are several issues at hand here, Phillip.  I'll enumerate them below.

Thanks for taking the time to let us know about this.  However, I'm closing this issue since realistically the behavior of imp.find_module() isn't going to change, particularly in Python 2.7.  Even though the issue is closed, feel free to reply, particularly about how you are using imp.find_module() (we may be able to point you toward how to use importlib instead).

Also, I've changed this issue's type to "enhancement".  imp.find_module() is working as designed, so what you are looking for is a feature request.  Consequently there's a much higher bar for justifying a change.  Here are reasons why the requested change doesn't reach that bar:

1. Python 2.7 is closed to new features.

So imp.find_module() is not going to change.

2. Python 2.7 is nearing EOL.

We highly recommend that everyone move to Python 3 as soon as possible.  Hopefully you are in a position to do so.  If you're stuck on Python 2.7 then you miss the advantages of importlib, along with a ton of other benefits.

If you are not going to be able to migrate before 2020 then send an email to python-list@python.org asking for recommendations on what to do.

3. Starting in Python 3.4, using the imp module is discouraged/deprecated.

  "Deprecated since version 3.4: The imp package is pending deprecation in favor of importlib." [1]

The importlib package should have everything you need.  What are you using imp.find_module() for?  We should be able to demonstrate the equivalent using importlib.
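Here is a rough sketch of such an importlib equivalent for a top-level module, assuming only the file name is needed. `importlib.machinery.PathFinder.find_spec(name, path)` takes the same kind of list of directories (or `None` for `sys.path`) and returns a `ModuleSpec` whose `origin` is the file name. Unlike `imp.find_module`, it returns no open file or description tuple and does not consider built-in or frozen modules (`importlib.util.find_spec(name)` handles those). For a package, it reports the `__init__.py` file rather than the package directory. The wrapper name `find_module_file` is hypothetical.

```python
from importlib.machinery import PathFinder


def find_module_file(name, path=None):
    """Return the file name of top-level module `name`.

    Rough importlib counterpart of imp.find_module(name, path): `path` is a
    list of directory names, or None to search sys.path.  Only the file name
    is returned, and ImportError is raised if nothing is found.
    """
    spec = PathFinder.find_spec(name, path)
    # A None spec means "not found"; a None origin means a namespace package.
    if spec is None or spec.origin is None:
        raise ImportError("No module named %r" % (name,))
    return spec.origin
```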

4. The import machinery is designed around using a list (the builtin type, not the concept) for the "module search path".

* imp.find_module(): "the list of directory names given by sys.path is searched" [2]
* imp.find_module(): "Otherwise, path must be a list of directory names" [2]
* importlib.find_loader() (deprecated): "optionally within the specified path" (which defaults to sys.path) [3]
* importlib.util.find_spec(): doesn't even have a "path" parameter [4]
* ModuleSpec.submodule_search_locations: "List of strings for where to find submodules" [5]
* sys.path: "A list of strings that specifies the search path for modules. ... Only strings and bytes should be added to sys.path; all other data types are ignored during import." [6]
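Because all of these interfaces expect a list, an iterator has to be materialized before the search can start. The hypothetical helpers below (Python 3) count the directories `os.walk` visits when a generator is converted with `list()` to satisfy that contract. The whole tree is walked up front even when the module sits in the top-level folder, which is the cost the original report is about.

```python
import os
from importlib.machinery import PathFinder


def walk_dirs(root, visited):
    """Yield every directory under `root` (via os.walk), recording each one."""
    for dirpath, _dirnames, _filenames in os.walk(root):
        visited.append(dirpath)
        yield dirpath


def find_spec_eagerly(name, root):
    """Materialize the generator into a list, as the list-based API expects.

    Returns (spec, visited): `visited` shows that the whole tree was walked
    before the search began, even if `name` lives directly in `root`.
    """
    visited = []
    search_path = list(walk_dirs(root, visited))  # walks the entire tree now
    spec = PathFinder.find_spec(name, search_path)
    return spec, visited
```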


[1] https://docs.python.org/3/library/imp.html#module-imp
[2] https://docs.python.org/3/library/imp.html#imp.find_module
[3] https://docs.python.org/3/library/importlib.html#importlib.find_loader
[4] https://docs.python.org/3/library/importlib.html#importlib.util.find_spec
[5] https://docs.python.org/3/library/importlib.html#importlib.machinery.ModuleSpec.submodule_search_locations
[6] https://docs.python.org/3/library/sys.html#sys.path
msg323820 - Author: Phillip M. Feldman (Phillip.M.Feldman@gmail.com) Date: 2018-08-21 04:51
It appears that the `importlib` package has the same issue: one can't provide an iterator for the path.  When searching a large folder tree for an item that is likely to be found early in the search process (i.e., at a high level in the folder tree), the available functionality is massively inefficient.  So, I wrote my own wrapper for `imp.find_module` to do this job, and will eventually modify this code to use `importlib` instead of `imp`.
msg323837 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-08-21 17:32
Saying "the available functionality is massively inefficient" is unnecessarily hostile towards those of us who actually wrote and maintain that code. Without diving into the code, chances are that requirement is there so that the C code can use macros to access the list as efficiently as possible.

Now if you want to propose specific changes to importlib's code for it to work with iterables instead of just lists then we would be happy to review the pull request.
msg323838 - Author: Phillip M. Feldman (Phillip.M.Feldman@gmail.com) Date: 2018-08-21 18:37
My apologies for the tone of my remark.  I am grateful to you and others who donate their time to develop the code.

I'm attaching the wrapper code that I created to work around the problem.

Phillip

import imp  # deprecated since Python 3.4 and removed in Python 3.12
import itertools
import os

try:
    string_types = basestring  # Python 2
except NameError:
    string_types = str  # Python 3

# Folders searched after `in_folders` (see IMPLICIT INPUTS in find_module).
# Placeholder default: populate with folder strings; a trailing '*' requests
# a recursive search.
import_path = []


def expander(paths='./*'):
    r"""
    OVERVIEW

    This function is a generator, i.e., it creates an iterator that
    recursively searches a list of folders in an incremental fashion.  This
    approach is advantageous when the folder tree(s) to be searched are large
    and the item of interest is likely to be found early in the process.

    INPUTS

    `paths` must be either (a) a list of folder paths (each of which is a
    string) or (b) a single string containing one or more folder paths
    separated by the OS-specific path delimiter.

    Each path in `paths` must be either (a) an existing folder or (b) an
    existing folder followed by '/*' or '\*'.  In case (a), the folder string
    is copied from the input (`paths`) to the output verbatim.  In case (b),
    the folder string is replaced by an expanded list that includes not only
    the base (the portion of the path that remains after the '/*' or '\*' has
    been removed) but all subfolders as well.

    RETURN VALUES

    The returned value is an iterator.  Each call to `next()` on the iterator
    produces one folder path.
    """

    if isinstance(paths, string_types):
        paths = paths.split(os.pathsep)
    elif not isinstance(paths, list):
        raise TypeError("`paths` must be either a string or a list of strings.")

    found = set()  # folders already yielded, so none is searched twice

    for path in paths:
        if path.endswith(('/*', '\\*')):
            # A recursive search is required: yield the base folder first,
            # then the immediate subfolders of every folder os.walk visits.
            base = os.path.abspath(path[:-2])
            if base not in found:
                found.add(base)
                yield base

            for dirpath, dirnames, _filenames in os.walk(base):
                for name in dirnames:
                    folder = os.path.join(dirpath, name)
                    if folder not in found:
                        found.add(folder)
                        yield folder
        else:
            # No recursive search is required: pass the path through verbatim.
            if path not in found:
                found.add(path)
                yield path


def find_module(module_name, in_folders=None):
    """
    This function finds a module and returns its fully-qualified file name
    (along with the other values described under RETURN VALUES).  Folders
    from `in_folders`, if specified, are searched first, followed by folders
    in the global `import_path` list.

    If any folder name in `in_folders` or `import_path` ends with an
    asterisk, indicating that a recursive search is required, `expander` is
    invoked to create iterators that return one folder at a time, and
    `imp.find_module` is invoked separately for each of these folders.

    EXPLICIT INPUTS

    `module_name` is the unqualified name of the module to be found.

    `in_folders` is an optional list of additional folders to be searched
    before the folders in `import_path` are searched.

    IMPLICIT INPUTS

    `import_path` is obtained from the global namespace.

    RETURN VALUES

    If `find_module` is able to find the requested module, it returns the
    same three values (`f`, `filename`, and `description`) that
    `imp.find_module` would return.  Otherwise, it raises ImportError.
    """

    if in_folders is None:
        in_folders = []
    elif isinstance(in_folders, string_types):
        in_folders = [in_folders]
    elif not isinstance(in_folders, list):
        raise TypeError("If specified, `in_folders` must be either a string or "
                        "a list of strings.  (A string is wrapped to produce a "
                        "length-1 list.)")

    if any(item.endswith('*') for item in in_folders) or \
       any(item.endswith('*') for item in import_path):

        last_error = None

        # Expand the folder lists lazily and search one folder at a time, so
        # the walk stops as soon as the module is found.
        for folder in itertools.chain(expander(in_folders),
                                      expander(import_path)):
            try:
                return imp.find_module(module_name, [folder])
            except ImportError as ex:
                last_error = ex  # not in this folder; keep looking

        raise last_error or ImportError("No module named %s" % module_name)

    else:
        # No recursive search requested: one call over the combined list.
        return imp.find_module(module_name, in_folders + import_path)

History
Date                 User                         Action  Args
2022-04-11 14:59:04  admin                        set     github: 78598
2018-08-21 18:37:08  Phillip.M.Feldman@gmail.com  set     messages: + msg323838
2018-08-21 17:32:32  brett.cannon                 set     messages: + msg323837
2018-08-21 04:51:38  Phillip.M.Feldman@gmail.com  set     messages: + msg323820
2018-08-17 16:05:35  eric.snow                    set     status: open -> closed
                                                          type: behavior -> enhancement
                                                          nosy: + eric.snow, brett.cannon
                                                          messages: + msg323660
                                                          resolution: wont fix
                                                          stage: resolved
2018-08-16 21:51:44  Phillip.M.Feldman@gmail.com  create