
Author ssbarnea
Recipients dstufft, eric.araujo, ssbarnea
Date 2021-06-23.09:20:31
Message-id <1624440031.78.0.268205320364.issue44497@roundup.psfhosted.org>
In-reply-to
Content
While investigating very poor pip performance when installing some code, I identified the root cause as the current implementation of distutils.filelist.findall, or more precisely the _find_all_simple function, which follows symlinks but takes no measures to prevent recursion or duplicates.

To give an idea: in my case it took 5-10 minutes to run with the CPU at 100%, for a repository with 95k files (most of them temporary files inside .tox folders). Removing the symlinks made it run in ~5s.

IMHO, _find_all_simple should normalize paths and avoid returning any duplicates.
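
To illustrate the suggestion, here is a minimal sketch of a walker that follows symlinks but resolves real paths to skip directories it has already visited and files it has already returned. The name find_all_unique is hypothetical and this is not the distutils implementation; it only assumes the standard os.walk and os.path.realpath calls.

```python
import os


def find_all_unique(path=os.curdir):
    """Hypothetical sketch: list files under 'path', following symlinks,
    without revisiting directories or returning the same file twice."""
    seen_dirs = set()   # real paths of directories already walked
    seen_files = set()  # real paths of files already returned
    results = []
    for base, dirs, files in os.walk(path, followlinks=True):
        real_base = os.path.realpath(base)
        if real_base in seen_dirs:
            # Reached again through a symlink (possibly a loop):
            # prune the walk here instead of descending again.
            dirs[:] = []
            continue
        seen_dirs.add(real_base)
        for name in files:
            full = os.path.join(base, name)
            real = os.path.realpath(full)
            if os.path.isfile(full) and real not in seen_files:
                seen_files.add(real)
                results.append(full)
    return results
```

Note that deduplicating by real path also drops a file that is reachable under a second, symlinked name, which may or may not be what a packaging tool wants; that trade-off would need to be decided for a real fix.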


Related: https://bugs.launchpad.net/pbr/+bug/1933311
History
Date                 User      Action  Args
2021-06-23 09:20:31  ssbarnea  set     recipients: + ssbarnea, eric.araujo, dstufft
2021-06-23 09:20:31  ssbarnea  set     messageid: <1624440031.78.0.268205320364.issue44497@roundup.psfhosted.org>
2021-06-23 09:20:31  ssbarnea  link    issue44497 messages
2021-06-23 09:20:31  ssbarnea  create