This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author gregory.p.smith
Recipients Shai, brett.cannon, gregory.p.smith, hongweipeng, pitrou, serhiy.storchaka
Date 2019-09-10.23:03:27
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1568156607.65.0.470224335264.issue37935@roundup.psfhosted.org>
In-reply-to
Content
For some context here, the pathlib scandir code was written by Serhiy in https://bugs.python.org/issue26032 and related commit https://github.com/python/cpython/commit/680cb152c5d220a74321fa905d4fc91bdec40fbb.

> Any optimization can be accepted only when we have any prove that the change actually has measurable effect, and that it is large enough.

While often good advice, there is no such blanket policy.  Saying that comes across as dismissive.  A better way to phrase such a statement is as a question:

In what practical scenarios does this change provide a demonstrable benefit?

The improvement in this case could be avoiding an explosion of memory use in some circumstances.  Depending on the particular filesystem.  (the original reason based on real world production experience that I pushed to have our os.scandir API created in the first place)

> Avoiding calling list() on the output of scandir() may be not harmless.

Keeping it may not be harmless either. :)

One key point of os.scandir() is to be iterated over instead of turned into a list to avoid using too much memory.  Code that requires a list could've just called listdir() (as the previous code did in both of these places) - if that is intentional, we should include an explicit comment stating why a preloaded list was intentional.

Either these list calls go away as is natural with scandir, or comments should be added explaining why they are important (related: unittests exercising the situations they are important for would be ideal)

As you're the original author of the pathlib scandir code, do you remember what the intent of these list(scandir) calls was?

One potential difference without them: If someone is selecting via pathlib and modifying directory file contents while iterating over the results, the lists may be important.  (what happens in this case presumably depends on the underlying os fs implementation)

It sounds like we don't have test cases covering such scenarios.
History
Date User Action Args
2019-09-10 23:03:27gregory.p.smithsetrecipients: + gregory.p.smith, brett.cannon, pitrou, serhiy.storchaka, hongweipeng, Shai
2019-09-10 23:03:27gregory.p.smithsetmessageid: <1568156607.65.0.470224335264.issue37935@roundup.psfhosted.org>
2019-09-10 23:03:27gregory.p.smithlinkissue37935 messages
2019-09-10 23:03:27gregory.p.smithcreate