Message 188432 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	benhoyt
Recipients	Trundle, abacabadabacaba, benhoyt, brian.curtin, christian.heimes, eric.araujo, giampaolo.rodola, gregory.p.smith, loewis, ncoghlan, neologix, nvetoshkin, pitrou, rhettinger, serhiy.storchaka, socketpair, terry.reedy, tim.golden, torsten, twouters, vstinner
Date	2013-05-05.08:53:16
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1367743996.81.0.0633756411816.issue11406@psf.upfronthosting.co.za>
In-reply-to

Content
> I find iterdir_stat() ugly :-) I like the scandir name, which has some precedent with POSIX. Fair enough. I'm cool with scandir(). > scandir() cannot return (name, stat), because on POSIX, readdir() only returns d_name and d_type (the type of the entry): to return a stat, we would have to call stat() on each entry, which would defeat the performance gain. Yes, you're right. I "solved" this in BetterWalk with the solution you propose of returning a stat_result object with the fields it could get "for free" set, and the others set to None. So on Linux, you'd get a stat_result with only st_mode set (or None for DT_UNKNOWN), and all the other fields None. However -- st_mode is the one you're most likely to use, usually looking just for whether it's a file or directory. So calling code would look something like this: files = [] dirs = [] for name, st in scandir(path): if st.st_mode is None: st = os.stat(os.path.join(path, name)) if stat.S_ISDIR(st.st_mode): dirs.append(name) else: files.append(name) Meaning you'd get the speed improvements 99% of the time (when st_mode) was set, but if st_mode is None, you can call stat and handle errors and whatnot yourself. > That's why scandir would be a rather low-level call, whose main user would be walkdir, which only needs to know the entry time and not the whole stat result. Agreed. This is in the OS module after all, and there's tons of stuff that's OS-dependent in there. However, I think that doing something like the above, we can make it usable and performant on both Linux and Windows for use cases like walking directory trees. > Also, I don't know which information is returned by the readdir equivalent on Windows, but if we want a consistent API, we have to somehow map d_type and Windows's returned type to a common type, like DT_FILE, DT_DIRECTORY, etc (which could be an enum). The Windows scan directory functions (FindFirstFile/FindNextFile) return a full stat (or at least, as much info as you get from a stat in Windows). We could map them to a common type -- but I'm suggesting that common type might as well be "stat_result with None meaning not present". That way users don't have to learn a completely new type. > The other approach would be to return a dummy stat object with only st_mode set, but that would be kind of a hack to return a dummy stat result with only part of the attributes set (some people will get bitten by this). We could document any platform-specific stuff, and places you'd users could get bitten. But can you give me an example of where the stat_result-with-st_mode-or-None approach falls over completely?

> I find iterdir_stat() ugly :-) I like the scandir name, which has some precedent with POSIX.

Fair enough. I'm cool with scandir().

> scandir() cannot return (name, stat), because on POSIX, readdir() only returns d_name and d_type (the type of the entry): to return a stat, we would have to call stat() on each entry, which would defeat the performance gain.

Yes, you're right. I "solved" this in BetterWalk with the solution you propose of returning a stat_result object with the fields it could get "for free" set, and the others set to None.

So on Linux, you'd get a stat_result with only st_mode set (or None for DT_UNKNOWN), and all the other fields None. However -- st_mode is the one you're most likely to use, usually looking just for whether it's a file or directory. So calling code would look something like this:

files = []
dirs = []
for name, st in scandir(path):
    if st.st_mode is None:
        st = os.stat(os.path.join(path, name))
    if stat.S_ISDIR(st.st_mode):
        dirs.append(name)
    else:
        files.append(name)

Meaning you'd get the speed improvements 99% of the time (when st_mode) was set, but if st_mode is None, you can call stat and handle errors and whatnot yourself.

> That's why scandir would be a rather low-level call, whose main user would be walkdir, which only needs to know the entry time and not the whole stat result.

Agreed. This is in the OS module after all, and there's tons of stuff that's OS-dependent in there. However, I think that doing something like the above, we can make it usable and performant on both Linux and Windows for use cases like walking directory trees.

> Also, I don't know which information is returned by the readdir equivalent on Windows, but if we want a consistent API, we have to somehow map d_type and Windows's returned type to a common type, like DT_FILE, DT_DIRECTORY, etc (which could be an enum).

The Windows scan directory functions (FindFirstFile/FindNextFile) return a *full* stat (or at least, as much info as you get from a stat in Windows). We *could* map them to a common type -- but I'm suggesting that common type might as well be "stat_result with None meaning not present". That way users don't have to learn a completely new type.

> The other approach would be to return a dummy stat object with only st_mode set, but that would be kind of a hack to return a dummy stat result with only part of the attributes set (some people will get bitten by this).

We could document any platform-specific stuff, and places you'd users could get bitten. But can you give me an example of where the stat_result-with-st_mode-or-None approach falls over completely?

History
Date	User	Action	Args
2013-05-05 08:53:16	benhoyt	set	recipients: + benhoyt, loewis, twouters, rhettinger, terry.reedy, gregory.p.smith, ncoghlan, pitrou, vstinner, giampaolo.rodola, christian.heimes, tim.golden, eric.araujo, Trundle, brian.curtin, torsten, nvetoshkin, neologix, abacabadabacaba, socketpair, serhiy.storchaka
2013-05-05 08:53:16	benhoyt	set	messageid: <1367743996.81.0.0633756411816.issue11406@psf.upfronthosting.co.za>
2013-05-05 08:53:16	benhoyt	link	issue11406 messages
2013-05-05 08:53:16	benhoyt	create