Message257651
There are concerns that pathlib is inefficient because it doesn't cache the results of stat() operations. For example, this code calls stat() twice for each result (once inside the glob, and a second time to answer the is_symlink() question):
import pathlib
p = pathlib.Path('/usr')
links = [x for x in p.rglob('*') if x.is_symlink()]
I have a tentative patch (without tests). On my Mac it only gives modest speedups (between 5 and 20 percent) but things may be different on other platforms or for applications that make a lot of inquiries about the same path.
The API I am proposing is that by default nothing changes; to benefit from caching you must instantiate a StatCache() object and pass it to Path() constructor calls, e.g. Path('/usr', stat_cache=cache_object). All Path objects derived from this path object will share the cache. To force an uncached Path object you can use Path(p).
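To make the idea concrete, here is a minimal sketch of what a stat cache could look like. It is not the code from the patch: the class body, the `stat()` method signature, the `misses` counter and `clear()` are all assumptions made for illustration, and the sketch stands alone rather than hooking into Path().

```python
import os
import stat


class StatCache:
    """Hypothetical cache of os.stat() results, keyed by path string."""

    def __init__(self):
        self._cache = {}  # (path string, follow_symlinks) -> os.stat_result
        self.misses = 0   # number of real os.stat() calls performed

    def stat(self, path, *, follow_symlinks=True):
        key = (os.fspath(path), follow_symlinks)
        try:
            return self._cache[key]
        except KeyError:
            # Only successful results are stored; errors propagate uncached.
            self.misses += 1
            result = os.stat(key[0], follow_symlinks=follow_symlinks)
            self._cache[key] = result
            return result

    def clear(self):
        """Forget everything, e.g. after the file system has changed."""
        self._cache.clear()


# Two inquiries about the same path cost a single system call.
cache = StatCache()
first = cache.stat(os.curdir, follow_symlinks=False)
second = cache.stat(os.curdir, follow_symlinks=False)
is_link = stat.S_ISLNK(first.st_mode)
```

The trade-off is staleness: a cached result does not notice later changes on disk, which is one reason to keep caching opt-in.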
The patch is incomplete; there are no tests for the new functionality (though existing tests pass) and __eq__ should be adjusted so that Path objects using different caches always compare unequal.
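The intended comparison rule can be sketched as follows. `CachedPathKey` is a hypothetical stand-in, not a pathlib class and not part of the patch; it also assumes that two objects without any cache keep comparing equal, as they do today.

```python
from pathlib import PurePath


class CachedPathKey:
    """Hypothetical stand-in for a Path that carries an optional stat cache."""

    def __init__(self, path, stat_cache=None):
        self.path = PurePath(path)
        self.stat_cache = stat_cache

    def __eq__(self, other):
        if not isinstance(other, CachedPathKey):
            return NotImplemented
        # Equal only if the paths match and both use the very same cache object.
        return self.path == other.path and self.stat_cache is other.stat_cache

    def __hash__(self):
        # Consistent with __eq__: equal objects share both path and cache identity.
        return hash((self.path, id(self.stat_cache)))


cache_a, cache_b = object(), object()  # placeholders for two distinct caches
same_cache = CachedPathKey('/usr', cache_a) == CachedPathKey('/usr', cache_a)
different_caches = CachedPathKey('/usr', cache_a) == CachedPathKey('/usr', cache_b)
```

Comparing caches by identity rather than by contents keeps the check cheap and avoids treating two independently filled caches as interchangeable.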
Question for Antoine: Did you perhaps anticipate a design like this? Each Path instance has an _accessor slot, but there is only one accessor instance defined that is used everywhere (the global _normal_accessor). So you could have avoided a bunch of complexity in the code around setting the proper _accessor unless you were planning to use multiple accessors.
Date | User | Action | Args
2016-01-06 22:41:29 | gvanrossum | set | recipients: + gvanrossum, pitrou
2016-01-06 22:41:29 | gvanrossum | set | messageid: <1452120089.84.0.0792910513701.issue26031@psf.upfronthosting.co.za>
2016-01-06 22:41:29 | gvanrossum | link | issue26031 messages
2016-01-06 22:41:29 | gvanrossum | create |