Background:
I have a data hierarchy with a lot of "sibling" symlinked directories/files. I want to glob only the non-symlink files, because it's a *huge* performance increase.
Before `os.scandir`, I was using a local copy of `glob.py` and calling `os.path.islink` every time, which was faster for *my* use case, but unacceptable for upstreaming. With `os.scandir`, my new patch should be acceptable.
The patch includes tests.
Current discussion points:
* Am I making the right decision to still accept symlinks for fully-literal components (in glob0)? It doesn't apply to my use-case, and I imagine some people might want to handle that case separately.
* Are my tests sufficient? I just copied and modified the existing symlink tests.
* Should my `flags` TODO be implemented *before* this patch? IMO it would be clearer after, even if it makes the diffs longer.
Future discussion points (don't derail):
* Should my `flags` TODO be implemented internally (this would significantly shrink any future patches)? (I can work on this)
* Should `flags` also be exposed externally?
* What additional `flags` might be useful? (my list: GLOB_ERR, GLOB_MARK, ~GLOB_NOSORT, ~GLOB_NOESCAPE, GLOB_PERIOD, GLOB_BRACE, GLOB_TILDE_CHECK, GLOB_ONLYDIR (+ equivalent for files - also, why doesn't `os.scandir` have accessors for the other types without doing an unnecessary stat?))
* Is there a bitwise enum (or equivalently, enum set) in the standard library so `flags` can get sane reprs? (I've implemented this before, but imagine it would be overwhelmed with bikeshedding if it doesn't exist yet)
* Should `pathlib` really be implementing globbing on its own? That makes it hard to ensure feature parity. Perhaps the `glob` module needs some additional APIs? (I don't want to work on `pathlib` itself)
|