New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
isinstance is called a more times that needed in ntpath #59480
Comments
The problem is simple, the code that allows to use binary strings and unicode is making more calls that needed to isinstance(path, bytes) since the result of the code is not shared. For example, the following calls are present in the module: def _get_empty(path):
if isinstance(path, bytes):
return b''
else:
return ''
def _get_sep(path):
if isinstance(path, bytes):
return b'\\'
else:
return '\\'
def _get_altsep(path):
if isinstance(path, bytes):
return b'/'
else:
return '/'
def _get_bothseps(path):
if isinstance(path, bytes):
return b'\\/'
else:
return '\\/'
def _get_dot(path):
if isinstance(path, bytes):
return b'.'
else:
return '.' ... And then something similar to the following is found in the code: def normpath(path):
"""Normalize path, eliminating double slashes, etc."""
sep = _get_sep(path)
dotdot = _get_dot(path) * 2
special_prefixes = _get_special(path)
if path.startswith(special_prefixes):
# in the case of paths with these prefixes:
# \\.\ -> device names
# \\?\ -> literal paths
# do not do any normalization, but return the path unchanged
return path
path = path.replace(_get_altsep(path), sep)
prefix, path = splitdrive(path) As you can see the isinstance call is performed more than needed which certainly affects the performance of the path operations. The attached patch removes the number of calls to isinstance(obj, bytes) and also ensures that the function that returns the correct literal is as fast as possible by using a dict. |
Have you tried doing some benchmarks before and after the patch? |
Tests indeed cover the changes made. I don't know about a decent way of doing benchmarks for the changes. Any recommendation?
I agree and I'd love to do it but in a diff bug to make things self-contained, what do you think? |
You could make a script that uses the timeit module.
Having a single patch that fixes both is OK. |
@manuel do you intend picking this up? |
Here is alternative patch. I believe it makes a code simpler. Microbenchmarks: $ ./python -m timeit -n 100000 -s "from ntpath import splitdrive" "splitdrive('c:foo')" Before: 100000 loops, best of 3: 20 usec per loop $ ./python -m timeit -n 100000 -s "from ntpath import splitext" "splitext('python.exe')" Before: 100000 loops, best of 3: 23.6 usec per loop $ ./python -m timeit -s "from ntpath import join" "join('foo', 'bar')" Before: 10000 loops, best of 3: 50.9 usec per loop $ ./python -m timeit -s "from ntpath import normpath" "normpath('/foo/bar/baz')" Before: 10000 loops, best of 3: 67.5 usec per loop $ ./python -m timeit -s "from ntpath import relpath" "relpath('foo', 'bar')" Before: 1000 loops, best of 3: 695 usec per loop |
I like ntpath_cleanup.diff, I don't think that it makes the code worse. FYI os.fsencode() accepts str too, you can simplify: if isinstance(path, bytes):
- userhome = userhome.encode(sys.getfilesystemencoding())
+ userhome = os.fsencode(userhome) to + userhome = os.fsencode(userhome) |
No, if *path* is not bytes, *userhome* shouldn't be converted to bytes. |
Oh you're right sorry. |
New changeset b22aaa59d24f by Serhiy Storchaka in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: