classification
Title: isinstance is called a more times that needed in ntpath
Type: performance Stage: resolved
Components: Windows Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: BreamoreBoy, brian.curtin, ezio.melotti, haypo, jcea, mandel, python-dev, serhiy.storchaka, terry.reedy
Priority: normal Keywords: patch

Created on 2012-07-07 13:38 by mandel, last changed 2014-07-23 17:46 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
less_isinstance.patch mandel, 2012-07-07 13:38 Patch that reduces the number of isinstance calls performed. review
ntpath_cleanup.diff serhiy.storchaka, 2014-07-22 10:09 review
Repositories containing patches
http://bitbucket.org/mandel/ntpath-performance
Messages (10)
msg164842 - (view) Author: Manuel de la Pena (mandel) Date: 2012-07-07 13:38
The problem is simple, the code that allows to use binary strings and unicode is making more calls that needed to isinstance(path, bytes) since the result of the code is not shared. For example, the following calls are present in the module:

def _get_empty(path):
    if isinstance(path, bytes):
        return b'' 
    else:
        return ''

def _get_sep(path):
    if isinstance(path, bytes):
        return b'\\'
    else:
        return '\\'

def _get_altsep(path):
    if isinstance(path, bytes):
        return b'/'
    else:
        return '/'

def _get_bothseps(path):
    if isinstance(path, bytes):
        return b'\\/'
    else:
        return '\\/'

def _get_dot(path):
    if isinstance(path, bytes):
        return b'.'
    else:
        return '.'

...

And then something similar to the following is found in the code:

def normpath(path):
    """Normalize path, eliminating double slashes, etc."""
    sep = _get_sep(path)
    dotdot = _get_dot(path) * 2
    special_prefixes = _get_special(path)
    if path.startswith(special_prefixes):
        # in the case of paths with these prefixes:
        # \\.\ -> device names
        # \\?\ -> literal paths
        # do not do any normalization, but return the path unchanged
        return path
    path = path.replace(_get_altsep(path), sep)
    prefix, path = splitdrive(path)

As you can see the isinstance call is performed more than needed which certainly affects the performance of the path operations. 

The attached patch removes the number of calls to isinstance(obj, bytes) and also ensures that the function that returns the correct literal is as fast as possible by using a dict.
msg165409 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-07-13 19:25
Have you tried doing some benchmarks before and after the patch?
If this patch is applied I think it would be good to change posixpath too.
Also make sure that the changes you made are covered by the tests.
msg166267 - (view) Author: Manuel de la Pena (mandel) Date: 2012-07-24 09:29
Tests indeed cover the changes made. I don't know about a decent way of doing benchmarks for the changes. Any recommendation?

> If this patch is applied I think it would be good to change posixpath too.

I agree and I'd love to do it but in a diff bug to make things self-contained, what do you think?
msg166390 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-07-25 12:33
> I don't know about a decent way of doing benchmarks for the changes.
> Any recommendation?

You could make a script that uses the timeit module.

>> If this patch is applied I think it would be good to change
>> posixpath too.
> I agree and I'd love to do it but in a diff bug to make things
> self-contained, what do you think?

Having a single patch that fixes both is OK.
msg220033 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-08 13:14
@Manuel do you intend picking this up?
msg223656 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-07-22 10:09
Here is alternative patch. I believe it makes a code simpler.

Microbenchmarks:

$ ./python -m timeit -n 100000 -s "from ntpath import splitdrive"  "splitdrive('c:foo')"

Before: 100000 loops, best of 3: 20 usec per loop
After: 100000 loops, best of 3: 11.5 usec per loop

$ ./python -m timeit -n 100000 -s "from ntpath import splitext"  "splitext('python.exe')"

Before: 100000 loops, best of 3: 23.6 usec per loop
After: 100000 loops, best of 3: 18 usec per loop

$ ./python -m timeit -s "from ntpath import join"  "join('foo', 'bar')"

Before: 10000 loops, best of 3: 50.9 usec per loop
After: 10000 loops, best of 3: 32.3 usec per loop

$ ./python -m timeit -s "from ntpath import normpath"  "normpath('/foo/bar/baz')"

Before: 10000 loops, best of 3: 67.5 usec per loop
After: 10000 loops, best of 3: 40.3 usec per loop

$ ./python -m timeit -s "from ntpath import relpath"  "relpath('foo', 'bar')"

Before: 1000 loops, best of 3: 695 usec per loop
After: 1000 loops, best of 3: 456 usec per loop
msg223657 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-07-22 10:12
I like ntpath_cleanup.diff, I don't think that it makes the code worse.

FYI os.fsencode() accepts str too, you can simplify:

     if isinstance(path, bytes):
-        userhome = userhome.encode(sys.getfilesystemencoding())
+        userhome = os.fsencode(userhome)

to


+    userhome = os.fsencode(userhome)
msg223672 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-07-22 16:36
No, if *path* is not bytes, *userhome* shouldn't be converted to bytes.
msg223698 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-07-22 21:15
Oh you're right sorry.
msg223752 - (view) Author: Roundup Robot (python-dev) Date: 2014-07-23 17:43
New changeset b22aaa59d24f by Serhiy Storchaka in branch 'default':
Issue #15275: Clean up and speed up the ntpath module.
http://hg.python.org/cpython/rev/b22aaa59d24f
History
Date User Action Args
2014-07-23 17:46:54serhiy.storchakasetstatus: open -> closed
assignee: serhiy.storchaka
resolution: fixed
stage: patch review -> resolved
2014-07-23 17:43:24python-devsetnosy: + python-dev
messages: + msg223752
2014-07-22 21:15:58hayposetmessages: + msg223698
2014-07-22 16:36:49serhiy.storchakasetmessages: + msg223672
2014-07-22 10:12:08hayposetnosy: + haypo
messages: + msg223657
2014-07-22 10:09:20serhiy.storchakasetfiles: + ntpath_cleanup.diff
versions: + Python 3.5, - Python 3.4
nosy: + serhiy.storchaka

messages: + msg223656
2014-06-08 13:14:03BreamoreBoysetnosy: + BreamoreBoy
messages: + msg220033
2012-07-25 12:33:59ezio.melottisetmessages: + msg166390
2012-07-24 09:29:31mandelsetmessages: + msg166267
2012-07-13 23:40:11terry.reedysetnosy: + terry.reedy
2012-07-13 19:25:43ezio.melottisetversions: + Python 3.4, - Python 3.3
nosy: + ezio.melotti

messages: + msg165409

type: performance
stage: patch review
2012-07-08 09:59:58mandelsetnosy: + brian.curtin
2012-07-08 02:54:52jceasetnosy: + jcea
2012-07-07 13:41:22mandelsetfiles: - f5c57ba1124b.diff
2012-07-07 13:40:14mandelsetfiles: + f5c57ba1124b.diff
2012-07-07 13:38:18mandelcreate