isinstance is called a more times that needed in ntpath #59480

mandel · 2012-07-07T13:38:18Z

BPO	15275
Nosy	@terryjreedy, @jcea, @vstinner, @ezio-melotti, @briancurtin, @serhiy-storchaka
Files	less_isinstance.patch: Patch that reduces the number of isinstance calls performed. ntpath_cleanup.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = <Date 2014-07-23.17:46:54.258>
created_at = <Date 2012-07-07.13:38:18.190>
labels = ['OS-windows', 'performance']
title = 'isinstance is called a more times that needed in ntpath'
updated_at = <Date 2014-07-23.17:46:54.257>
user = 'https://bugs.python.org/mandel'

bugs.python.org fields:

activity = <Date 2014-07-23.17:46:54.257>
actor = 'serhiy.storchaka'
assignee = 'serhiy.storchaka'
closed = True
closed_date = <Date 2014-07-23.17:46:54.258>
closer = 'serhiy.storchaka'
components = ['Windows']
creation = <Date 2012-07-07.13:38:18.190>
creator = 'mandel'
dependencies = []
files = ['26294', '36030']
hgrepos = ['140']
issue_num = 15275
keywords = ['patch']
message_count = 10.0
messages = ['164842', '165409', '166267', '166390', '220033', '223656', '223657', '223672', '223698', '223752']
nosy_count = 9.0
nosy_names = ['terry.reedy', 'jcea', 'vstinner', 'ezio.melotti', 'brian.curtin', 'BreamoreBoy', 'python-dev', 'mandel', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue15275'
versions = ['Python 3.5']

mandel · 2012-07-07T13:38:17Z

The problem is simple, the code that allows to use binary strings and unicode is making more calls that needed to isinstance(path, bytes) since the result of the code is not shared. For example, the following calls are present in the module:

def _get_empty(path):
    if isinstance(path, bytes):
        return b'' 
    else:
        return ''

def _get_sep(path):
    if isinstance(path, bytes):
        return b'\\'
    else:
        return '\\'

def _get_altsep(path):
    if isinstance(path, bytes):
        return b'/'
    else:
        return '/'

def _get_bothseps(path):
    if isinstance(path, bytes):
        return b'\\/'
    else:
        return '\\/'

def _get_dot(path):
    if isinstance(path, bytes):
        return b'.'
    else:
        return '.'

...

And then something similar to the following is found in the code:

def normpath(path):
    """Normalize path, eliminating double slashes, etc."""
    sep = _get_sep(path)
    dotdot = _get_dot(path) * 2
    special_prefixes = _get_special(path)
    if path.startswith(special_prefixes):
        # in the case of paths with these prefixes:
        # \\.\ -> device names
        # \\?\ -> literal paths
        # do not do any normalization, but return the path unchanged
        return path
    path = path.replace(_get_altsep(path), sep)
    prefix, path = splitdrive(path)

As you can see the isinstance call is performed more than needed which certainly affects the performance of the path operations.

The attached patch removes the number of calls to isinstance(obj, bytes) and also ensures that the function that returns the correct literal is as fast as possible by using a dict.

ezio-melotti · 2012-07-13T19:25:44Z

Have you tried doing some benchmarks before and after the patch?
If this patch is applied I think it would be good to change posixpath too.
Also make sure that the changes you made are covered by the tests.

mandel · 2012-07-24T09:29:31Z

Tests indeed cover the changes made. I don't know about a decent way of doing benchmarks for the changes. Any recommendation?

If this patch is applied I think it would be good to change posixpath too.

I agree and I'd love to do it but in a diff bug to make things self-contained, what do you think?

ezio-melotti · 2012-07-25T12:33:59Z

I don't know about a decent way of doing benchmarks for the changes.
Any recommendation?

You could make a script that uses the timeit module.

> If this patch is applied I think it would be good to change
> posixpath too.
I agree and I'd love to do it but in a diff bug to make things
self-contained, what do you think?

Having a single patch that fixes both is OK.

BreamoreBoy · 2014-06-08T13:14:03Z

@manuel do you intend picking this up?

serhiy-storchaka · 2014-07-22T10:09:19Z

Here is alternative patch. I believe it makes a code simpler.

Microbenchmarks:

$ ./python -m timeit -n 100000 -s "from ntpath import splitdrive"  "splitdrive('c:foo')"

Before: 100000 loops, best of 3: 20 usec per loop
After: 100000 loops, best of 3: 11.5 usec per loop

$ ./python -m timeit -n 100000 -s "from ntpath import splitext"  "splitext('python.exe')"

Before: 100000 loops, best of 3: 23.6 usec per loop
After: 100000 loops, best of 3: 18 usec per loop

$ ./python -m timeit -s "from ntpath import join"  "join('foo', 'bar')"

Before: 10000 loops, best of 3: 50.9 usec per loop
After: 10000 loops, best of 3: 32.3 usec per loop

$ ./python -m timeit -s "from ntpath import normpath"  "normpath('/foo/bar/baz')"

Before: 10000 loops, best of 3: 67.5 usec per loop
After: 10000 loops, best of 3: 40.3 usec per loop

$ ./python -m timeit -s "from ntpath import relpath"  "relpath('foo', 'bar')"

Before: 1000 loops, best of 3: 695 usec per loop
After: 1000 loops, best of 3: 456 usec per loop

vstinner · 2014-07-22T10:12:08Z

I like ntpath_cleanup.diff, I don't think that it makes the code worse.

FYI os.fsencode() accepts str too, you can simplify:

     if isinstance(path, bytes):
-        userhome = userhome.encode(sys.getfilesystemencoding())
+        userhome = os.fsencode(userhome)

to

+ userhome = os.fsencode(userhome)

serhiy-storchaka · 2014-07-22T16:36:50Z

No, if *path* is not bytes, *userhome* shouldn't be converted to bytes.

vstinner · 2014-07-22T21:15:58Z

Oh you're right sorry.

python-dev · 2014-07-23T17:43:25Z

New changeset b22aaa59d24f by Serhiy Storchaka in branch 'default':
Issue bpo-15275: Clean up and speed up the ntpath module.
http://hg.python.org/cpython/rev/b22aaa59d24f

mandel mannequin added the OS-windows label Jul 7, 2012

ezio-melotti added the performance Performance or resource usage label Jul 13, 2012

serhiy-storchaka closed this as completed Jul 23, 2014

serhiy-storchaka self-assigned this Jul 23, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

isinstance is called a more times that needed in ntpath #59480

isinstance is called a more times that needed in ntpath #59480

mandel mannequin commented Jul 7, 2012

mandel mannequin commented Jul 7, 2012

ezio-melotti commented Jul 13, 2012

mandel mannequin commented Jul 24, 2012

ezio-melotti commented Jul 25, 2012

BreamoreBoy mannequin commented Jun 8, 2014

serhiy-storchaka commented Jul 22, 2014

vstinner commented Jul 22, 2014

serhiy-storchaka commented Jul 22, 2014

vstinner commented Jul 22, 2014

python-dev mannequin commented Jul 23, 2014

isinstance is called a more times that needed in ntpath #59480

isinstance is called a more times that needed in ntpath #59480

Comments

mandel mannequin commented Jul 7, 2012

mandel mannequin commented Jul 7, 2012

ezio-melotti commented Jul 13, 2012

mandel mannequin commented Jul 24, 2012

ezio-melotti commented Jul 25, 2012

BreamoreBoy mannequin commented Jun 8, 2014

serhiy-storchaka commented Jul 22, 2014

vstinner commented Jul 22, 2014

serhiy-storchaka commented Jul 22, 2014

vstinner commented Jul 22, 2014

python-dev mannequin commented Jul 23, 2014