Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

isinstance is called a more times that needed in ntpath #59480

Closed
mandel mannequin opened this issue Jul 7, 2012 · 10 comments
Closed

isinstance is called a more times that needed in ntpath #59480

mandel mannequin opened this issue Jul 7, 2012 · 10 comments
Assignees
Labels
OS-windows performance Performance or resource usage

Comments

@mandel
Copy link
Mannequin

mandel mannequin commented Jul 7, 2012

BPO 15275
Nosy @terryjreedy, @jcea, @vstinner, @ezio-melotti, @briancurtin, @serhiy-storchaka
Files
  • less_isinstance.patch: Patch that reduces the number of isinstance calls performed.
  • ntpath_cleanup.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-07-23.17:46:54.258>
    created_at = <Date 2012-07-07.13:38:18.190>
    labels = ['OS-windows', 'performance']
    title = 'isinstance is called a more times that needed in ntpath'
    updated_at = <Date 2014-07-23.17:46:54.257>
    user = 'https://bugs.python.org/mandel'

    bugs.python.org fields:

    activity = <Date 2014-07-23.17:46:54.257>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-07-23.17:46:54.258>
    closer = 'serhiy.storchaka'
    components = ['Windows']
    creation = <Date 2012-07-07.13:38:18.190>
    creator = 'mandel'
    dependencies = []
    files = ['26294', '36030']
    hgrepos = ['140']
    issue_num = 15275
    keywords = ['patch']
    message_count = 10.0
    messages = ['164842', '165409', '166267', '166390', '220033', '223656', '223657', '223672', '223698', '223752']
    nosy_count = 9.0
    nosy_names = ['terry.reedy', 'jcea', 'vstinner', 'ezio.melotti', 'brian.curtin', 'BreamoreBoy', 'python-dev', 'mandel', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue15275'
    versions = ['Python 3.5']

    @mandel
    Copy link
    Mannequin Author

    mandel mannequin commented Jul 7, 2012

    The problem is simple, the code that allows to use binary strings and unicode is making more calls that needed to isinstance(path, bytes) since the result of the code is not shared. For example, the following calls are present in the module:

    def _get_empty(path):
        if isinstance(path, bytes):
            return b'' 
        else:
            return ''
    
    def _get_sep(path):
        if isinstance(path, bytes):
            return b'\\'
        else:
            return '\\'
    
    def _get_altsep(path):
        if isinstance(path, bytes):
            return b'/'
        else:
            return '/'
    
    def _get_bothseps(path):
        if isinstance(path, bytes):
            return b'\\/'
        else:
            return '\\/'
    
    def _get_dot(path):
        if isinstance(path, bytes):
            return b'.'
        else:
            return '.'

    ...

    And then something similar to the following is found in the code:

    def normpath(path):
        """Normalize path, eliminating double slashes, etc."""
        sep = _get_sep(path)
        dotdot = _get_dot(path) * 2
        special_prefixes = _get_special(path)
        if path.startswith(special_prefixes):
            # in the case of paths with these prefixes:
            # \\.\ -> device names
            # \\?\ -> literal paths
            # do not do any normalization, but return the path unchanged
            return path
        path = path.replace(_get_altsep(path), sep)
        prefix, path = splitdrive(path)

    As you can see the isinstance call is performed more than needed which certainly affects the performance of the path operations.

    The attached patch removes the number of calls to isinstance(obj, bytes) and also ensures that the function that returns the correct literal is as fast as possible by using a dict.

    @mandel mandel mannequin added the OS-windows label Jul 7, 2012
    @ezio-melotti
    Copy link
    Member

    Have you tried doing some benchmarks before and after the patch?
    If this patch is applied I think it would be good to change posixpath too.
    Also make sure that the changes you made are covered by the tests.

    @ezio-melotti ezio-melotti added the performance Performance or resource usage label Jul 13, 2012
    @mandel
    Copy link
    Mannequin Author

    mandel mannequin commented Jul 24, 2012

    Tests indeed cover the changes made. I don't know about a decent way of doing benchmarks for the changes. Any recommendation?

    If this patch is applied I think it would be good to change posixpath too.

    I agree and I'd love to do it but in a diff bug to make things self-contained, what do you think?

    @ezio-melotti
    Copy link
    Member

    I don't know about a decent way of doing benchmarks for the changes.
    Any recommendation?

    You could make a script that uses the timeit module.

    > If this patch is applied I think it would be good to change
    > posixpath too.
    I agree and I'd love to do it but in a diff bug to make things
    self-contained, what do you think?

    Having a single patch that fixes both is OK.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jun 8, 2014

    @manuel do you intend picking this up?

    @serhiy-storchaka
    Copy link
    Member

    Here is alternative patch. I believe it makes a code simpler.

    Microbenchmarks:

    $ ./python -m timeit -n 100000 -s "from ntpath import splitdrive"  "splitdrive('c:foo')"

    Before: 100000 loops, best of 3: 20 usec per loop
    After: 100000 loops, best of 3: 11.5 usec per loop

    $ ./python -m timeit -n 100000 -s "from ntpath import splitext"  "splitext('python.exe')"

    Before: 100000 loops, best of 3: 23.6 usec per loop
    After: 100000 loops, best of 3: 18 usec per loop

    $ ./python -m timeit -s "from ntpath import join"  "join('foo', 'bar')"

    Before: 10000 loops, best of 3: 50.9 usec per loop
    After: 10000 loops, best of 3: 32.3 usec per loop

    $ ./python -m timeit -s "from ntpath import normpath"  "normpath('/foo/bar/baz')"

    Before: 10000 loops, best of 3: 67.5 usec per loop
    After: 10000 loops, best of 3: 40.3 usec per loop

    $ ./python -m timeit -s "from ntpath import relpath"  "relpath('foo', 'bar')"

    Before: 1000 loops, best of 3: 695 usec per loop
    After: 1000 loops, best of 3: 456 usec per loop

    @vstinner
    Copy link
    Member

    I like ntpath_cleanup.diff, I don't think that it makes the code worse.

    FYI os.fsencode() accepts str too, you can simplify:

         if isinstance(path, bytes):
    -        userhome = userhome.encode(sys.getfilesystemencoding())
    +        userhome = os.fsencode(userhome)

    to

    + userhome = os.fsencode(userhome)

    @serhiy-storchaka
    Copy link
    Member

    No, if *path* is not bytes, *userhome* shouldn't be converted to bytes.

    @vstinner
    Copy link
    Member

    Oh you're right sorry.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 23, 2014

    New changeset b22aaa59d24f by Serhiy Storchaka in branch 'default':
    Issue bpo-15275: Clean up and speed up the ntpath module.
    http://hg.python.org/cpython/rev/b22aaa59d24f

    @serhiy-storchaka serhiy-storchaka self-assigned this Jul 23, 2014
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    OS-windows performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants