Optimize parsing of regular expressions #63579

Closed
serhiy-storchaka opened this issue Oct 24, 2013 · 13 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir topic-regex

Comments

@serhiy-storchaka
Member

BPO 19380
Nosy @pitrou, @vstinner, @ezio-melotti, @serhiy-storchaka, @MojoVampire
Files
  • re_parse.patch
  • re_parse_2.patch
  • re_parse_3.patch
  • re_parse_4.patch
  • re_parse_5.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-10-10.08:46:47.543>
    created_at = <Date 2013-10-24.20:14:19.696>
    labels = ['expert-regex', 'library', 'performance']
    title = 'Optimize parsing of regular expressions'
    updated_at = <Date 2014-10-10.08:46:47.541>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2014-10-10.08:46:47.541>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-10-10.08:46:47.543>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)', 'Regular Expressions']
    creation = <Date 2013-10-24.20:14:19.696>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['32341', '32343', '36649', '36818', '36843']
    hgrepos = []
    issue_num = 19380
    keywords = ['patch', 'needs review']
    message_count = 13.0
    messages = ['201177', '201183', '201191', '201192', '201227', '206557', '227032', '227041', '227053', '228605', '228838', '228964', '228971']
    nosy_count = 7.0
    nosy_names = ['pitrou', 'vstinner', 'ezio.melotti', 'mrabarnett', 'python-dev', 'serhiy.storchaka', 'josh.r']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue19380'
    versions = ['Python 3.5']

    @serhiy-storchaka
    Member Author

    The proposed patch optimizes parsing of regular expressions. The total running time of the re unit tests decreased by 10%.

    @serhiy-storchaka serhiy-storchaka self-assigned this Oct 24, 2013
    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir topic-regex performance Performance or resource usage labels Oct 24, 2013
    @pitrou
    Member

    pitrou commented Oct 24, 2013

    I don't think "+=" speeds up anything for ints; you might as well minimize code churn by avoiding such changes.
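A minimal sketch of the semantics behind this remark, not a timing measurement: ints are immutable, so both `n += 1` and `n = n + 1` build a new int object and rebind the name. The function names below are hypothetical and exist only for illustration; the list case is included for contrast, since lists do mutate in place under `+=`.

```python
def increment_augmented(n):
    """Use the augmented form; for an int this rebinds n to a new object."""
    alias = n
    n += 1
    return n, alias


def increment_rebinding(n):
    """Use the plain form; the result for ints is identical."""
    alias = n
    n = n + 1
    return n, alias


def extend_augmented(items):
    """For contrast: += on a list mutates the object that alias also refers to."""
    alias = items
    items += [1]
    return items, alias


print(increment_augmented(41))   # the alias still holds the old value
print(increment_rebinding(41))   # same result as the augmented form
print(extend_augmented([0]))     # both names see the appended element
```

Because neither form can update an int in place, any speed difference would have to come from interpreter details rather than from the operator itself.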

    @serhiy-storchaka
    Member Author

    Done.

    @pitrou
    Member

    pitrou commented Oct 24, 2013

    Do you have any benchmark figures (apart from the time of re unittests)?

    @serhiy-storchaka
    Member Author

    ### regex_compile ###
    Min: 2.897919 -> 2.577488: 1.12x faster
    Avg: 3.066306 -> 2.681966: 1.14x faster
    Significant (t=26.77)
    Stddev: 0.08789 -> 0.05085: 1.7283x smaller
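For readers who want a rough local measurement of pattern compilation time, here is a minimal sketch using `timeit`. It is not the `regex_compile` benchmark quoted above; the sample patterns, function names, and repeat counts are assumptions chosen for illustration. `re.purge()` clears the module's pattern cache so that each run actually parses and compiles the patterns.

```python
import re
import timeit

# Hypothetical sample patterns; a real benchmark would use a larger, more varied set.
PATTERNS = [
    r"(\d+)-(\w+)",
    r"[a-z]+(?:foo|bar)*$",
    r"(?P<year>\d{4})-(?P<month>\d{2})",
]


def compile_all(patterns=PATTERNS):
    """Compile every pattern from scratch, bypassing the re module cache."""
    re.purge()
    return [re.compile(p) for p in patterns]


def time_compile(repeat=5, number=200):
    """Return (minimum, average) wall-clock seconds over `repeat` runs."""
    timings = timeit.repeat(compile_all, repeat=repeat, number=number)
    return min(timings), sum(timings) / len(timings)


if __name__ == "__main__":
    best, avg = time_compile()
    print(f"Min: {best:.6f}  Avg: {avg:.6f}")
```

Running this once on an unpatched interpreter and once on a patched one gives a rough before/after comparison, though numbers from such a small pattern set will be noisy.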

    @serhiy-storchaka
    Member Author

    Could someone please review this?

    @serhiy-storchaka
    Member Author

    Actually, "if x:" is slightly faster than "if x is not None:" in the current implementation.

    @pitrou
    Member

    pitrou commented Sep 18, 2014

    "is not None" is more readable, though. When using plain boolean testing, it's never obvious whether you can have a zero-length string, a null number, etc.
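The readability concern can be made concrete with a small sketch. The two helper names below are hypothetical; they only wrap the two tests under discussion so that their results can be compared for values that are falsy but not `None`.

```python
def passes_truth_test(x):
    """Mirror of `if x:`; any falsy value fails."""
    return bool(x)


def passes_none_test(x):
    """Mirror of `if x is not None:`; only None fails."""
    return x is not None


# The two tests agree on None and on truthy values, but disagree on
# the empty string, zero, and empty containers.
for value in (None, "", 0, [], "a", 1):
    print(repr(value), passes_truth_test(value), passes_none_test(value))
```

The two tests are interchangeable only when the value can never be an empty string, zero, or another falsy object, which is the assumption a reader has to verify when they see a plain `if x:`.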

    @serhiy-storchaka
    Member Author

    Well, then please look at re_parse_2.patch (it still applies cleanly).

    @serhiy-storchaka
    Member Author

    Here is a patch that addresses Yury's and Josh's comments. It also drops a few minor changes.

    @serhiy-storchaka
    Member Author

    Updated patch implements Antoine's suggestions.

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 10, 2014

    New changeset 1adeac2a8714 by Serhiy Storchaka in branch 'default':
    Issue bpo-19380: Optimized parsing of regular expressions.
    https://hg.python.org/cpython/rev/1adeac2a8714

    @serhiy-storchaka
    Member Author

    Thank you for your reviews, Yury, Josh, and Antoine.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022