
Unicode performance regression in python3.3 vs python3.2 #57830

Closed
Lothiraldan mannequin opened this issue Dec 17, 2011 · 12 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-unicode

Comments

@Lothiraldan
Mannequin

Lothiraldan mannequin commented Dec 17, 2011

BPO 13621
Nosy @loewis, @pitrou, @vstinner, @ezio-melotti, @florentx, @Lothiraldan, @serhiy-storchaka
Files
  • stringbench_log_cpython3.2: String benchmark log for cpython3.2
  • stringbench_log_cpython3.3: String benchmark log for cpython3.3
  • compare.py: Script used to compute diff between two runs
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-04-27.15:31:39.330>
    created_at = <Date 2011-12-17.17:34:48.283>
    labels = ['interpreter-core', 'expert-unicode', 'performance']
    title = 'Unicode performance regression in python3.3 vs python3.2'
    updated_at = <Date 2012-04-27.15:31:39.329>
    user = 'https://github.com/Lothiraldan'

    bugs.python.org fields:

    activity = <Date 2012-04-27.15:31:39.329>
    actor = 'loewis'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-04-27.15:31:39.330>
    closer = 'loewis'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2011-12-17.17:34:48.283>
    creator = 'Boris.FELD'
    dependencies = []
    files = ['23991', '23992', '23994']
    hgrepos = []
    issue_num = 13621
    keywords = []
    message_count = 12.0
    messages = ['149681', '149682', '149684', '149688', '149694', '149696', '149728', '149730', '149731', '159451', '159461', '159470']
    nosy_count = 9.0
    nosy_names = ['loewis', 'collinwinter', 'pitrou', 'vstinner', 'ezio.melotti', 'flox', 'Boris.FELD', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue13621'
    versions = ['Python 3.3']

    @Lothiraldan
    Mannequin Author

    Lothiraldan mannequin commented Dec 17, 2011

    Hello everyone, I just tried to run stringbench on the Python 3.2 and 3.3 development versions, and some unicode tests run slower in Python 3.3 than in Python 3.2.

    I attach the raw output of both runs. I also extracted the most interesting data (all tests with more than 20% performance regression):

    • ("A"*1000).find("B") (*1000): -30.379747%
    • "Hello\t \t".rstrip() (*1000): -33.333333%
    • "this\nis\na\ntest\n".rsplit("\n") (*1000): -23.437500%
    • "\nHello!\n".strip() (*1000): -33.333333%
    • dna.split("ACTAT") (*10): -21.066667%
    • "Andrew".endswith("w") (*1000): -23.529412%
    • "...text.with.2000.lines...replace("\n", " ") (*10): -37.668161%
    • "\t \tHello".rstrip() (*1000): -33.333333%
    • ("A"*1000).rpartition("A") (*1000): -21.212121%
    • ("Here are some words. "*2).split() (*1000): -22.105263%
    • "Hello!\n".rstrip() (*1000): -35.714286%
    • "B" in "A"*1000 (*1000): -32.089552%
    • "Hello!\n".strip() (*1000): -35.714286%
    • "\nHello!".strip() (*1000): -28.571429%
    • "this\nis\na\ntest\n".split("\n") (*1000): -23.437500%
    • "Andrew".startswith("A") (*1000): -20.588235%
    • "\nHello!".rstrip() (*1000): -35.714286%
    • "Andrew".endswith("Andrew") (*1000): -22.857143%
    • "Andrew".endswith("Anders") (*1000): -23.529412%
    • "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000): -49.411765%
    • "Andrew".startswith("Anders") (*1000): -23.529412%
    • "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000): -22.429907%
    • "Andrew"+"Dalke" (*1000): -23.076923%
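
    The deltas above can be derived from the per-test timings in the two attached stringbench logs. A minimal sketch of the computation (the attached compare.py is the actual script; the function name here is hypothetical):

    ```python
    # Hypothetical sketch of how a diff between two stringbench runs could be
    # computed: given seconds-per-run for the same test under 3.2 and 3.3,
    # report the relative change as a percentage (negative = 3.3 is slower).

    def regression_percent(old_secs: float, new_secs: float) -> float:
        """Relative change from old to new, as a percentage of the new time."""
        return (old_secs - new_secs) / new_secs * 100.0

    # Example: a test taking 0.55s on 3.2 and 0.79s on 3.3
    delta = regression_percent(0.55, 0.79)
    print(f"{delta:+.6f}%")  # -30.379747%, like the .find("B") entry above
    ```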

    @Lothiraldan Lothiraldan mannequin added performance Performance or resource usage labels Dec 17, 2011
    @loewis
    Mannequin

    loewis mannequin commented Dec 17, 2011

    Thanks, this is a known issue. I'm not too worried, since these benchmarks are fairly artificial. In the cases I've looked at, I don't think anything can be done about it.

    @vstinner
    Member

    Sorted and grouped results. "replace", "find" and "concat" should be easy to fix; "format" is a little more complex; "strip" and "split" depend on "find" performance and also require scanning the substring to ensure that the result is canonical (except when the inputs are all ASCII, as they are in these examples).

    replace:

    • "...text.with.2000.lines...replace("\n", " ") (*10): -37.668161%

    find:

    • ("A"*1000).find("B") (*1000): -30.379747%
    • "Andrew".startswith("A") (*1000): -20.588235%
    • "Andrew".startswith("Anders") (*1000): -23.529412%
    • "Andrew".endswith("w") (*1000): -23.529412%
    • "Andrew".endswith("Andrew") (*1000): -22.857143%
    • "Andrew".endswith("Anders") (*1000): -23.529412%
    • "B" in "A"*1000 (*1000): -32.089552%

    concat:

    • "Andrew"+"Dalke" (*1000): -23.076923%

    format:

    • "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000): -49.411765%

    strip:

    • "\nHello!\n".strip() (*1000): -33.333333%
    • "Hello!\n".strip() (*1000): -35.714286%
    • "\nHello!".strip() (*1000): -28.571429%
    • "Hello\t \t".rstrip() (*1000): -33.333333%
    • "\t \tHello".rstrip() (*1000): -33.333333%
    • "Hello!\n".rstrip() (*1000): -35.714286%
    • "\nHello!".rstrip() (*1000): -35.714286%

    split:

    • dna.split("ACTAT") (*10): -21.066667%
    • ("Here are some words. "*2).split() (*1000): -22.105263%
    • "this\nis\na\ntest\n".split("\n") (*1000): -23.437500%
    • "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000): -22.429907%
    • "this\nis\na\ntest\n".rsplit("\n") (*1000): -23.437500%
    • ("A"*1000).rpartition("A") (*1000): -21.212121%

    @Lothiraldan
    Mannequin Author

    Lothiraldan mannequin commented Dec 17, 2011

    Forgot to describe my environment:
    Mac OS X 10.6.8
    GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
    CPython3.3 revision ea421c534305
    CPython3.2 revision 0b86da9d6964

    @vstinner
    Member

    See also the issue bpo-13623 for results on bytes.

    @pitrou
    Member

    pitrou commented Dec 17, 2011

    Just a note: performance reports shouldn't be assigned to the "benchmarks" category, except if the problem is in the benchmarks themselves.

    @pitrou pitrou added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed performance Performance or resource usage labels Dec 17, 2011
    @vstinner
    Member

    "...text.with.2000.lines...replace("\n", " ") (*10): -37.668161%

    I also noticed a difference between Python 3.2 and 3.3, but Python 3.3 is 13% *faster* (and not slower). This benchmark is not really representative because stringbench only tests .replace() with ASCII. replace() has to scan the result to determine its maximum character, except for ASCII, so expect a performance regression... except for ASCII.
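
    The ASCII fast path described here comes from PEP 393's flexible string representation in 3.3: a pure-ASCII result stays in the compact 1-byte-per-character form, while a replacement that introduces a wider character forces a 2- or 4-byte buffer, which is why the result must be scanned. A small illustration (CPython implementation detail; exact sizes are indicative):

    ```python
    import sys

    ascii_text = "A" * 100

    # Replacing ASCII with ASCII keeps the compact 1-byte-per-char form.
    still_ascii = ascii_text.replace("A", "B")

    # Replacing with a character above U+00FF forces a 2-byte representation,
    # so CPython must know the maximum code point of the result in advance.
    widened = ascii_text.replace("A", "\u20ac")  # U+20AC EURO SIGN

    print(sys.getsizeof(still_ascii))  # compact form: header + ~1 byte/char
    print(sys.getsizeof(widened))      # wider form: header + ~2 bytes/char
    assert sys.getsizeof(widened) > sys.getsizeof(still_ascii)
    ```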

    @python-dev
    Mannequin

    python-dev mannequin commented Dec 18, 2011

    New changeset c802bfc8acfc by Victor Stinner in branch 'default':
    Issue bpo-13621: Optimize str.replace(char1, char2)
    http://hg.python.org/cpython/rev/c802bfc8acfc

    @vstinner
    Member

    I also noticed a difference between Python 3.2 and 3.3,
    but Python 3.3 is 13% *faster* (and not slower).

    Oops, I misused the timeit module; there is a regression.

    New changeset c802bfc8acfc by Victor Stinner in branch 'default':
    Issue bpo-13621: Optimize str.replace(char1, char2)

    ./python -m timeit -s 'f=open("/tmp/README"); t=f.read(); f.close(); t.encode("ascii")' 't.replace("\n", " ")'

    Python 3.2: 6.44 usec
    Python 3.3 before: 11.6 usec
    Python 3.3 after: 2.77 usec

    @vstinner
    Member

    "Andrew"+"Dalke" (*1000): -23.076923%

    ./python -m timeit '"Andrew"+"Dalke"' gives me very close results with Python 3.2 (wide mode) and 3.3: something like 0.15 vs 0.151 microseconds.

    But using longer (ASCII) strings, Python 3.3 is 2.6x faster:

    $ python3.2 -m timeit -s 'a="A"*1000; b="B"*1000' 'a+b'
    1000000 loops, best of 3: 0.39 usec per loop
    $ python3.3 -m timeit -s 'a="A"*1000; b="B"*1000' 'a+b'
    10000000 loops, best of 3: 0.151 usec per loop

    @serhiy-storchaka
    Member

    But try ASCII+UCS2 or ASCII+UCS4.
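
    The point being: under PEP 393, concatenating two strings of the same internal width is a straight copy, whereas mixing widths (e.g. ASCII + UCS2) requires widening the narrower operand first. A hedged way to check this locally with the stdlib timeit module (timings will vary by build and machine):

    ```python
    import timeit

    ascii_s = "A" * 1000
    ucs2_s = "\u20ac" * 1000  # 2-byte (UCS2) representation under PEP 393

    # Same-kind concatenation: both operands are compact ASCII.
    same_kind = timeit.timeit("a + b", globals={"a": ascii_s, "b": ascii_s},
                              number=100_000)
    # Mixed-kind concatenation: the ASCII operand must be widened to UCS2.
    mixed_kind = timeit.timeit("a + b", globals={"a": ascii_s, "b": ucs2_s},
                               number=100_000)

    print(f"ASCII+ASCII: {same_kind:.4f}s  ASCII+UCS2: {mixed_kind:.4f}s")
    ```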

    @loewis
    Mannequin

    loewis mannequin commented Apr 27, 2012

    I'm closing this as "won't fix". The only way to get back the exact performance of 3.2 is to revert to the 3.2 implementation, which is clearly not an option. I don't consider performance regressions in micro-benchmarks inherently a bug.

    If there is a specific regression that people think constitutes a real problem, a separate bug report should be submitted.

    @loewis loewis mannequin closed this as completed Apr 27, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022