
Author terry.reedy
Recipients cheryl.sabella, serhiy.storchaka, terry.reedy
Date 2018-02-28.20:23:43
Message-id <1519849423.76.0.467229070634.issue32940@psf.upfronthosting.co.za>
Content
Replacing an expression with a less clear equivalent expression makes no sense to me. Anyway, having __missing__ return the literal 120 reduces the benchmark miss time from 1.2-1.3 to .93, making ParseMis always faster than ParseGet and reducing the penalty for non-ASCII chars.
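
ParseGet and ParseMis are not defined in this message. As a rough sketch only, assume ParseGet overrides __getitem__ with dict.get and ParseMis defines __missing__; both class bodies and the build_table helper below are illustrative assumptions, not quoted code. With __missing__, keys that are present are looked up by dict itself and Python-level code runs only on a miss, which would be consistent with the timings above.

```python
# Hypothetical definitions for illustration; names follow the message,
# bodies are assumed.
class ParseGet(dict):
    def __getitem__(self, key):
        # Every lookup, hit or miss, goes through this Python method.
        return self.get(key, 120)  # 120 == ord('x')


class ParseMis(dict):
    def __missing__(self, key):
        # Called by dict only when the key is absent.
        return 120  # ord('x')


def build_table(cls):
    """Build the same translation table for either mapping class."""
    table = cls((i, 120) for i in range(128))
    table.update((ord(c), ord('(')) for c in "({[")
    table.update((ord(c), ord(')')) for c in ")}]")
    table.update((ord(c), ord(c)) for c in "\"'\\\n#")
    return table


# '\u00e9' is non-ASCII, so it is absent from both tables (a "miss").
sample = 'a[\u00e9]\n'
get_result = sample.translate(build_table(ParseGet))
mis_result = sample.translate(build_table(ParseMis))
print(repr(get_result), repr(mis_result))
```

Both mappings should give the same translated text; only the cost per lookup differs.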

Also, re.sub + str.replace is slower than str.translate:

import re
import timeit

class ParseMap(dict):
    def __missing__(self, key): return 120  # ord('x')

trans = ParseMap((i,120) for i in range(128))
trans.update((ord(c), ord('(')) for c in "({[")
trans.update((ord(c), ord(')')) for c in ")}]")
trans.update((ord(c), ord(c)) for c in "\"'\\\n#")

# Runs of characters that are not brackets, quotes, backslash, newline, or '#'.
trans_re = re.compile(r'''[^(){}\[\]"'\\\n#]+''')
code = '\t a([{b}])b"c\'d\n' * 1000  # repeat count n = 1, 10, 100, 1000

# Single pass: str.translate with the mapping above.
print(timeit.timeit(
    'code.translate(trans)',
    number=10000, globals=globals()))
# Multiple passes: one re.sub, then one str.replace per substitution.
print(timeit.timeit(
    "code1 = trans_re.sub('x', code)\n"
    "code2 = code1.replace('{', '(')\n"
    "code3 = code2.replace('}', ')')\n"
    "code4 = code3.replace('[', '(')\n"
    "code5 = code4.replace(']', ')')\n"
    r"code6 = code5.replace('\nx', '\n')",
    number=10000, globals=globals()))
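
The two statements being timed are not character-for-character equivalent, which matters when reading the comparison. A minimal sketch, assuming the character class is meant to exclude both '[' and ']' and that ']' should map to ')' as it does in trans; the via_translate and via_regex helpers are hypothetical names introduced here:

```python
import re


class ParseMap(dict):
    def __missing__(self, key):
        return 120  # ord('x')


trans = ParseMap((i, 120) for i in range(128))
trans.update((ord(c), ord('(')) for c in "({[")
trans.update((ord(c), ord(')')) for c in ")}]")
trans.update((ord(c), ord(c)) for c in "\"'\\\n#")

# Assumed intent: match runs of "uninteresting" characters.
trans_re = re.compile(r'''[^(){}\[\]"'\\\n#]+''')


def via_translate(code):
    # One 'x' per uninteresting character, so the length is preserved.
    return code.translate(trans)


def via_regex(code):
    # Each run of uninteresting characters collapses to a single 'x'.
    code = trans_re.sub('x', code)
    for opener in '{[':
        code = code.replace(opener, '(')
    for closer in '}]':
        code = code.replace(closer, ')')
    return code.replace('\nx', '\n')


sample = '\t a([{b}])b"c\'d\n'
print(repr(via_translate(sample)))
print(repr(via_regex(sample)))
```

The translate result keeps the input length, while the regex result is shorter wherever a run of ordinary characters was collapsed.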

n      trans    re
1       .06     .09
10      .08     .17
100     .28    1.00
1000   2.2     8.9

Times are total seconds for 10000 calls. Multiply by 100 to get microseconds per call, or equivalently seconds for 1000000 calls.
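
The conversion can be spelled out as a short calculation. This sketch only reuses the figures from the table and the number=10000 argument from the timeit calls; the per_call_microseconds helper and the ratio column are additions for illustration:

```python
NUMBER = 10000  # the timeit 'number' argument used for both statements

# n -> (translate total seconds, re.sub + replace total seconds)
totals = {
    1: (0.06, 0.09),
    10: (0.08, 0.17),
    100: (0.28, 1.00),
    1000: (2.2, 8.9),
}


def per_call_microseconds(total_seconds, number=NUMBER):
    """Convert a timeit total (seconds for `number` calls) to microseconds per call."""
    return total_seconds / number * 1_000_000


rows = []
for n, (trans_s, re_s) in totals.items():
    rows.append((n,
                 per_call_microseconds(trans_s),
                 per_call_microseconds(re_s),
                 re_s / trans_s))

print(f"{'n':>5} {'trans us':>9} {'re us':>9} {'re/trans':>9}")
for n, trans_us, re_us, ratio in rows:
    print(f"{n:>5} {trans_us:>9.1f} {re_us:>9.1f} {ratio:>9.2f}")
```

Dividing by 10000 calls and multiplying by 1000000 microseconds per second is the same as multiplying the table entry by 100.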