Author terry.reedy
Recipients cheryl.sabella, serhiy.storchaka, terry.reedy
Date 2018-02-28.04:57:34
Content
I settled on the following to compare ParseMap implementations.

from idlelib.pyparse import Parser
import timeit

class ParseGet(dict):
    def __getitem__(self, key): return self.get(key, ord('x'))
class ParseMis(dict):
    def __missing__(self, key): return ord('x')

for P in (ParseGet, ParseMis):
    print(P.__name__, 'hit', 'miss')
    p = P({i: i for i in (10, 34, 35, 39, 40, 41, 91, 92, 93, 123, 125)})
    # Hits: every key looked up is in the dict.
    print(timeit.timeit(
        "p[10],p[34],p[35],p[39],p[40],p[41],p[91],p[92],p[93],p[125]",
        number=100000, globals=globals()))
    # Misses: the first nine keys are absent; the last one, 125, is present.
    print(timeit.timeit(
        "p[11],p[33],p[36],p[45],p[50],p[61],p[71],p[82],p[99],p[125]",
        number=100000, globals=globals()))

ParseGet hit miss
1.104342376
1.112531999
ParseMis hit miss
0.3530207070000002
1.2165967760000003

ParseGet hit miss
1.185322191
1.1915449519999999
ParseMis hit miss
0.3477272720000002
1.317010653

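Avoiding custom code for all ASCII chars will be a win: in both runs above, hits with ParseMis take about a third of the time they take with ParseGet, which runs its Python-level __getitem__ on every lookup.  I am sure that calling __missing__ for non-ASCII chars will be at least as fast as the present code.  I will commit a revision tomorrow.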
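
As a minimal sketch of that idea (not the idlelib code; CountingMap, its misses counter, and the sample string are made up for illustration): pre-fill the table for all 128 ASCII code points so that only non-ASCII chars reach __missing__.

```python
class CountingMap(dict):
    """Hypothetical dict subclass that counts how often __missing__ runs."""
    misses = 0

    def __missing__(self, key):
        self.misses += 1
        return ord('x')

# Map every ASCII code point to 'x', then let the 11 chars from the
# benchmark above (code points 10, 34, 35, ...) map to themselves.
table = CountingMap.fromkeys(range(128), ord('x'))
table.update((ord(c), ord(c)) for c in "\n\"#'()[\\]{}")

code = "a = f(b)  # é\n"
result = code.translate(table)
print(repr(result))   # 'xxxxx(x)xx#xx\n'
print(table.misses)   # only the non-ASCII 'é' falls through to __missing__
```

Every ASCII char is then an ordinary dict hit, which is the fast case in the timings above.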

I may then compare this to Serhiy's sub/replace suggestion.  My experiments with 'code.translate(tran)' indicate that time grows sub-linearly up to 1000 or 10000 chars, which suggests that there are significant constant or log-like terms.
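
A rough sketch of how such a scaling check might look (an assumption about the method, not the exact experiment; MissingMap, sample, and the chosen lengths are illustrative):

```python
import timeit

class MissingMap(dict):
    """Hypothetical stand-in for the ParseMap variants compared above."""
    def __missing__(self, key):
        return ord('x')

tran = MissingMap.fromkeys(range(128), ord('x'))
sample = "def f(a, b):  # comment\n"

times = {}
for n in (10, 100, 1000, 10000):
    # Build a string of exactly n chars by repeating the sample line.
    code = (sample * (n // len(sample) + 1))[:n]
    t = timeit.timeit("code.translate(tran)", number=1000,
                      globals={"code": code, "tran": tran})
    times[n] = t
    # If time were linear in n, the per-char figure would stay constant.
    print(n, t, t / n)
```

A per-char time that falls as n grows would be consistent with a fixed per-call overhead.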