Author mgiuca
Recipients gvanrossum, janssen, jimjjewett, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date 2008-08-14.12:18:48
Message-id <1218716331.89.0.261736879306.issue3300@psf.upfronthosting.co.za>
In-reply-to
Content
OK, I implemented the defaultdict solution. I got curious, so I ran some
rough speed tests using the following code:

import random
import urllib.parse

# Quote 100,000 random 50-character strings drawn from the full Unicode range.
for i in range(100000):
    s = ''.join(chr(random.randint(0, 0x10ffff)) for _ in range(50))
    quoted = urllib.parse.quote(s)

Time to quote 100,000 random strings of 50 characters each.
(Each test was run twice; the slower run is reported.)

HEAD,   chars in range(0,0x110000): 1m44.80s
HEAD,   chars in range(0,256):      25.0s
patch9, chars in range(0,0x110000): 35.3s
patch9, chars in range(0,256):      27.4s
New,    chars in range(0,0x110000): 31.4s
New,    chars in range(0,256):      25.3s
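
For anyone wanting to repeat this kind of measurement on a current Python, here is a rough sketch of a timing harness. It is not the script used for the numbers above: the helper names (random_string, time_quote) are made up for illustration, the inputs are generated before the clock starts so only quote() is timed, and surrogate code points are skipped because a current urllib.parse.quote cannot encode them as UTF-8 with its default error handling.

```python
import random
import time
import urllib.parse


def random_string(length, max_codepoint, rng):
    """Build a random string, skipping surrogates (U+D800..U+DFFF)."""
    chars = []
    while len(chars) < length:
        cp = rng.randint(0, max_codepoint)
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded as UTF-8
        chars.append(chr(cp))
    return ''.join(chars)


def time_quote(n_strings, length, max_codepoint, seed=0):
    """Return the seconds spent calling urllib.parse.quote on n_strings inputs."""
    rng = random.Random(seed)  # fixed seed so runs are comparable
    # Generate the inputs first so string building is not part of the timing.
    inputs = [random_string(length, max_codepoint, rng)
              for _ in range(n_strings)]
    start = time.perf_counter()
    for s in inputs:
        urllib.parse.quote(s)
    return time.perf_counter() - start


if __name__ == '__main__':
    for label, max_cp in (('full range', 0x10FFFF), ('0..255', 0xFF)):
        elapsed = time_quote(10_000, 50, max_cp)
        print('{}: {:.3f}s'.format(label, elapsed))
```

The absolute times will not match the table above, since the interpreter, the hardware and what is being timed all differ; only the relative gap between the two character ranges is comparable.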

HEAD is the current Py3k head, patch9 is my previous patch (before
implementing defaultdict), and New is the patch after implementing defaultdict.

Interesting. defaultdict didn't make much of an improvement: about 11% on
the full range (35.3s to 31.4s) and about 8% on range(0,256) (27.4s to
25.3s). You can see the big help the cache itself makes, though: my code
caches all chars, whereas HEAD caches only ASCII chars, which is why HEAD
is roughly three times slower on the full-repertoire test. Other than
that, the differences are fairly negligible.
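
To make the caching idea concrete, here is a minimal sketch of a per-character cache built on defaultdict's __missing__ hook. It is an illustration only, not the code from the patch: the names QuoteCache, quote_with_cache and UNRESERVED are hypothetical, and the set of always-safe characters is assumed to be the RFC 3986 unreserved set.

```python
import collections
import string

# Assumption: treat the RFC 3986 unreserved characters as always safe.
UNRESERVED = frozenset(string.ascii_letters + string.digits + '-._~')


class QuoteCache(collections.defaultdict):
    """Map a character to its percent-encoded form, computed on first use.

    Hypothetical illustration; not the implementation from the patch.
    """

    def __init__(self, safe='/'):
        super().__init__()  # no default_factory; __missing__ does the work
        self.safe = UNRESERVED | frozenset(safe)

    def __missing__(self, char):
        # Called by dict lookup only when char is not cached yet.
        if char in self.safe:
            quoted = char
        else:
            quoted = ''.join('%{:02X}'.format(b)
                             for b in char.encode('utf-8'))
        self[char] = quoted  # later lookups hit the dict directly
        return quoted


def quote_with_cache(text, cache):
    """Percent-encode text one character at a time through the cache."""
    return ''.join(cache[c] for c in text)


cache = QuoteCache()
print(quote_with_cache('a b/é', cache))  # a%20b/%C3%A9
```

Because every character seen so far stays in the dictionary, repeated characters cost one dict lookup regardless of whether they are ASCII. A plain dict subclass with the same __missing__ method would behave identically; defaultdict is used here only to mirror the wording above.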

However, I'll keep the defaultdict code. I quite like it, speedy or not
(and it is slightly faster).
History
Date                 User    Action  Args
2008-08-14 12:18:51  mgiuca  set     recipients: + mgiuca, gvanrossum, loewis, jimjjewett, janssen, orsenthil, pitrou, thomaspinckney3
2008-08-14 12:18:51  mgiuca  set     messageid: <1218716331.89.0.261736879306.issue3300@psf.upfronthosting.co.za>
2008-08-14 12:18:49  mgiuca  link    issue3300 messages
2008-08-14 12:18:48  mgiuca  create