Message 71124 - Python tracker


Author	mgiuca
Recipients	gvanrossum, janssen, jimjjewett, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date	2008-08-14.12:18:48
Message-id	<1218716331.89.0.261736879306.issue3300@psf.upfronthosting.co.za>

Content

OK, I implemented the defaultdict solution. I got curious, so I ran some
rough speed tests using the following code:

import random
import urllib.parse

# Quote 100,000 random 50-character strings drawn from the full Unicode range.
for i in range(100000):
    s = ''.join(chr(random.randint(0, 0x10ffff)) for _ in range(50))
    quoted = urllib.parse.quote(s)

Time to quote 100,000 random strings of 50 characters each.
(I ran each test twice and printed the worst case.)

HEAD, chars in range(0,0x110000): 1m44.80s
HEAD, chars in range(0,256): 25.0s
patch9, chars in range(0,0x110000): 35.3s
patch9, chars in range(0,256): 27.4s
New, chars in range(0,0x110000): 31.4s
New, chars in range(0,256): 25.3s
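
A measurement of this kind could be reproduced with something like the sketch below. It is not the script used for the numbers above: the names random_string and time_quote are hypothetical, the strings are generated before the timer starts (the original loop timed generation as well), and the string count is reduced. Surrogate code points are skipped on the assumption that a current Python is used, where urllib.parse.quote encodes with strict UTF-8 and rejects them.

```python
import random
import timeit
import urllib.parse


def random_string(length, upper, rng):
    """Return `length` random code points below `upper`, skipping surrogates."""
    chars = []
    while len(chars) < length:
        cp = rng.randrange(upper)
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded as strict UTF-8
        chars.append(chr(cp))
    return "".join(chars)


def time_quote(upper, count=1000, length=50, seed=0):
    """Time quoting `count` random strings; generation is not timed."""
    rng = random.Random(seed)  # fixed seed so runs are comparable
    strings = [random_string(length, upper, rng) for _ in range(count)]
    return timeit.timeit(
        lambda: [urllib.parse.quote(s) for s in strings], number=1
    )


if __name__ == "__main__":
    for upper in (0x110000, 256):
        print("chars in range(0, {:#x}): {:.3f}s".format(upper, time_quote(upper)))
```

Absolute times from this sketch are not comparable to the figures listed above; only the relative difference between the two ranges is of interest.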

HEAD is the current Py3k head, patch9 is my previous patch (before
implementing defaultdict), and New is after implementing defaultdict.

Interesting. The defaultdict didn't make much of an improvement (31.4s
vs. 35.3s on the full range). You can see how much the cache itself
helps, though: my code caches all chars, whereas HEAD caches only ASCII
chars, which is why HEAD is so slow on the full-repertoire test. Other
than that, the differences are fairly negligible.

However, I'll keep the defaultdict code. I quite like it, speedy or not
(and it is slightly faster).

History
Date	User	Action	Args
2008-08-14 12:18:51	mgiuca	set	recipients: + mgiuca, gvanrossum, loewis, jimjjewett, janssen, orsenthil, pitrou, thomaspinckney3
2008-08-14 12:18:51	mgiuca	set	messageid: <1218716331.89.0.261736879306.issue3300@psf.upfronthosting.co.za>
2008-08-14 12:18:49	mgiuca	link	issue3300 messages
2008-08-14 12:18:48	mgiuca	create