Message 113887
So, here is the modified benchmark. It first creates a cache of the random wordlist, because the wordlist takes quite a while to compute (for N up to 10000000). The cache is reused in subsequent runs. This takes some memory, though (you probably can't run it with less than 2 GB of RAM), but the cache itself is quite small on disk (around 80 MB).
I'm outputting the total dict size and the space occupied per individual key, and I've also timed lookups separately. Here are the results on my machine:
    1000 words (     961 keys),  3269137 words/s, 16980987 lookups/s, 51 bytes/key (  0.0 MB)
   10000 words (    9042 keys),  2697648 words/s, 13680052 lookups/s, 87 bytes/key (  0.8 MB)
  100000 words (   83168 keys),  2462269 words/s,  6956074 lookups/s, 37 bytes/key (  3.0 MB)
  500000 words (  389442 keys),  1802431 words/s,  4733774 lookups/s, 64 bytes/key ( 24.0 MB)
 1000000 words (  755372 keys),  1702130 words/s,  4485229 lookups/s, 66 bytes/key ( 48.0 MB)
 2000000 words ( 1463359 keys),  1616658 words/s,  4251021 lookups/s, 68 bytes/key ( 96.0 MB)
 5000000 words ( 3501140 keys),  1608889 words/s,  3909212 lookups/s, 57 bytes/key (192.0 MB)
10000000 words ( 6764089 keys),  1531136 words/s,  3600395 lookups/s, 59 bytes/key (384.0 MB)
As you can see, O(1) behaviour is almost observed up to the 10000000-word limit (given the way the data is computed, I'm afraid I can't go further than that). Of course, very small dicts are faster because everything fits in the CPU caches.
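The shape of such a benchmark, timing dict insertion and lookup separately over a list of random words, can be sketched as follows (a minimal stand-in, not the attached script: the word length, the fixed seed, and the use of `sys.getsizeof` on the dict alone, which ignores the keys themselves, are all assumptions):

```python
# Hypothetical sketch of a dict scaling micro-benchmark: build a dict
# from N random words, timing insertions and lookups separately, and
# report a rough per-key size from the dict structure itself.
import random
import string
import sys
import time

def random_words(n, length=8, seed=42):
    # Fixed seed so repeated runs measure the same key set.
    rng = random.Random(seed)
    letters = string.ascii_lowercase
    return [''.join(rng.choice(letters) for _ in range(length))
            for _ in range(n)]

def bench(n):
    words = random_words(n)
    t0 = time.perf_counter()
    d = {}
    for w in words:
        d[w] = None          # insertion phase
    t1 = time.perf_counter()
    for w in words:
        d[w]                 # lookup phase (every probe is a hit)
    t2 = time.perf_counter()
    # Rough estimate: dict struct size only, keys not included.
    bytes_per_key = sys.getsizeof(d) // max(len(d), 1)
    return len(d), n / (t1 - t0), n / (t2 - t1), bytes_per_key

if __name__ == '__main__':
    for n in (1000, 10000, 100000):
        keys, ins, lk, bpk = bench(n)
        print(f"{n} words ({keys} keys), {ins:.0f} words/s, "
              f"{lk:.0f} lookups/s, {bpk} bytes/key")
```

Duplicate words collapse onto existing keys, which is why the key count in the results above is lower than the word count at large N.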
Date                | User   | Action | Args
2010-08-14 11:25:51 | pitrou | set    | recipients: + pitrou, rhettinger, terry.reedy, mark.dickinson, ezio.melotti, eric.araujo, daniel.urban, dmtr
2010-08-14 11:25:51 | pitrou | set    | messageid: <1281785151.62.0.78409418245.issue9520@psf.upfronthosting.co.za>
2010-08-14 11:25:49 | pitrou | link   | issue9520 messages
2010-08-14 11:25:48 | pitrou | create |