This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author dmtr
Recipients daniel.urban, dmtr, eric.araujo, ezio.melotti, mark.dickinson, pitrou, rhettinger, terry.reedy
Date 2010-08-15.00:31:18
SpamBayes Score 0.00023008484
Marked as misclassified No
Message-id <1281832284.12.0.795070860868.issue9520@psf.upfronthosting.co.za>
In-reply-to
Content
Yes, it looks like you are right. And while there is some slight performance degradation, at least nothing drastic is happening up to 30M keys. Using your modified test:

    1000 words (     961 keys), 3609555 words/s, 19239926 lookups/s, 51 bytes/key (0.0MB)
   10000 words (    9042 keys), 3390980 words/s, 18180771 lookups/s, 87 bytes/key (0.0MB)
  100000 words (   83168 keys), 3298809 words/s, 12509481 lookups/s, 37 bytes/key (3.0MB)
 1000000 words (  755372 keys), 2477793 words/s, 7537963 lookups/s, 66 bytes/key (48.0MB)
 5000000 words ( 3501140 keys), 2291004 words/s, 6487727 lookups/s, 57 bytes/key (192.0MB)
10000000 words ( 6764089 keys), 2238081 words/s, 6216454 lookups/s, 59 bytes/key (384.0MB)
20000000 words (13061821 keys), 2175688 words/s, 5817085 lookups/s, 61 bytes/key (768.0MB)
50000000 words (31188460 keys), 2117751 words/s, 5137209 lookups/s, 51 bytes/key (1536.0MB)

It looks like internal memory estimates (sys.getsizeof()) are also pretty good:

Before: ['VmPeak:\t10199764 kB', 'VmSize:\t 5552416 kB']
50000000 words (31188460 keys), 2117751 words/s, 5137209 lookups/s, 51 bytes/key (1536.0MB)
After: ['VmPeak:\t10199764 kB', 'VmSize:\t 7125284 kB']
	 
7 125 284 kB - 5 552 416 kB = 1 536.00391 MB
History
Date User Action Args
2010-08-15 00:31:24dmtrsetrecipients: + dmtr, rhettinger, terry.reedy, mark.dickinson, pitrou, ezio.melotti, eric.araujo, daniel.urban
2010-08-15 00:31:24dmtrsetmessageid: <1281832284.12.0.795070860868.issue9520@psf.upfronthosting.co.za>
2010-08-15 00:31:21dmtrlinkissue9520 messages
2010-08-15 00:31:19dmtrcreate