Message 218951 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	alexandre.vassalotti, neologix, pitrou, serhiy.storchaka, tim.peters, vstinner
Date	2014-05-23.09:01:09
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1400835670.7.0.661461882456.issue21556@psf.upfronthosting.co.za>
In-reply-to

Content
> And the straightforward collision resolution in hashtable.c is much less efficient at mitigating collisions than a Python dict's. Modules/hashtable.c comes from http://sourceforge.net/projects/libcfu/ (cfuhash type). I adapted the code for my needs (the tracemalloc module), but I didn't try to make it fast. My main change was to store data directly in an entry of a table instead of using a second memory block for the data (and a pointer in the hash table entry). I didn't want to use the Python dict type for tracemalloc because this type may use the Python memory allocator which would lead to reentrant calls to tracemalloc. It's also nice to have a function to compute exactly the memory usage of an hash table (table + data), it's exposed in tracemalloc.get_tracemalloc_memory(). It doesn't mean that I want hashtable.c to be slow :-) Feel free to modify it as you want. But trying to make it as fast as Python dict would be hard. The design is different. Python dict uses open addressing, whereas _Py_hashtable uses linked list to store entries. Python dict is well optimized.

> And the straightforward collision resolution in hashtable.c is much less efficient at mitigating collisions than a Python dict's.

Modules/hashtable.c comes from http://sourceforge.net/projects/libcfu/ (cfuhash type). I adapted the code for my needs (the tracemalloc module), but I didn't try to make it fast. My main change was to store data directly in an entry of a table instead of using a second memory block for the data (and a pointer in the hash table entry).

I didn't want to use the Python dict type for tracemalloc because this type may use the Python memory allocator which would lead to reentrant calls to tracemalloc. It's also nice to have a function to compute exactly the memory usage of an hash table (table + data), it's exposed in tracemalloc.get_tracemalloc_memory().

It doesn't mean that I want hashtable.c to be slow :-) Feel free to modify it as you want. But trying to make it as fast as Python dict would be hard. The design is different. Python dict uses open addressing, whereas _Py_hashtable uses linked list to store entries. Python dict is well optimized.

History
Date	User	Action	Args
2014-05-23 09:01:10	vstinner	set	recipients: + vstinner, tim.peters, pitrou, alexandre.vassalotti, neologix, serhiy.storchaka
2014-05-23 09:01:10	vstinner	set	messageid: <1400835670.7.0.661461882456.issue21556@psf.upfronthosting.co.za>
2014-05-23 09:01:10	vstinner	link	issue21556 messages
2014-05-23 09:01:09	vstinner	create