Issue1507011
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2006-06-16 01:15 by belopolsky, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
interned-set.patch | belopolsky, 2006-06-16 01:15 | Patch to use a set object to store interned strings | ||
interned-set-1.patch | belopolsky, 2006-06-16 03:44 |
Messages (11) | |||
---|---|---|---|
msg50477 - (view) | Author: Alexander Belopolsky (belopolsky) * ![]() |
Date: 2006-06-16 01:15 | |
This patch is a proof of concept only. In particular, _PySet_LookString is *not* proposed for addition to set API. Better performance can probably be achieved by implementing interning completely inside setobject.c . |
|||
msg50478 - (view) | Author: Alexander Belopolsky (belopolsky) * ![]() |
Date: 2006-06-16 03:44 | |
Logged In: YES user_id=835142 Fixed code comments and moved all interning logic to setobject.c |
|||
msg50479 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2006-06-16 05:12 | |
Logged In: YES user_id=21627 What is the purpose of posting the patch here, then? The Python patches tracker should only carry patches that are meant for inclusion in Python. If inclusion is not planned, it should be closed (it would still be available, then). |
|||
msg50480 - (view) | Author: Alexander Belopolsky (belopolsky) * ![]() |
Date: 2006-06-16 12:48 | |
Logged In: YES user_id=835142 The purporse was to allow others to comment on the work in progress. Would it be more appropriate to post it on python-dev instead? If I close a patch while it is not ready, can I reopen it later when it is complete? In any case, please consider interned-set-1.patch for inclusion in Python. |
|||
msg50481 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2006-06-16 21:55 | |
Logged In: YES user_id=21627 I did not read the python-dev discussion before writing my first comment, so it is fine that you posted the patch here. Still, I think some rationale should be provided for doing this: what is the advantage? |
|||
msg50482 - (view) | Author: Alexander Belopolsky (belopolsky) * ![]() |
Date: 2006-06-17 00:52 | |
Logged In: YES user_id=835142 Rationale: The container of interned strings logically a set. Current implementation that uses a dictionary with the same keys and values is clearly an artificial implementation of a set necessary in earlier versions of Python. With a proper C implementation of sets available, it is natural to use it to store interned strings. Since set and dict objects use the same lookup algorithm, this patch is not expected to affect performance and pybench does not show any difference. Since a large set is using half of the memory of the equivalent dict, this patch is expected to reduce interpretor memory usage. |
|||
msg50483 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2006-06-17 06:44 | |
Logged In: YES user_id=21627 I don't think "it's natural to do it" is a convincing object reason to change a running system. A change should address correctness, efficiency, maintainability, functionality (features), legal issues, etc. I can't see any of this in this patch, so I'm -1. |
|||
msg50484 - (view) | Author: Rene Dudfield (illume) | Date: 2006-06-30 06:27 | |
Logged In: YES user_id=2042 The author said that the patch is expected to reduce memory. If this is true, then the patch has a good reason to be applied. Another reason is maintainability. By removing the set like functionality which is already implemented in the setobject. |
|||
msg50485 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2006-06-30 16:01 | |
Logged In: YES user_id=21627 Ok, memory reduction would be a valid reason. How much memory is being preserved (say, for an empty program, or for an IDLE run)? I don't see the improved maintainability. Interning is inherently *not* a set operation, it's a dictionary operation: You are given a string, and need to return its interned version. That's a table lookup. In fact, the set API had to be changed (to add _PySet_InternString) to add a lookup operation. |
|||
msg50486 - (view) | Author: Alexander Belopolsky (belopolsky) * ![]() |
Date: 2006-06-30 20:33 | |
Logged In: YES user_id=835142 I did not measure the memory usage and I don't think such measurement will yield a compelling reason for the change. I suggested that a set is half of a dict, but that was wrong because dictentry is only 1.5 times larger than setentry because of the cached hash and depending on the size of the internet strings, large portion of the memory will be used by the string object themselves and that will not change with the patch. My primary reason is the simplicity (you can call it maintainability). I find the off by two reference count in the current dict based implementation rather unnatural and I think the dict that maps keys to themselves is a set. I disagree with Martin. I think interning is a set operation and it is unfortunate that set API does not support it directly. I guess the current view is that set is not a container. Although you can put things in a set and check if they are there, there is no API to get the things back. I would really like to see set C API to provide all the functionality of a dict with value=key, but it is also possible that I complitely misunderstood the need of the set object in the first place. |
|||
msg50487 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2007-03-05 12:40 | |
As nobody else is speaking in favour of this patch, I'm rejecting it. I disagree with Alexander's last remark in several respects: set is indeed a container, and there is a way to get the elements back (namely, by enumerating them, or picking an arbitrary element for removal). There is no operation to check, on insertion of E, what the the prior element equal to E was (if any); there is only a check to determine whether E is in the set. The operation "give me the member equal but not identical to E" is conceptually a lookup operation; the mathematical set construct has no such operation, and the Python set models it closely. IOW, set is *not* a dict with key==value. Without reviewing the patch again, I also doubt it is capable of getting rid of the reference count cheating: essentially, this cheating enables the interning dictionary to have weak references to strings, this is important to allow automatic collection of certain interned strings. This feature needs to be preserved, so the cheating in the reference count must continue. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:18 | admin | set | github: 43512 |
2013-10-09 18:06:20 | belopolsky | set | superseder: Use a set for interned strings |
2006-06-16 01:15:07 | belopolsky | create |