Issue576101
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2002-07-01 19:23 by orenti, last changed 2022-04-10 16:05 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
altintern.patch | orenti, 2002-07-01 19:23 | |||
altintern2.patch | orenti, 2002-07-06 16:08 | |||
altintern4.patch | orenti, 2002-08-10 10:01 | |||
altintern5.patch | gvanrossum, 2002-08-14 18:43 | updated to CVS as of 8/14, plus API_VERSION | ||
altintern6.patch | gvanrossum, 2002-08-16 18:57 | New version, mostly mortal | ||
altintern7.patch | gvanrossum, 2002-08-19 21:44 |
Messages (14) | |||
---|---|---|---|
msg40458 - (view) | Author: Oren Tirosh (orenti) | Date: 2002-07-01 19:23 | |
An interned string has a flag set indicating that it is interned instead of a pointer to the interned string. This pointer was almost always either NULL or pointing to the same object. The other cases were rare and ineffective as an optimization. This saves an average of 3 bytes per string. Interned strings are no longer immortal. They are automatically destroyed when there are no more references to them except the global dictionary of interned strings. New function (actually a macro) PyString_CheckInterned to check whether a string is interned. There are no more references to ob_sinterned anywhere outside stringobject.c. |
|||
msg40459 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2002-07-02 04:21 | |
Logged In: YES user_id=80475 I like the way you consolidated all of the knowledge about interning into one place. Consider adding an example to the docs of an effective use of interning for optimization. |
|||
msg40460 - (view) | Author: Oren Tirosh (orenti) | Date: 2002-07-06 14:35 | |
Logged In: YES user_id=562624 This implementation supports both mortal and immortal interned strings. PyString_InternInPlace creates an immortal interned string for backward compatibility with code that relies on this behavior. PyString_Intern creates a mortal interned string that is deallocated when its refcnt reaches 0. Note that if the string value has been previously interned as immortal this will not make it mortal. Most places in the interpreter were changed to PyString_Intern except those that may be required for compatibility. This version of the patch, like the previous one, disables indirect interning. Is there any evidence that it is still an important optimization for some packages? Make sure you rebuild everything after applying this patch because it modifies the size of string object headers. |
|||
msg40461 - (view) | Author: Oren Tirosh (orenti) | Date: 2002-07-06 16:08 | |
Logged In: YES user_id=562624 Oops, forgot to actually attach the patch. Here it is. |
|||
msg40462 - (view) | Author: Oren Tirosh (orenti) | Date: 2002-08-10 10:01 | |
Logged In: YES user_id=562624 General cleanup. Better handling of immortal interned strings for backward compatibility. It passes regrtest but causes test_gc to leak 20 objects. 13 from test_finalizer_newclass and 7 from test_del_newclass, but only if test_saveall is used. I've tried earlier versions of this patch (which were ok at the time) and they now create this leak too. |
|||
msg40463 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-08-14 18:43 | |
Logged In: YES user_id=6380 Here's an update of the patch for current CVS (stringobject.h failed due to changes for PyAPI_DATA/PyAPI_FUNC). Could you add documentation to Doc/api/concrete.tex for PyString_Intern() and explains how PyString_InternInPlace() differs? (AFAICT it makes the interned string immortal -- I suppose this is a B/W compat feature?) The variables PYTHON_API_VERSION and PYTHON_API_STRING in modsupport.h need to be updated -- many extensions use the PyString_AS_STRING() macro which relies on the string object format. If an extension compiled with the old code is linked with the new interpreter, it will miss the first three bytes of string objects -- or even store into memory it doesn't own! (I've already added this to the patch I am uploading.) (The test_gc failures were unrelated; Tim has fixed this already in CVS.) I'm tempted to say that except for the API doc issue this is complete. Thanks! |
|||
msg40464 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-08-14 19:32 | |
Logged In: YES user_id=6380 Question for all other reviewers. Why not replace all calls to PyString_InternInplace() [which creates immortal strings] with PyString_Intern(), making all (core) uses of interning yield mortal strings? E.g. the call in PyObject_SetAttr() will immortalize all strings that are ever used as a key on a setattr operation; in a long-lived server like Zope this is a concern, since setattr keys are often user-provided data: an endless stream of user-provided data will grow the interned dict indefinitely. And having the builtin intern() always return an immortal string also limits the usability of intern(). Most of the uses I could find of PyString_InternFromString() hold on to a global ref to the object, making it immortal anyway; but why should that function itself force the string to be immortal? (Especially since the exceptions are things like PyObject_GetAttrString() and PyObject_SetItemString(), which have the same concerns as PyObject_SetItem(). |
|||
msg40465 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2002-08-14 22:24 | |
Logged In: YES user_id=21627 Some mutually unrelated comments: - the GC_UnTrack call for interned is not need: GC won't be able to explain the reference that stringobject.c holds. - why does this try to "fix" the problem of dangling interned strings? AFAICT: if there is a reference to an interned string at the time _Py_ReleaseInternedStrings is called, that reference is silently dropped, and a later DECREF will result in memory corruption. IOW: it should merely set the state of all strings to normal, and clear the dict. - Replacing PyString_InternInPlace with PyString_Intern seems dangerous. AFAICT, the fragment PyString_InternInPlace(&name); Py_DECREF(name); return PyString_AS_STRING(name); from getclassname would break: Intern() would return the only reference to the interned string (assuming this is the first usage), and getclassname drops this reference, returning a pointer to deallocated memory. I'm not sure though why getclassname interns the result in the first place. Selectively replacing them might be a good idea, though. For intern(), I think an optional argument strongref needs to be provided (the interned dict essentially weak-references the strings). Perhaps the default even needs to be weakref. |
|||
msg40466 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-08-15 01:26 | |
Logged In: YES user_id=6380 > - why does this try to "fix" the problem of > dangling interned strings? AFAICT: if there is a > reference to an interned string at the time > _Py_ReleaseInternedStrings is called, that > reference is silently dropped, and a later > DECREF will result in memory corruption. IOW: it > should merely set the state of all strings to > normal, and clear the dict. Note that the *only* time when _Py_ReleaseInternedStrings() can ever be called is at program exit, just before you run a memory leak detector. There's no way Python can be resurrected after _Py_ReleaseInternedStrings() has run. > - Replacing PyString_InternInPlace with > PyString_Intern seems dangerous. AFAICT, the > fragment > > PyString_InternInPlace(&name); > Py_DECREF(name); > return PyString_AS_STRING(name); > > from getclassname would break: Intern() would > return the only reference to the interned string > (assuming this is the first usage), and > getclassname drops this reference, returning a > pointer to deallocated memory. I'm not sure > though why getclassname interns the result in > the first place. getclassname() is doing something very unsavory here! I expect that its API will have to be changed to copy the name into a buffer provided by the caller. We'll have to scrutinize all calls for tricks like this. > Selectively replacing them might be a good idea, > though. For intern(), I think an optional > argument strongref needs to be provided (the > interned dict essentially weak-references the > strings). Perhaps the default even needs to be > weakref. So do you think there's a need for immortal strings? What is that need? |
|||
msg40467 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-08-15 03:03 | |
Logged In: YES user_id=6380 string_dealloc() is a bit optimistic in that it doesn't check the DelItem for an error; but I don't know what it should do when it gets an error at that point. Probably call Py_FatalError(); if it wanted to recover, it would have to call PyErr_Fetch() / PyErr_Restore() around the DelItem() call, because we're in a dealloc handler here and that shouldn't change the exception state. _Py_ReleaseInternedStrings() should use PyDict_ methods, not PyMapping_ methods. And it should do more careful error checking. But maybe it's best to delete this function -- it's not needed except when you want to run Insure++, and we're not using that any more. I note that the whole patch needs to be scrutinized carefully looking for missing error checking and things like that. |
|||
msg40468 - (view) | Author: Oren Tirosh (orenti) | Date: 2002-08-15 03:44 | |
Logged In: YES user_id=562624 Yes, PyString_InternInPlace is for backward compatibility. How conservative do we need to be about compatibility? My work copy has an option for making strings binary compatible. Which is more important: binary compatibility or saving 3 bytes? A related patch (static names) provides a possible alternative to most PyString_InternFromString calls. |
|||
msg40469 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2002-08-15 08:45 | |
Logged In: YES user_id=21627 _Py_ReleaseInternedStrings: it might be that embedded applications use it. It would not be fair to cause heap corruption for them - it would be better to break them at link time, by removing the function entirely. I see no need to do either - it should just release immortal strings, as it always did, if there are any left. intern creates immortal strings: It might be that an application saves the id() of an interned string and releases the interned strings; then expects to get the same id back later. If you ask people whether they do that they won't tell, because they don't know that they do that. You could explicitly decide to break such applications (which would be reasonable), but then this needs to be documented. binary compatibility: I'm neutral here. If the API is bumped, people get sufficient warning. PyString_InternInPlace: I think it needs to be preserved, since applications may not hold explicit references (trusting that the interned dictionary will hold the reference). Of course, the InPlace name signals that there is no return value, so it is better than _Intern for new users. |
|||
msg40470 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-08-16 18:57 | |
Logged In: YES user_id=6380 Here's a new version (#6) that makes all interned strings mortal unless explicitly requested with PyString_InternImmortal(). There are no calls to that function in the core. I'm very tempted to check this in and see how it goes. - Leave all the calls to PyString_InternInPlace(), since that is still the recommended API. - Got rid of the macro PyString_INTERN(), it was unused. - Fixed the issue with getclassname() through an API change (it's static so doesn't matter). - Rewrote _Py_ReleaseInternedStrings(); it now simply clears the immortality status, restores the stolen refcounts, and then clears and decrefs the interned dict. |
|||
msg40471 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-08-19 21:44 | |
Logged In: YES user_id=6380 All checked in. I'm uploading the final version of the patch, since #6 contained some garbage (Emacs does something bizarre when you edit a patch file). |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-10 16:05:28 | admin | set | github: 36836 |
2002-07-01 19:23:00 | orenti | create |