This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Handle requests to intern string subtype instances
Type: enhancement Stage: resolved
Components: Interpreter Core Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: ajaksu2, arigo, hniksic
Priority: normal Keywords: patch

Created on 2007-02-13 10:34 by hniksic, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
subtype-intern-patch hniksic, 2007-02-13 10:34
Messages (7)
msg51863 - (view) Author: Hrvoje Nikšić (hniksic) * Date: 2007-02-13 10:34
This patch implements a small modification of PyString_InternInPlace that allows for safe interning of string subtype instances.  The change should be fully backward compatible -- for a rationale and discussion, see:

http://mail.python.org/pipermail/python-dev/2007-February/070973.html
msg51864 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2007-02-13 22:36
I think that this opens an attack (untested, though): it allows a previously-interned string to be removed from the dictionary.  This might lead to a crash because the old string still thinks it is interned.  Try something along the lines of:

s1 = "hel"
s1 = intern(s1 + "lo")

class S(str):
    def __hash__(self):
        return hash(s1)
    def __eq__(self, other):
        return other == s1

s = S("world")
intern(s)
del s1
msg51865 - (view) Author: Hrvoje Nikšić (hniksic) * Date: 2007-02-14 08:31
I don't think an attack is possible.  This patch retains the property that only exact strings are interned.  If you create a pathological string subtype that hashes like a different string instance (one that has already been interned), all you'll achieve is that "interning" will return the other instance.  As far as I can tell, no string is actually removed from the interned dictionary (until it becomes unreachable, that is.)

What is the expected result of your test code?  I tried it and it ran without error.
msg51866 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2007-02-14 12:13
Ah, the code was the wrong way around.  The following
causes an Fatal Python error in a debug build:

s1 = "hel"
s1 = intern(s1 + "lo")

class S(str):
    def __hash__(self):
        return 0
    def __eq__(self, other):
        return False

s = S(s1)
s2 = intern(s)
del s1
msg51867 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2007-02-14 12:21
Btw, any reason why you cannot simply say in
your Python program:  intern(str(s)) ?
msg51868 - (view) Author: Hrvoje Nikšić (hniksic) * Date: 2007-03-20 14:25
Thanks for the detailed analysis.  I missed this case and I retract the patch until I think of a way to resolve this problem.  The obvious possibility is to always copy subtype instances before attempting to intern them, but right now I don't have the time to investigate if this slows things down.

As for using s = intern(str(s)) in Python, it's a start, but it does somewhat more than I'd like -- for example, it "interns" all kinds of objects, which is not desirable.  My patch was written with the intention of making PyString_InternInPlace more robust wrt string subtype instances, so that all the code in core and extensions that simply calls PyString_InternInPlace keeps working without modification.

In the long run, the interface of PyString_InternInPlace is a bit too undeterministic for my taste.  It has no error reporting, so it silently ignores some kinds of errors (not enough memory), throws fatal error on others (non-string being passed), and also completely ignores string subtypes.  In my code I use this utility function:

int
intern_force(PyObject **s)
{
  if (PyString_CheckExact(*s))
    /* Most likely case: we're passed an exact string. */
    PyString_InternInPlace(s);
  else if (PyString_Check(*s))
    {
      /* The case we're covering with this function: we got a string
         subtype.  Intern a copy.  */
      PyObject *copy;
      copy = PyString_FromStringAndSize(PyString_AS_STRING(*s),
                                        PyString_GET_SIZE(*s));
      if (!copy)
        return -1;
      Py_DECREF(*s);
      *s = copy;
      PyString_InternInPlace(s);
    }
  else
    {
      PyErr_SetString(PyExc_TypeError, "intern_force passed a non-string");
      return -1;
    }

  if (!PyString_CHECK_INTERNED(s))
    {
      /* PyString_InternInPlace failed and cleared the exception, most
         likely due to insufficient memory. */
      PyErr_Format(PyExc_RuntimeError, "failed to intern string '%s'",
                   PyString_AS_STRING(*s));
      return -1;
    }
  return 0;
}

I don't expect a function like this one to become a part of Python, but PyString_InternInPlace could be usefully improved even without breaking compatibility.
msg84651 - (view) Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-03-30 21:24
Closing as the patch was retracted. Please reopen if you disagree.
History
Date User Action Args
2022-04-11 14:56:22adminsetgithub: 44576
2009-03-30 21:24:51ajaksu2setstatus: open -> closed

type: enhancement

nosy: + ajaksu2
messages: + msg84651
resolution: not a bug
stage: resolved
2007-02-13 10:34:01hniksiccreate