
Author vstinner
Recipients christian.heimes, djc, ezio.melotti, gregory.p.smith, meador.inge, pitrou, rhettinger, serhiy.storchaka, vstinner
Date 2013-10-28.20:18:36
Message-id <1382991517.7.0.585939755161.issue16286@psf.upfronthosting.co.za>
In-reply-to
Content
Let's try to identify some use cases in the Python test suite using gdb:

(gdb) b unicode_compare_eq
(gdb) condition 1 ((PyASCIIObject*)str1)->hash != -1 &&  ((PyASCIIObject*)str2)->hash != -1 && ((PyASCIIObject*)str1)->hash != ((PyASCIIObject*)str2)->hash 
(gdb) run

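The breakpoint only triggers when both strings already have a computed hash and the two hashes differ. I didn't dig into why the hashes of these strings had been computed. Tell me if you need more examples.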
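
To make the breakpoint condition concrete, here is a rough Python model of what it selects: both strings have a cached hash (not -1) and the two hashes differ, so the comparison could answer "not equal" without looking at the characters. CachedStr and could_skip_compare() are hypothetical names used for illustration only; they are not CPython APIs.

```python
class CachedStr:
    """Toy stand-in for a str object with a lazily cached hash (-1 = not computed)."""

    def __init__(self, value):
        self.value = value
        self.hash = -1  # same "not computed yet" marker as in the gdb condition

    def compute_hash(self):
        # Compute the hash once and cache it; hash() never returns -1 for a str.
        if self.hash == -1:
            self.hash = hash(self.value)
        return self.hash


def could_skip_compare(str1, str2):
    """Same predicate as the gdb condition: both hashes cached and different."""
    return str1.hash != -1 and str2.hash != -1 and str1.hash != str2.hash


builtin_os = CachedStr("posix")
constant = CachedStr("nt")
builtin_os.compute_hash()
constant.compute_hash()
print(could_skip_compare(builtin_os, constant))
```

If only one of the two hashes is cached, the predicate is false and a full comparison is still needed.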


Random examples:

(1) compare "constant" strings (strings from co_consts of code objects)

importlib._bootstrap: _setup():

    os_details = ('posix', ['/']), ('nt', ['\\', '/'])
    for builtin_os, path_separators in os_details:
        ...
    ...
    if builtin_os == 'nt': <== HERE
        ...


(2) importlib._bootstrap: _LoaderBasics.is_package()

    def is_package(self, fullname):
        filename = _path_split(self.get_filename(fullname))[1]
        filename_base = filename.rsplit('.', 1)[0]
        tail_name = fullname.rpartition('.')[2]
        return filename_base == '__init__' and ... <== HERE

It's surprising that filename_base has its hash computed. I suppose that all these functions (_path_split, .rsplit, .rpartition) return the string unmodified.
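
A quick way to test that supposition from Python, as a sketch: when the separator is absent, CPython appears to hand back the original object instead of a copy, so a hash cached on the input would still be present on the result. The identity checks rely on an implementation detail and not on a language guarantee, and returns_same_object() is a hypothetical helper.

```python
def returns_same_object(text, sep="."):
    """Report whether rsplit/rpartition hand back `text` itself when `sep` is absent."""
    if sep in text:
        raise ValueError("this check only makes sense when sep is not in text")
    head = text.rsplit(sep, 1)[0]
    tail = text.rpartition(sep)[2]
    return {
        # Identity is an implementation detail; it may differ between interpreters.
        "rsplit_same_object": head is text,
        "rpartition_same_object": tail is text,
        # Equality always holds when the separator is absent.
        "equal": head == text and tail == text,
    }


# Build the string at run time so it is not a compile-time constant.
result = returns_same_object("".join(["__in", "it__"]))
print(result)
```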


(3) importlib._bootstrap: PathFinder._path_importer_cache():

    @classmethod
    def _path_importer_cache(cls, path):
        ...
        if path == '': <== HERE

path is an entry of sys.path.


(4) str in __all__ (list of str):

os.py:

    if "putenv" not in __all__:
        __all__.append("putenv")

__all__ is a list of strings.
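
As far as I understand, a membership test on a list ends up in the same string comparison as a plain ==: the list is walked and each element is compared with the searched value, with an identity shortcut first. A pure-Python sketch of that loop, where list_contains() is a hypothetical helper and not the actual C implementation:

```python
def list_contains(items, needle):
    """Model of `needle in items` for a list: identity first, then equality."""
    for item in items:
        # Each non-identical element costs one == comparison.
        if item is needle or item == needle:
            return True
    return False


exported = ["getenv", "environ"]
if not list_contains(exported, "putenv"):
    exported.append("putenv")
print(exported)
```

For a name that is missing from the list, every element goes through the equality comparison, which is where a cached hash could help.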


(5) site.py:

    if __name__ == '__main__': <== HERE

__name__ is 'site'.


(6) Python/ceval.c: PyEval_EvalCodeEx() called with arbitrary keywords

    for (i = 0; i < kwcount; i++) {
        PyObject **co_varnames;
        PyObject *keyword = kws[2*i];
        PyObject *value = kws[2*i + 1];
        int j;
        ...
        /* Speed hack: do raw pointer compares. As names are
           normally interned this should almost always hit. */
        co_varnames = ((PyTupleObject *)(co->co_varnames))->ob_item;
        for (j = 0; j < total_args; j++) {
            PyObject *nm = co_varnames[j];
            if (nm == keyword)
                goto kw_found;
        }
        /* Slow fallback, just in case */
        for (j = 0; j < total_args; j++) {
            PyObject *nm = co_varnames[j];
            int cmp = PyObject_RichCompareBool(  <== HERE
                keyword, nm, Py_EQ);
            if (cmp > 0)
                goto kw_found;
            else if (cmp < 0)
                goto fail;
        }

It looks like the "just in case" path is taken.

(gdb) where
#0  unicode_compare_eq (str1='isTest', str2='func') at Objects/unicodeobject.c:10532
#1  0x000000000052dd41 in PyUnicode_RichCompare (left='isTest', right='func', op=2) at Objects/unicodeobject.c:10609
#2  0x00000000004be4db in do_richcompare (v='isTest', w='func', op=2) at Objects/object.c:647
#3  0x00000000004be790 in PyObject_RichCompare (v='isTest', w='func', op=2) at Objects/object.c:696
#4  0x00000000004be832 in PyObject_RichCompareBool (v='isTest', w='func', op=2) at Objects/object.c:718
#5  0x00000000005a0f68 in PyEval_EvalCodeEx (...) at Python/ceval.c:3450
...

Traceback (most recent call first):
  File "/home/haypo/prog/python/default/Lib/test/test_xml_etree.py", line 1669, in test_get_keyword_args
    e1 = ET.Element('foo' , x=1, y=2, z=3)

ElementTree.Element() accepts arbitrary keywords.
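
The two loops in the C code above can be modelled in Python as follows. This is only a sketch of the lookup logic, not the real implementation; find_keyword() and the parameter names in the usage example are hypothetical.

```python
def find_keyword(co_varnames, keyword):
    """Return (index, path) for `keyword`, or (None, "unmatched") if absent."""
    # Fast path: raw identity comparison, which hits when both names are interned.
    for j, name in enumerate(co_varnames):
        if name is keyword:
            return j, "fast"
    # Slow fallback: full equality comparison against every parameter name.
    for j, name in enumerate(co_varnames):
        if name == keyword:
            return j, "slow"
    return None, "unmatched"


params = ("tag", "attrib")
print(find_keyword(params, params[1]))  # same object, so the fast path hits
print(find_keyword(params, "x"))        # arbitrary keyword, compared with every name
```

A keyword that matches no parameter name fails the identity loop and is then compared with == against every parameter name, which is the comparison the breakpoint catches.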


(7) letter==letter singletons:


xml.etree.ElementPath: iterfind()

def iterfind(elem, path, namespaces=None):
    ...
    if path[-1:] == "/": <== HERE

Traceback (most recent call first):
  File "/home/haypo/prog/python/default/Lib/xml/etree/ElementPath.py", line 254, in iterfind
    if path[-1:] == "/":

path is ".//grandchild", so path[-1:] is 'd'. That one-character string is a singleton, and Python had already computed the hash of 'd'.


Similar example in the same file:

def xpath_tokenizer(pattern, namespaces=None):
    for token in xpath_tokenizer_re.findall(pattern):
        tag = token[1]
        if tag and tag[0] != "{" and ":" in tag: <== HERE
            ...

In tag[0] != "{", tag is 'grandchild', so tag[0] is the singleton 'g'.


Another example:

Traceback (most recent call first):
  File "/home/haypo/prog/python/default/Lib/sre_parse.py", line 194, in __next
    if char == "\\":


(8) str not in (list of str), test_descr.py: test_dir():

  File "/home/haypo/prog/python/default/Lib/test/test_descr.py", line 2255, in <listcomp>
    names = [x for x in dir(minstance) if x not in default_attributes]

        minstance = M("m")
        minstance.b = 2
        minstance.a = 1
        default_attributes = ['__name__', '__doc__', '__package__',
                              '__loader__']
        names = [x for x in dir(minstance) if x not in default_attributes]
History
Date                 User      Action  Args
2013-10-28 20:18:37  vstinner  set     recipients: + vstinner, rhettinger, gregory.p.smith, pitrou, christian.heimes, djc, ezio.melotti, meador.inge, serhiy.storchaka
2013-10-28 20:18:37  vstinner  set     messageid: <1382991517.7.0.585939755161.issue16286@psf.upfronthosting.co.za>
2013-10-28 20:18:37  vstinner  link    issue16286 messages
2013-10-28 20:18:36  vstinner  create