Message 201573 - Python tracker


Author	vstinner
Recipients	christian.heimes, djc, ezio.melotti, gregory.p.smith, meador.inge, pitrou, rhettinger, serhiy.storchaka, vstinner
Date	2013-10-28.20:18:36
Message-id	<1382991517.7.0.585939755161.issue16286@psf.upfronthosting.co.za>
In-reply-to

Content

Let's try to identify some use cases in the Python test suite using gdb:

(gdb) b unicode_compare_eq
(gdb) condition 1 ((PyASCIIObject*)str1)->hash != -1 &&  ((PyASCIIObject*)str2)->hash != -1 && ((PyASCIIObject*)str1)->hash != ((PyASCIIObject*)str2)->hash 
(gdb) run

I didn't dig into why the hashes of these strings have already been computed. Tell me if you need more examples.
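
To spell out what the breakpoint condition selects, here is a rough Python model. It is only a sketch: CachedStr, breakpoint_would_fire and compare_eq are hypothetical names invented for illustration, not CPython code. The only things taken from the gdb condition are the "hash" field and the -1 "not computed yet" marker.

```python
class CachedStr:
    """Hypothetical stand-in for a str object with a cached hash field."""

    def __init__(self, value):
        self.value = value
        self.hash = -1  # -1 means "hash not computed yet", as in the gdb condition

    def compute_hash(self):
        # Compute the hash once and cache it. hash() never returns -1 in
        # CPython, so -1 stays free as the "not computed" marker.
        if self.hash == -1:
            self.hash = hash(self.value)
        return self.hash


def breakpoint_would_fire(str1, str2):
    """Mirror the gdb condition: both hashes are cached and they differ."""
    return str1.hash != -1 and str2.hash != -1 and str1.hash != str2.hash


def compare_eq(str1, str2):
    """Equality test that could skip the character comparison in that case."""
    if breakpoint_would_fire(str1, str2):
        # Equal strings always have equal hashes, so different cached
        # hashes prove that the strings differ.
        return False
    return str1.value == str2.value


a, b = CachedStr("isTest"), CachedStr("func")
print(breakpoint_would_fire(a, b))  # hashes not cached yet
a.compute_hash()
b.compute_hash()
print(breakpoint_would_fire(a, b))  # both cached now
```

The examples below are all comparisons where the second situation applies: both operands already carry a hash and the two hashes differ.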


Random examples:

(1) compare "constant" strings (strings from co_consts of code objects)

importlib._bootstrap: _setup():

    os_details = ('posix', ['/']), ('nt', ['\\', '/'])
    for builtin_os, path_separators in os_details:
        ...
    ...
    if builtin_os == 'nt': <== HERE
        ...


(2) importlib._bootstrap: _LoaderBasics.is_package()

    def is_package(self, fullname):
        filename = _path_split(self.get_filename(fullname))[1]
        filename_base = filename.rsplit('.', 1)[0]
        tail_name = fullname.rpartition('.')[2]
        return filename_base == '__init__' and ... <== HERE

It's surprising that filename_base has its hash computed. I suppose that all these functions (_path_split, .rsplit, .rpartition) return the string unmodified.


(3) importlib._bootstrap: PathFinder._path_importer_cache():

    @classmethod
    def _path_importer_cache(cls, path):
        ...
        if path == '': <== HERE

path is an entry of sys.path.


(4) str in __all__ (list of str):

os.py:

    if "putenv" not in __all__:
        __all__.append("putenv")

__all__ is a list of strings.
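
A membership test on a list is a linear scan that compares the needle with each element in turn, which is why a single "not in" line produces many string comparisons. A small sketch to make that visible, assuming a hypothetical CountingStr subclass and a made-up list standing in for __all__:

```python
class CountingStr(str):
    """str subclass that counts how often == is evaluated (illustration only)."""

    calls = 0

    def __eq__(self, other):
        CountingStr.calls += 1
        return str.__eq__(self, other)

    # Defining __eq__ disables the inherited __hash__, so restore it.
    __hash__ = str.__hash__


# Made-up stand-in for os.__all__; the real list is much longer.
all_names = ["altsep", "curdir", "pardir", "sep"]

needle = CountingStr("putenv")
found = needle in all_names

# One == per element, because no element matches and the scan runs to the end.
print(found, CountingStr.calls)
```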


(5) site.py:

    if __name__ == '__main__': <== HERE

__name__ is 'site'.


(6) Python/ceval.c: PyEval_EvalCodeEx() called with arbitrary keywords

    for (i = 0; i < kwcount; i++) {
        PyObject **co_varnames;
        PyObject *keyword = kws[2*i];
        PyObject *value = kws[2*i + 1];
        int j;
        ...
        /* Speed hack: do raw pointer compares. As names are
           normally interned this should almost always hit. */
        co_varnames = ((PyTupleObject *)(co->co_varnames))->ob_item;
        for (j = 0; j < total_args; j++) {
            PyObject *nm = co_varnames[j];
            if (nm == keyword)
                goto kw_found;
        }
        /* Slow fallback, just in case */
        for (j = 0; j < total_args; j++) {
            PyObject *nm = co_varnames[j];
            int cmp = PyObject_RichCompareBool(  <== HERE
                keyword, nm, Py_EQ);
            if (cmp > 0)
                goto kw_found;
            else if (cmp < 0)
                goto fail;
        }

It looks like the "just in case" path is taken.
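
The two loops above can be modelled in Python roughly like this. It is a sketch, not the real implementation: find_keyword and the returned labels are hypothetical, and sys.intern() is used only to make the identity pass deterministic.

```python
import sys


def find_keyword(co_varnames, keyword):
    """Two-pass lookup: identity first, then equality (hypothetical helper)."""
    # Fast pass: raw pointer compare, hits when both names are interned.
    for j, nm in enumerate(co_varnames):
        if nm is keyword:
            return j, "identity"
    # Slow fallback: a full == on every name. A keyword that is not a
    # parameter at all (it ends up in **kwargs) always goes through this loop.
    for j, nm in enumerate(co_varnames):
        if nm == keyword:
            return j, "equality"
    return None, "not found"


names = (sys.intern("isTest"), sys.intern("func"))

print(find_keyword(names, sys.intern("func")))  # interned name
print(find_keyword(names, "x"))                 # arbitrary keyword, as in Element(..., x=1)
```

With an arbitrary keyword such as x, the fast pass cannot succeed, so the keyword is compared with every parameter name in the slow loop. That matches the backtrace below, where 'isTest' is compared with 'func'.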

(gdb) where
#0  unicode_compare_eq (str1='isTest', str2='func') at Objects/unicodeobject.c:10532
#1  0x000000000052dd41 in PyUnicode_RichCompare (left='isTest', right='func', op=2) at Objects/unicodeobject.c:10609
#2  0x00000000004be4db in do_richcompare (v='isTest', w='func', op=2) at Objects/object.c:647
#3  0x00000000004be790 in PyObject_RichCompare (v='isTest', w='func', op=2) at Objects/object.c:696
#4  0x00000000004be832 in PyObject_RichCompareBool (v='isTest', w='func', op=2) at Objects/object.c:718
#5  0x00000000005a0f68 in PyEval_EvalCodeEx (...) at Python/ceval.c:3450
...

Traceback (most recent call first):
  File "/home/haypo/prog/python/default/Lib/test/test_xml_etree.py", line 1669, in test_get_keyword_args
    e1 = ET.Element('foo' , x=1, y=2, z=3)

ElementTree.Element() accepts arbitrary keywords.


(7) letter==letter singletons:


xml.etree.ElementPath: iterfind()

    def iterfind(elem, path, namespaces=None):
        ...
        if path[-1:] == "/": <== HERE

Traceback (most recent call first):
  File "/home/haypo/prog/python/default/Lib/xml/etree/ElementPath.py", line 254, in iterfind
    if path[-1:] == "/":

path is ".//grandchild", so path[-1:] is 'd', which is a singleton: Python has already computed the hash of 'd'.
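
A quick way to look at the singleton behaviour from Python. This is a sketch that relies on a CPython implementation detail (one-character Latin-1 strings are shared objects), so the identity result is printed rather than assumed. The second string, "d-day", is just a made-up source for another 'd'.

```python
path = ".//grandchild"
tail = path[-1:]        # one-character slice, as in iterfind()
other = "d-day"[0]      # a 'd' obtained from an unrelated string

# Equal values always hash the same.
print(tail == other, hash(tail) == hash(other))

# On CPython both are expected to be the very same cached object, which is
# why the hash is already available, but that is not a language guarantee.
print(tail is other)
```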


Similar example in the same file:

    def xpath_tokenizer(pattern, namespaces=None):
        for token in xpath_tokenizer_re.findall(pattern):
            tag = token[1]
            if tag and tag[0] != "{" and ":" in tag: <== HERE
                ...

In tag[0] != "{", tag is 'grandchild', so tag[0] is again a single-character singleton.


Another example:

Traceback (most recent call first):
  File "/home/haypo/prog/python/default/Lib/sre_parse.py", line 194, in __next
    if char == "\\":


(8) str not in (list of str), test_descr.py: test_dir():

  File "/home/haypo/prog/python/default/Lib/test/test_descr.py", line 2255, in <listcomp>
    names = [x for x in dir(minstance) if x not in default_attributes]

        minstance = M("m")
        minstance.b = 2
        minstance.a = 1
        default_attributes = ['__name__', '__doc__', '__package__',
                              '__loader__']
        names = [x for x in dir(minstance) if x not in default_attributes]

History
Date	User	Action	Args
2013-10-28 20:18:37	vstinner	set	recipients: + vstinner, rhettinger, gregory.p.smith, pitrou, christian.heimes, djc, ezio.melotti, meador.inge, serhiy.storchaka
2013-10-28 20:18:37	vstinner	set	messageid: <1382991517.7.0.585939755161.issue16286@psf.upfronthosting.co.za>
2013-10-28 20:18:37	vstinner	link	issue16286 messages
2013-10-28 20:18:36	vstinner	create