Message 218022 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	josh.r
Recipients	josh.r, vstinner
Date	2014-05-06.21:35:22
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1399412124.32.0.206189886441.issue21449@psf.upfronthosting.co.za>
In-reply-to

Content

_PyUnicode_CompareWithId is used exclusively for equality comparisons (after all, identifiers aren't meaningfully sortable; they're isolated values, not points on a continuum). But because _PyUnicode_CompareWithId keeps the general three-way comparison behavior rather than answering only ==/!=, it adds little over a plain PyUnicode_Compare call. It does check the return value of _PyUnicode_FromId, but none of its callers check for failure anyway, so every use could just as well have been:

PyUnicode_Compare(left, _PyUnicode_FromId(right));
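To make that equivalence concrete, here is a self-contained toy sketch. Every name in it (toy_str, toy_identifier, toy_from_id, toy_compare, toy_compare_with_id) is hypothetical and only loosely modeled on _Py_Identifier, _PyUnicode_FromId, PyUnicode_Compare and _PyUnicode_CompareWithId; it is not CPython code. The identifier creates and caches its string object on first use, and the wrapper returns a full three-way ordering even though a caller only ever tests the result against 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a str object (not a CPython type). */
typedef struct {
    size_t len;
    const char *data;
} toy_str;

/* Hypothetical identifier: a C literal plus an object cached on first use. */
typedef struct {
    const char *string;
    toy_str *object;   /* NULL until toy_from_id() succeeds */
} toy_identifier;

/* Build the object once and cache it; later calls return the same pointer.
   Returns NULL if allocation fails. */
static toy_str *toy_from_id(toy_identifier *id)
{
    if (id->object == NULL) {
        toy_str *s = malloc(sizeof *s);
        if (s == NULL)
            return NULL;
        s->len = strlen(id->string);
        s->data = id->string;   /* borrow the static literal */
        id->object = s;
    }
    return id->object;
}

/* General three-way comparison: -1, 0 or 1. */
static int toy_compare(const toy_str *a, const toy_str *b)
{
    size_t n = a->len < b->len ? a->len : b->len;
    int c = memcmp(a->data, b->data, n);
    if (c != 0)
        return c < 0 ? -1 : 1;
    if (a->len == b->len)
        return 0;
    return a->len < b->len ? -1 : 1;
}

/* Wrapper: in this toy, failure returns -1, which a caller testing
   "== 0" simply reads as "not equal". */
static int toy_compare_with_id(const toy_str *left, toy_identifier *right)
{
    assert(left != NULL);
    const toy_str *right_str = toy_from_id(right);
    if (right_str == NULL)
        return -1;
    return toy_compare(left, right_str);
}
```

Since every caller only asks "is the result 0?", the ordering work in toy_compare is wasted effort.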

I've attached a patch that replaces _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual, that:

1. Only checks equality vs. inequality
2. Can optimize for the case where left is an interned string by performing a direct pointer comparison (the identifier's string is always interned, so two interned strings are equal only if they are the same object)
3. Even when left is not interned, can use the optimized unicode_compare_eq worker function instead of the slower, generalized unicode_compare function
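The three points can be sketched with a toy interned-string model. This is not the attached patch; all names (toy_str, toy_intern, toy_compare_eq, toy_equal_with_id) are hypothetical, the intern table is a fixed-size linear scan for brevity, and the right-hand operand is assumed to be already interned, as an identifier's string is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string type; `interned` mirrors CPython's interned state. */
typedef struct {
    size_t len;
    const char *data;
    bool interned;
} toy_str;

#define TOY_MAX_INTERNED 64
static toy_str *toy_interned[TOY_MAX_INTERNED];
static size_t toy_ninterned;

/* Equality-only worker: reject on length first, then compare raw bytes.
   No ordering is ever computed. */
static bool toy_compare_eq(const toy_str *a, const toy_str *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}

/* Return the canonical interned object equal to s, interning s itself if
   none exists yet. Returns NULL when the toy table is full. */
static toy_str *toy_intern(toy_str *s)
{
    for (size_t i = 0; i < toy_ninterned; i++)
        if (toy_compare_eq(toy_interned[i], s))
            return toy_interned[i];
    if (toy_ninterned == TOY_MAX_INTERNED)
        return NULL;
    s->interned = true;
    toy_interned[toy_ninterned++] = s;
    return s;
}

/* Equality against an interned identifier string: 1 if equal, 0 if not. */
static int toy_equal_with_id(const toy_str *left, const toy_str *right)
{
    assert(right->interned);
    if (left == right)
        return 1;              /* point 2: same object */
    if (left->interned)
        return 0;              /* point 2: distinct interned strings differ */
    return toy_compare_eq(left, right);   /* point 3: equality-only worker */
}
```

The pointer checks decide most cases without touching the string data; only a non-interned left operand pays for the byte comparison.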

I've replaced all the uses of the old function I could find, and all unit tests pass. I don't expect any meaningful speedup from this change (the most frequently executed code that would benefit appears to be the code that creates new classes and the code that builds reprs for objects); the goal is not an immediate speedup but to enable future ones.

I am looking into writing a PyDict_GetItem fast path for looking up identifiers. When a dictionary is usually composed of interned keys, as in keyword argument passing, such a fast path would remove the need to perform memory comparisons. This could be combined with an identifier-based version of PyArg_ParseTupleAndKeywords: with Argument Clinic, it might become practical to swap in a new argument parser without manually changing thousands of lines of code. One of the simplest ways to improve speed would be to remove the overhead of constructing, hashing, and comparing the same keyword strings every time a C function is called.
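A minimal sketch of such an identifier lookup, under stated assumptions: a toy open-addressing table with linear probing (CPython's dict uses a different probing scheme), FNV-1a standing in for the real str hash, and hypothetical names throughout (toy_dict, toy_dict_insert, toy_dict_lookup_id). It assumes the lookup key is interned and that interning yields at most one interned object per distinct string; the test sets up its keys to respect that invariant. Under it, an interned table key that is not the same pointer can be skipped without comparing hashes or bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    size_t len;
    const char *data;
    bool interned;
    uint64_t hash;
} toy_str;

/* 64-bit FNV-1a, a stand-in for the real string hash. */
static uint64_t toy_hash(const char *s, size_t len)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL;
    }
    return h;
}

static void toy_str_init(toy_str *s, const char *data, bool interned)
{
    s->len = strlen(data);
    s->data = data;
    s->interned = interned;
    s->hash = toy_hash(data, s->len);
}

#define TOY_DICT_SIZE 8   /* must be a power of two */

typedef struct { toy_str *key; int value; } toy_entry;
typedef struct { toy_entry slots[TOY_DICT_SIZE]; } toy_dict;

/* Insert with linear probing; assumes the key is not already present.
   Returns false if the table is full. */
static bool toy_dict_insert(toy_dict *d, toy_str *key, int value)
{
    size_t mask = TOY_DICT_SIZE - 1;
    size_t i = (size_t)(key->hash & mask);
    for (size_t n = 0; n < TOY_DICT_SIZE; n++, i = (i + 1) & mask) {
        if (d->slots[i].key == NULL) {
            d->slots[i].key = key;
            d->slots[i].value = value;
            return true;
        }
    }
    return false;
}

/* Lookup for an interned key; returns a pointer to the value or NULL. */
static int *toy_dict_lookup_id(toy_dict *d, const toy_str *key)
{
    assert(key->interned);
    size_t mask = TOY_DICT_SIZE - 1;
    size_t i = (size_t)(key->hash & mask);
    for (size_t n = 0; n < TOY_DICT_SIZE; n++, i = (i + 1) & mask) {
        toy_entry *e = &d->slots[i];
        if (e->key == NULL)
            return NULL;           /* empty slot ends the probe chain */
        if (e->key == key)
            return &e->value;      /* identity hit: no memcmp needed */
        if (e->key->interned)
            continue;              /* distinct interned strings differ */
        if (e->key->hash == key->hash && e->key->len == key->len &&
            memcmp(e->key->data, key->data, key->len) == 0)
            return &e->value;      /* slow path for non-interned keys */
    }
    return NULL;
}
```

When every key in the table is interned, as with keyword names that come from identifiers, lookups reduce to pointer comparisons; keys built at runtime and never interned still match through the hash-and-bytes path.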

Adding haypo as nosy since he created the original function in #19512.

History
Date	User	Action	Args
2014-05-06 21:35:24	josh.r	set	recipients: + josh.r, vstinner
2014-05-06 21:35:24	josh.r	set	messageid: <1399412124.32.0.206189886441.issue21449@psf.upfronthosting.co.za>
2014-05-06 21:35:24	josh.r	link	issue21449 messages
2014-05-06 21:35:23	josh.r	create