msg58256 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2007-12-06 21:34 |
Much of the code in setobject.c exactly parallels that in dictobject.c.
Ideally, we should keep that parallelism by scanning all of the changes
to dictobject.c and applying substantially similar changes to
setobject.c (just the changes that touch the hash tables, not the API
changes).
|
msg58257 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2007-12-06 22:10 |
Would it make any sense at all to refactor the code so that code reuse
is automatic?
|
msg58259 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2007-12-07 01:30 |
Not really, the implementations are different enough that it would be
*really* hard to keep common code. The two parallel each other in a way
that is visually easy to translate but hard to do through real
refactoring. For the most part, both code bases have historically been
very stable, so double maintenance hasn't been much of an issue. My
last post was meant to suggest the cleanest way to do the Py3.0
string-->unicode switch modeling the set changes after those in dicts.
FWIW, the reason that I put effort in to keeping the code as parallel as
possible was that it would save us mental clock cycles -- understanding
one implementation gives you most of the other for free.
|
msg58260 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2007-12-07 01:31 |
P.S. There is a way to factor-out some common code but it would entail
introducing a bunch of macros that hide the differences between the two.
I don't think it would be worth it.
|
msg58311 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2007-12-09 01:05 |
I've created a patch that adds optimization for PyUnicode while keeping
the existing optimization for PyString. The patch moves the optimization
trick for PyObject_Hash() into a macro and adds an optimized
_PyUnicode_Eq() to unicodeobject.c
|
msg58318 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2007-12-09 15:25 |
I fixed a bug in the last patch. It now works mixed sets with str and
bytes but it doesn't optimize bytes in set_lookup() any more.
|
msg58320 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2007-12-09 17:15 |
The patch doesn't parallel what was done for dicts. The code in
dictobject.c does not use a macro. It special cases for PyUnicode but
not PyString. Please submit a patch that mirrors what was done for
dicts.
|
msg58328 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2007-12-09 20:51 |
Updates:
* Moved dictobject.c:unicode_eq() to unicodeobject.c:_PyUnicode_Eq()
* Added another optimization step to _PyUnicode_Eq(). The hash is
required later anyway and comparing two hashes is much faster than
memcmp-ing the unicode objects.
if (unicode_hash(a) != unicode_hash(b))
return 0;
* Factored out the ((PyUnicodeObject *) v)->hash optimization into a
function object.c:_PyObject_HashFast() which does the trick for
PyUnicode and PyString. The trick was used a couple of times in
dictobject.c and setobject.c. We may even think about moving the trick
to PyObject_Hash() directly.
|
msg58331 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2007-12-09 22:39 |
Which is the common case in Py3k, to have strings or unicode? By
trying to catch both, you slow down the optimization. Also, the
new "hash_fast" introduces function call overhead in previously in-
lined code.
My preference is to knock-out the optimization or leave in the way it
is. Remember, strings already cache their hash codes and that
optimization is being short-circuited here.
For sets, I prefer the code to be left as it is. For dicts, whatever
you do, don't slow-down the common case of regular attribute lookup.
This is some of the most time critical code in the language. Trying to
generalize the optimization, make actually make it slower.
If unicode strings are the norm and an not PyString_Objects, then
switch to that one, but I don't think you should try to do both.
|
msg58333 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2007-12-10 00:48 |
Raymond Hettinger wrote:
> Which is the common case in Py3k, to have strings or unicode? By
> trying to catch both, you slow down the optimization. Also, the
> new "hash_fast" introduces function call overhead in previously in-
> lined code.
PyUnicode is the common case for dict keys in py3k. Attributes etc. are
all PyUnicode instances.
> For sets, I prefer the code to be left as it is. For dicts, whatever
> you do, don't slow-down the common case of regular attribute lookup.
> This is some of the most time critical code in the language. Trying to
> generalize the optimization, make actually make it slower.
I wonder how I should store unicode_eq. It's required for dict and set
optimization and I like to keep it static inline. But I can't access
unicode_eq in setobject.c when it is declared as static in dictobject.c.
I could either declare unicode_eq as extern or I could put the method
into a header file which is included by both c files.
> If unicode strings are the norm and an not PyString_Objects, then
> switch to that one, but I don't think you should try to do both.
Understood!
Christian
|
msg58335 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2007-12-10 01:42 |
Okay, then simply go back to the unaltered existing code and replace
references to PyString with PyUnicode. Skip all the macros, new
functions, factorings, etc. Just change strings to unicode and be done
with it. IIRC, that was what was done for dicts. Just do the same for
sets and not try to get tricky with a new, misnamed C API function like
PyObject_Hash().
|
msg58337 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2007-12-10 02:05 |
The latest patch does *NOT* add new macros, functions or other stuff. I
simply replaced PyString_* with PyUnicode_* in setobject.c where
appropriate.
The only function I had to factor out is unicode_eq(). It's now in a new
file stringlib/eq.h which is included by setobject.c and dictobject.c.
It's the only way to share the function while keeping it static and
maybe inline.
|
msg58346 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2007-12-10 15:11 |
I had looked at the version 2 instead of version 3.
Version 3 is much closer. A couple of comments. Don't change the
brace opening/closing convention in the file -- stick with the K&R
style -- mixing two different styles makes the file harder to read and
maintain.
Also, do not introduce string optimization in new places. If it wasn't
there before, it was likely left out for a reason (lines 1231 and 1352
for example).
|
msg58351 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2007-12-10 15:52 |
I've done as you said and committed the changes in r59449.
Next time I won't try to add optimizations without consulting you in the
first place. :] Thanks for your advice.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:28 | admin | set | github: 45905 |
2008-01-06 22:29:44 | admin | set | keywords:
- py3k versions:
Python 3.0 |
2007-12-10 15:52:35 | christian.heimes | set | status: open -> closed resolution: fixed messages:
+ msg58351 |
2007-12-10 15:11:03 | rhettinger | set | messages:
+ msg58346 |
2007-12-10 02:05:14 | christian.heimes | set | messages:
+ msg58337 |
2007-12-10 01:42:32 | rhettinger | set | messages:
+ msg58335 |
2007-12-10 01:05:50 | christian.heimes | set | files:
+ py3k_optimize_set_unicode3.patch |
2007-12-10 00:48:22 | christian.heimes | set | messages:
+ msg58333 |
2007-12-09 22:39:11 | rhettinger | set | messages:
+ msg58331 |
2007-12-09 20:51:48 | christian.heimes | set | files:
- py3k_optimize_set_unicode.patch |
2007-12-09 20:51:42 | christian.heimes | set | files:
+ py3k_optimize_set_unicode2.patch messages:
+ msg58328 |
2007-12-09 17:15:33 | rhettinger | set | messages:
+ msg58320 |
2007-12-09 15:25:56 | christian.heimes | set | files:
+ py3k_optimize_set_unicode.patch messages:
+ msg58318 |
2007-12-09 15:23:53 | christian.heimes | set | files:
- py3k_optimize_set_unicode.patch |
2007-12-09 01:05:57 | christian.heimes | set | priority: normal assignee: christian.heimes messages:
+ msg58311 files:
+ py3k_optimize_set_unicode.patch nosy:
+ christian.heimes |
2007-12-07 01:31:35 | rhettinger | set | messages:
+ msg58260 |
2007-12-07 01:30:01 | rhettinger | set | messages:
+ msg58259 |
2007-12-06 22:10:44 | gvanrossum | set | messages:
+ msg58257 |
2007-12-06 21:34:07 | rhettinger | set | nosy:
+ rhettinger messages:
+ msg58256 |
2007-12-06 21:19:11 | gvanrossum | create | |