msg156085 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-03-16 21:09 |
This patch slightly optimize conversion int to string with base 2, 8 or 16 by calculating exactly number of digits and avoiding unnecessary memory allocation. The code is similar to used for decimal conversion.
Without patch:
$ ./python -m timeit -s 'x=1' 'bin(x)'
1000000 loops, best of 3: 0.295 usec per loop
$ ./python -m timeit -s 'x=1' 'oct(x)'
1000000 loops, best of 3: 0.327 usec per loop
$ ./python -m timeit -s 'x=1' 'hex(x)'
1000000 loops, best of 3: 0.298 usec per loop
$ ./python -m timeit -s 'x=9**999' 'bin(x)'
100000 loops, best of 3: 11.9 usec per loop
$ ./python -m timeit -s 'x=9**999' 'oct(x)'
100000 loops, best of 3: 4.55 usec per loop
$ ./python -m timeit -s 'x=9**999' 'hex(x)'
100000 loops, best of 3: 3.99 usec per loop
With patch:
$ ./python -m timeit -s 'x=1' 'bin(x)'
1000000 loops, best of 3: 0.213 usec per loop
$ ./python -m timeit -s 'x=1' 'oct(x)'
1000000 loops, best of 3: 0.228 usec per loop
$ ./python -m timeit -s 'x=1' 'hex(x)'
1000000 loops, best of 3: 0.215 usec per loop
$ ./python -m timeit -s 'x=9**999' 'bin(x)'
100000 loops, best of 3: 9.76 usec per loop
$ ./python -m timeit -s 'x=9**999' 'oct(x)'
100000 loops, best of 3: 3.58 usec per loop
$ ./python -m timeit -s 'x=9**999' 'hex(x)'
100000 loops, best of 3: 3.3 usec per loop
|
msg157234 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-03-31 22:54 |
While optimizing stuff is nice, I'm not sure there's a use case here. Does it significantly speed up a real-world workload you're having?
|
msg157260 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-04-01 07:10 |
No, I do not think that any application will significantly speeded up.
In fact, it is not optimization, and getting rid of the apparent pessimization. In addition to increasing the speed, memory fragmentation is reduced.
The patch has a minimum size, the new code is not larger and not more complex than the old one, it is similar to code already used in this file, there are small nonconditional advantages. I see no reason not to accept the changes.
|
msg158351 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2012-04-15 18:48 |
A few comments:
(1) The patch appears to assume that a Unicode string created with PyUnicode_New(size, 127) will have 'kind' PyUnicode_1BYTE_KIND. While this might be true in the current implementation, I don't know whether this is guaranteed in general. Martin, any comments on this?
(2) The patch doesn't compile with '--with-pydebug': there's a reference to the (now) undefined variable 'buffer' in one of the asserts.
(3) The overflow check looks as though it needs to be reworked.
|
msg158366 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-04-15 20:32 |
> (1) The patch appears to assume that a Unicode string created with PyUnicode_New(size, 127) will have 'kind' PyUnicode_1BYTE_KIND. While this might be true in the current implementation, I don't know whether this is guaranteed in general. Martin, any comments on this?
The same assumption is used above in long_to_decimal_string(). I added
the same assert and changed maxchar to 'f'.
> (2) The patch doesn't compile with '--with-pydebug': there's a reference to the (now) undefined variable 'buffer' in one of the asserts.
Fixed.
> (3) The overflow check looks as though it needs to be reworked.
I was left an old constraint (it is enough), but it really can be
weakened (in fact, it can be so weakened in the current code).
Thank you for the comments and the found error.
|
msg158402 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2012-04-16 07:14 |
Thanks for the updates.
> The same assumption is used above in long_to_decimal_string(). I added
> the same assert and changed maxchar to 'f'.
I think 'x' would be more appropriate. :-)
|
msg158405 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-04-16 09:02 |
> I think 'x' would be more appropriate. :-)
Oops. You are right, of cause.
|
msg158441 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-04-16 13:25 |
> (1) The patch appears to assume that a Unicode string created with
> PyUnicode_New(size, 127) will have 'kind' PyUnicode_1BYTE_KIND.
> While this might be true in the current implementation, I don't know
> whether this is guaranteed in general. Martin, any comments on
> this?
There is a guarantee that the shortest form must always be used.
So as long as the Unicode representation doesn't fundamentally change
again, this property is indeed guaranteed.
|
msg158445 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2012-04-16 13:38 |
> There is a guarantee that the shortest form must always be used.
Okay, sounds good. Thanks.
|
msg158881 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-04-20 20:21 |
New changeset dcd3344b6d5b by Mark Dickinson in branch 'default':
Issue #14339: Improve speed of bin, oct and hex builtins. Patch by Serhiy Storchaka (with minor modifications).
http://hg.python.org/cpython/rev/dcd3344b6d5b
|
msg158882 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2012-04-20 20:23 |
Thanks for the patch. I made some minor changes, notably moving the overflow check closer to where it's needed, moving some comments around, and removing a (possibly inappropriate) PyErr_NoMemory call.
|
msg158884 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-04-20 20:41 |
This should have waited until Serhiy submits a contributor form. Serhiy, can you please do this soon? Else I'll have to revert the change.
|
msg158890 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2012-04-20 20:49 |
Aargh. Sorry, yes. Serhiy, can you do this? The form can be found at:
http://www.python.org/psf/contrib/contrib-form-python/
|
msg158891 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2012-04-20 20:52 |
Whoops; that looks like a slightly older version of the form. I think the correct one is:
http://www.python.org/psf/contrib/contrib-form/
|
msg159076 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-04-23 21:00 |
I can only do this tomorrow.
|
msg159220 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-04-24 22:06 |
2.100 - v = PyUnicode_DecodeASCII(p, &buffer[sz] - p, NULL);
...
2.104 + assert(p == PyUnicode_1BYTE_DATA(v));
2.105 return v;
2.106 }
You may call assert(_PyUnicode_CheckConsistency(v, 1)) to ensure that the newly created string is "consistent" (see the function for the details).
|
msg159440 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-04-27 06:33 |
Serhiy: what's the status of your contributor form? Note that you can also fax it, or scan it and send it by email.
|
msg159449 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-04-27 11:19 |
> You may call assert(_PyUnicode_CheckConsistency(v, 1)) to ensure
> that the newly created string is "consistent" (see the function
> for the details).
Done in the following changeset:
changeset: 76560:3bdcf0cab164
parent: 76558:5fea362b92fc
user: Victor Stinner <victor.stinner@gmail.com>
date: Thu Apr 26 00:37:21 2012 +0200
files: Objects/longobject.c
description:
long_to_decimal_string() and _PyLong_Format() check the consistency of newly
created strings using _PyUnicode_CheckConsistency() in debug mode
|
msg159463 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-04-27 14:52 |
I sent the signed form. Martin, sorry for the delay. Mark, sorry, that
involuntarily let you down.
|
msg159498 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2012-04-27 21:51 |
Serhiy: thanks for submitting the form! No need to apologise: it was my bad for making the commit prematurely.
And now that I see that all-important asterisk next to your name, I'll reclose the issue. :-)
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:28 | admin | set | github: 58547 |
2012-04-27 21:51:23 | mark.dickinson | set | status: open -> closed
messages:
+ msg159498 |
2012-04-27 14:52:33 | serhiy.storchaka | set | messages:
+ msg159463 |
2012-04-27 11:19:54 | vstinner | set | messages:
+ msg159449 |
2012-04-27 06:33:04 | loewis | set | messages:
+ msg159440 |
2012-04-24 22:06:27 | vstinner | set | nosy:
+ vstinner messages:
+ msg159220
|
2012-04-23 21:00:53 | serhiy.storchaka | set | messages:
+ msg159076 |
2012-04-20 20:52:55 | mark.dickinson | set | messages:
+ msg158891 |
2012-04-20 20:49:55 | mark.dickinson | set | messages:
+ msg158890 |
2012-04-20 20:41:08 | loewis | set | status: closed -> open
messages:
+ msg158884 |
2012-04-20 20:23:27 | mark.dickinson | set | status: open -> closed resolution: fixed messages:
+ msg158882
|
2012-04-20 20:21:46 | python-dev | set | nosy:
+ python-dev messages:
+ msg158881
|
2012-04-17 11:16:22 | mark.dickinson | set | assignee: mark.dickinson |
2012-04-16 13:38:49 | mark.dickinson | set | messages:
+ msg158445 |
2012-04-16 13:25:25 | loewis | set | messages:
+ msg158441 |
2012-04-16 09:02:31 | serhiy.storchaka | set | files:
+ long_to_binary_base_3.patch
messages:
+ msg158405 |
2012-04-16 07:14:11 | mark.dickinson | set | messages:
+ msg158402 |
2012-04-15 20:32:11 | serhiy.storchaka | set | files:
+ long_to_binary_base_2.patch
messages:
+ msg158366 |
2012-04-15 18:48:12 | mark.dickinson | set | nosy:
+ loewis messages:
+ msg158351
|
2012-04-14 04:28:50 | ezio.melotti | set | nosy:
+ ezio.melotti
|
2012-04-01 07:10:29 | serhiy.storchaka | set | messages:
+ msg157260 |
2012-03-31 22:54:28 | pitrou | set | nosy:
+ pitrou messages:
+ msg157234
|
2012-03-17 02:05:24 | eric.smith | set | nosy:
+ eric.smith
|
2012-03-16 23:33:08 | pitrou | set | nosy:
+ mark.dickinson, skrah
components:
+ Interpreter Core, - Library (Lib) stage: patch review |
2012-03-16 21:09:30 | serhiy.storchaka | create | |