msg163420 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2012-06-22 14:59 |
This bug is still present in Python 3.3. Some people use urlsafe strings *a lot*. And they find that the (obvious) implementation in base64.py is way too slow. In fact, I found myself writing the following a few times in App Engine libraries:
# This is 3-4x faster than urlsafe_b64decode()
urlsafe = base64.b64encode(<arg>)
return urlsafe.rstrip('=').replace('+', '-').replace('/', '_')
Someone from YouTube told me they patch the stdlib for the same reason (they claim their solution is 10x faster).
IIUC the cause of the slowness is the (mostly irrelevant) generality in b64encode and the _translate() function.
|
msg163422 - (view) |
Author: Jesús Cea Avión (jcea) * |
Date: 2012-06-22 15:32 |
This is a feature request, not a bug. Marking for 3.3.
|
msg163424 - (view) |
Author: Jesús Cea Avión (jcea) * |
Date: 2012-06-22 15:42 |
Guido, would your patch be enough?. Is a 3x improvement enough?. What improvement could be acceptable?. Implemented in C in "binascii" module?
|
msg163430 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2012-06-22 16:28 |
Why would it be a feature? There is no change in semantics, only in speed. Do we have a policy that forbids performance enhancements in bugfix releases? (It doesn't really matter, I'm just curious. Also, if it really was a feature request, it couldn't be introduced once the 3.3 beta has started, which would strike me as silly.)
Anyway, I don't think my hack should be applied literally; I was more thinking of memoizing the translation table and inlining the _translate call.
|
msg163432 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-06-22 16:38 |
> Do we have a policy that forbids performance enhancements in bugfix
> releases?
Yes, except when it fixes a regression. I don't know where that policy is codified, but it seems quite ingrained.
It also means the window for getting it in 3.3 is getting narrow, the beta being so soon :)
|
msg163435 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2012-06-22 16:55 |
Here's a patch. It makes urlsafe_b64{en,de}code about 4x faster on my machine. The tests in base64_test.py still pass.
|
msg163439 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-06-22 17:11 |
$ ./python -m timeit -s "from base64 import urlsafe_b64encode; url=bytes(range(256))" "urlsafe_b64encode(url)"
10000 loops, best of 3: 73.5 usec per loop
$ ./python -m timeit -s "from base64 import b64encode; url=bytes(range(256))" "b64encode(url).replace(b'+', b'-').replace(b'/', b'_')"
100000 loops, best of 3: 18.9 usec per loop
$ ./python -m timeit -s "from base64 import b64encode; url=bytes(range(256)); tr=bytes.maketrans(b'+/', b'-_')" "b64encode(url).translate(tr)"
100000 loops, best of 3: 15.6 usec per loop
|
msg163440 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2012-06-22 17:19 |
Heh. I'd forgotten about maketrans. Here's a patch that kills the _translate() helper function completely.
|
msg163441 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-06-22 17:26 |
You're faster than me.
I think that asserts should be replaced by if/raise.
|
msg163443 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2012-06-22 17:28 |
I didn't touch that assert.
|
msg163475 - (view) |
Author: Jesús Cea Avión (jcea) * |
Date: 2012-06-22 20:07 |
Antoine thinks so: http://bugs.python.org/issue15139 :-)
I think we should have an official policy, though.
|
msg163494 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2012-06-22 21:38 |
So, can I check this into 3.3?
|
msg163496 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-06-22 21:52 |
I haven't tested the patch, but it looks fine to me.
|
msg163499 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2012-06-22 22:19 |
Submitted.
|
msg244485 - (view) |
Author: Devin Jeanpierre (Devin Jeanpierre) * |
Date: 2015-05-30 18:56 |
Here's a backport of the patch to 2.7. It's pretty rad, and basically identical to how YouTube monkeypatches base64.
Not sure what will happen to this patch. According to recent discussion on the list (e.g. https://mail.python.org/pipermail/python-dev/2015-May/140380.html ), performance improvements are open for inclusion in 2.7 if anyone wants to bother with merging this in and taking on the review / maintenance burden.
I'm OK with just publishing it for others to merge in with their own private versions of Python. It is only relevant if you use base64 a lot. :)
|
msg247672 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2015-07-30 14:26 |
I'm going to apply the 2.7 patch, except I'm going to leave the (now unused) _translate() function in place, in case it's used anywhere (minimizing the risk of breaking anything).
|
msg247675 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2015-07-30 14:50 |
New changeset 309cbbc32491 by Guido van Rossum in branch '2.7':
Issue #15138: Speed up base64.urlsafe_b64* considerably (2.7 backport).
https://hg.python.org/cpython/rev/309cbbc32491
|
msg247677 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2015-07-30 14:58 |
And just in case someone objects to my breaking the policy stated by Antoine in msg163432: see http://bugs.python.org/issue4753 -- it's not actually a real policy.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:31 | admin | set | github: 59343 |
2015-07-30 14:58:04 | gvanrossum | set | messages:
+ msg247677 versions:
+ Python 2.7 |
2015-07-30 14:50:45 | python-dev | set | nosy:
+ python-dev messages:
+ msg247675
|
2015-07-30 14:26:56 | gvanrossum | set | messages:
+ msg247672 |
2015-05-30 18:56:12 | Devin Jeanpierre | set | files:
+ base64_27.diff nosy:
+ Devin Jeanpierre messages:
+ msg244485
|
2012-06-22 22:19:47 | gvanrossum | set | status: open -> closed resolution: fixed messages:
+ msg163499
|
2012-06-22 21:52:23 | pitrou | set | messages:
+ msg163496 versions:
- Python 3.4 |
2012-06-22 21:39:43 | gvanrossum | set | assignee: gvanrossum stage: needs patch -> patch review |
2012-06-22 21:38:38 | gvanrossum | set | messages:
+ msg163494 |
2012-06-22 20:07:26 | jcea | set | messages:
+ msg163475 |
2012-06-22 17:28:38 | gvanrossum | set | messages:
+ msg163443 |
2012-06-22 17:26:01 | serhiy.storchaka | set | messages:
+ msg163441 |
2012-06-22 17:19:43 | gvanrossum | set | files:
+ base64.patch
messages:
+ msg163440 |
2012-06-22 17:11:05 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg163439
|
2012-06-22 16:55:23 | gvanrossum | set | files:
+ base64.patch keywords:
+ patch messages:
+ msg163435
|
2012-06-22 16:38:01 | pitrou | set | nosy:
+ pitrou messages:
+ msg163432
|
2012-06-22 16:28:25 | gvanrossum | set | messages:
+ msg163430 |
2012-06-22 15:42:01 | jcea | set | messages:
+ msg163424 |
2012-06-22 15:32:06 | jcea | set | nosy:
+ jcea
messages:
+ msg163422 versions:
- Python 2.6, Python 3.1, Python 2.7, Python 3.2 |
2012-06-22 14:59:59 | gvanrossum | create | |