File name |
Uploaded |
Description |
Edit |
urllib_faster_quote.patch
|
tseaver,
2005-09-08 16:38
|
Patch implementing faster quote |
|
urllib_fast_quote_speed_test.py
|
tseaver,
2005-09-08 16:38
|
|
|
urllib_faster_quote-py2.4.patch
|
tseaver,
2005-09-09 02:30
|
Patch against Python 2.4's version |
|
urllib.quote.patch
|
ajaksu2,
2008-04-08 18:41
|
Updated patch: cache regexps on (safe, always safe) |
|
urllib_better_speed_test.py
|
tseaver,
2010-04-29 22:24
|
Improved speed test script. |
|
issue1285086-updated.patch
|
tseaver,
2010-04-30 14:48
|
Updated patch against trunk |
|
quotebench.py
|
flox,
2010-05-16 14:25
|
Benchmark script for quote and unquote variants |
|
issue1285086_using_rstrip_py3k.diff
|
flox,
2010-05-16 14:27
|
Patch, apply to 3.x |
|
urllib_faster_unquote.patch
|
serhiy.storchaka,
2013-02-13 21:04
|
|
review
|
urllib_faster_unquote-2.7.patch
|
serhiy.storchaka,
2013-02-13 21:05
|
|
review
|
msg54612 - (view) |
Author: Tres Seaver (tseaver) * |
Date: 2005-09-08 16:37 |
'urllib.quote' delegates to '_fast_quote' for the common
case that the user has passed no 'safe' argument. However,
'_fast_quote' isn't really very fast, especially for
the case that
it doesn't need to quote anything.
Zope (and presumably other web frameworks) can end up
calling 'quote' dozens, hundreds, even thousands of times
to render a page, which makes this a potentially big win
for them.
I will attach a speed test script which demonstrates the
speed penalty, along with a patch which implements the
speedup.
|
msg54613 - (view) |
Author: Jeff Epler (jepler) |
Date: 2005-09-09 01:01 |
Logged In: YES
user_id=2772
Tested on Python 2.4.0. The patch fails on the first chunk
because the list of imports don't match.
The urllib_fast_quote_speed_test.py doesn't run once urllib
has been patched.
I reverted the patch to urllib.py and re-ran. I got
"faster" values from 0.758 to 0.964.
|
msg54614 - (view) |
Author: Tres Seaver (tseaver) * |
Date: 2005-09-09 02:30 |
Logged In: YES
user_id=127625
I'm attaching a patch against 2.4's version
|
msg54615 - (view) |
Author: Tres Seaver (tseaver) * |
Date: 2005-09-09 02:35 |
Logged In: YES
user_id=127625
Note that the speed test script shows equivalent speedups for
both 2.3 and 2.4, ranging from 90% (for the empty string) down
to 73% (for a string with a single character). The more
"normal"
cases range from 82% to 89% speedups.
|
msg54616 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2005-09-10 03:45 |
Logged In: YES
user_id=80475
Checked in a speed-up for Py2.5.
See Lib/urllib.py 1.169.
The check-in provides fast-quoting for all cases (not just
for the default safe argument). Even the fast path is
quicker. With translation for both safe and unsafe
characters, it saves len(s) trips through the eval loop,
computes of non-safe replacements just once, and eliminates
the if-logic. The new table is collision free and has no
failed lookups, so each lookup requires exactly one probe.
One my machine, timings improved by a factor of two to three
depending on the length of input and number of escaped
characters.
The check-in also simplifies and speeds-up quote_plus() by
using str.replace() instead of a split
Leaving this SF report open because the OP's idea may
possibly provide further improvement -- the checkin itself
was done because it is a clear win over the existing version.
The OP's patch uses regexps to short-circuit when no changes
are needed. Unless the regexp is cheap and short-circuits
often, the cost of testing will likely exceed the average
amount saved.
Determining whether the regexp is cheaper than the
checked-in version just requires a few timings. But,
determining the short-circuit percentage requires collecting
statistics from real programs with real data. For the idea
to be a winner, regexps have to be much faster than the
map/lookup/join step AND the short-circuit case must occur
frequently.
Am lowering the priority until a better patch is received
along with timings and statistical evidence demonstrating a
significant improvement. Also, reclassifying as a Feature
Request because the existing code is functioning as
documented and passing tests.
|
msg65203 - (view) |
Author: Daniel Diniz (ajaksu2) *  |
Date: 2008-04-08 18:41 |
This is what I found doing some timings:
For the short-circuit path, the regexp can make quote 10x as fast in
exceptional cases, even comparing to the faster version in trunk. The
average win for short-circuit seems to be twice as fast as the map in my
timings. This sounds good.
For the normal path, the overhead can make quote 50% slower. This IMHO
makes it unfit for quote replacement. Perhaps good for a cookbook recipe?
Regarding the OP's use case, I believe either adding a string cache to
quote or flagging stored strings as "safe" or "must quote" would result
in a much greater impact on performance.
Attaching patch against trunk. Web framework developers should be
interested in testing this and could provide the use cases/data needed
for settling this issue.
|
msg81805 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-02-12 19:24 |
If someone needs a faster quote(), it's probably easy to write an
insanely fast C version...
|
msg104594 - (view) |
Author: Tres Seaver (tseaver) * |
Date: 2010-04-29 22:24 |
I'm uploading a saner version of the speed test which uses timeit's support for passing a callable as a statement: it is much easier to
see what the test is actually doing.
On my machine, running against the release26-maint branch, my version runs anywhere from 67% to 85% faster on the cases where no quoting actually needs to be done. In production use, where urlib.quote is used as a safety check, these cases far outweigh the cases which actually require quoting.
|
msg104636 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-04-30 14:10 |
Tres, can you update your patch against SVN trunk?
Also, please do an unified diff, we are mostly used to this format.
|
msg104639 - (view) |
Author: Tres Seaver (tseaver) * |
Date: 2010-04-30 14:48 |
Updated patch against the trunk attached.
Note that I prefer unidiff myself, but was following a bit of guidance that Guido (used to, anyway) prefer context diffs.
The updated speed test run against the stdlib with this patch shows only a 12% to 16% hit against my simpler version (which didn't handle cases where the 'safe' argument was not the default "/").
|
msg105156 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2010-05-06 18:53 |
I reviewed the patch and the speed test attached.
Well, yes, the patch does achieve a certain level of speed improvement but I also saw that in cases when the quoting is really required (special characters, the current stdlib is faster).
The speed improvement proposed is really not a convincing enough reason to replace the code of the current quote. rhettinger's comment in this issue might taken as reference to work out if there are any statistical evidence (:)?) that if the frameworks use it, would they benefit.
Few more points.
- quote is a generic function, checking for empty string, returning it and marking it speed improvement, really is not a improvement (practically the person using the quote can avoid it too and thereby increase the speed).
- Some framework might try this , like django or zope for e.g and see if its really a worthy case in using re to check for chars to be quoted and quote only those (and see if there is any improvement).
- Might provide this as an activestate recipe and see if there any takers (again for reasons of use-case)
It is correctly a low priority one, or it can just be rejected too.
BTW, there is a feature request on quote support unicode char with patch, when that is in place, I think this request may suffer further.
|
msg105448 - (view) |
Author: Tres Seaver (tseaver) * |
Date: 2010-05-10 17:05 |
I can only reiterate that Zope apps can call 'urllib.quote' dozens,
hundreds, even thousands of times on a single request: the
reason for the original bug report was that 'urllib.quote' was
showing up frequently on profiling output for such requests.
Short-circuiting the case for the empty string and the case that
the string being quoted actually contains only safe characters
together made 'urllib.quote' disappear from the profiler output.
Newer frameworks, such as 'repoze.bfg', avoid using
'urliib.quote' at all for just this reason.
|
msg105486 - (view) |
Author: Florent Xicluna (flox) *  |
Date: 2010-05-11 07:44 |
I've tested a variant of the previous patch.
On my laptop, it gives good performance for simple cases, and the penalty for real-quoting case is very low.
I've tested a short-circuit for the unquote() function too.
|
msg105487 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2010-05-11 08:37 |
Lets also see how this fares in py3k (where quote function takes an encoding ) and possibly push it in.
If there is any hesitation we can consult python-dev or wsgi groups where frameworks developers might review and voice concerns, if they have any.
|
msg105513 - (view) |
Author: Florent Xicluna (flox) *  |
Date: 2010-05-11 13:44 |
New patch, using str.translate instead of regexp.
It is faster for normal cases (85% less time than stdlib quote), and the penalty for the real-quoting case is less than 5%.
It should apply to 3.x with some adaptation.
|
msg105516 - (view) |
Author: Florent Xicluna (flox) *  |
Date: 2010-05-11 14:02 |
The speed test script did not work on 2.5 (because timeit.Timer does not accept a callable).
Fixed version, which benchmarks the str.translate(...) version.
Change the '_new_quote_setup' assignment to test other variants.
|
msg105523 - (view) |
Author: Florent Xicluna (flox) *  |
Date: 2010-05-11 15:44 |
actually, there's a simpler implementation, using s.rstrip(always_safe + safe).
It is as fast as the previous one.
|
msg105868 - (view) |
Author: Florent Xicluna (flox) *  |
Date: 2010-05-16 14:25 |
Updated script for benchmarks (on 2.x and 3.x).
Inspired by the "Tools/iobench" script.
It benchmarks various quote/unquote implementations on 2.x and 3.x.
On 2.7 the fastest implementation is something like:
def quote(s):
if not s or not s.rstrip(safe):
return s
return ''.join(map(safe_get, s))
On 3.2 the fastest implementation uses list comprehension:
def quote_from_bytes(s):
if not s:
return ''
if not s.rstrip(safe):
return s.decode()
return ''.join([quoter(c) for c in s])
Note: the regexp implementation is slower in both cases.
|
msg105869 - (view) |
Author: Florent Xicluna (flox) *  |
Date: 2010-05-16 14:27 |
Proposed patches for quote and unquote on 2.7 and 3.2.
|
msg105903 - (view) |
Author: Florent Xicluna (flox) *  |
Date: 2010-05-17 13:38 |
Committed in r81265 for 2.7.
|
msg105919 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2010-05-17 17:37 |
On Mon, May 17, 2010 at 01:38:57PM +0000, Florent Xicluna wrote:
> Committed in r81265 for 2.7.
Thanks. That was interesting. Without resorting to any drastic changes like
use of regex, interesting speed-up seems to have been achieved by
using rsplit and map.
In py3k, I think its 'okay' to go for rsplit, defaultdict and map
which will be similar to py2k version. I saw that map vs list
comprehension did not have much differences.
|
msg105921 - (view) |
Author: Florent Xicluna (flox) *  |
Date: 2010-05-17 17:46 |
Committed to 3.2 in r81271, after some additional tuning.
Btw, I kept list comprehension in 3.2, because it is faster for small strings (even if the gain is ~10%).
|
msg113708 - (view) |
Author: Barry A. Warsaw (barry) *  |
Date: 2010-08-12 21:35 |
Why was this merged to 2.6 after 2.6.6rc1 without approval?
|
msg113714 - (view) |
Author: Barry A. Warsaw (barry) *  |
Date: 2010-08-12 22:45 |
flox reverted in r83967 for 2.6.6.
|
msg182056 - (view) |
Author: Maciej Fijalkowski (fijall) *  |
Date: 2013-02-13 18:41 |
As per discussion on python-dev, this bug should probably be reopened and the patch maybe reverted as relying on the refcounting hack is both dodgy and hurts other implementations, like PyPy.
|
msg182061 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-02-13 21:04 |
Proposed patches not only get rid of the refcount hack, but make unquote() and unquote_to_bytes() even significant faster for large strings.
|
msg182079 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2013-02-14 06:51 |
Applying this patch - I tried this hypothetical test.
$ ./python.exe -m timeit -s "import urllib.parse; x='a%20' * 100000"
"urllib.parse.unquote(x)"
10 loops, best of 3: 205 msec per loop
Without the patch, the above test will run for minutes!
This creates a significant difference for extremely long query
strings. For small strings the performance is negligible.
Serhiy: One question. Is there a need to have -
+ append = res.append
And then use 'append'? res.append is easier for readability IMO.
Besides that I dont have any other comments. Maciej's comment could be
sought as he (correctly) brought up this issue.
|
msg182080 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2013-02-14 07:00 |
I wrongly minutes. Here is actual data of execution speeds. There is
magnitude of difference (almost 130x faster). Measured on macbook pro
with 2 cores and 4 Gig mem.
Before Patch:
$ ./python.exe -m timeit -s "import urllib.parse; x='a%20' * 100000"
"urllib.parse.unquote(x)"
10 loops, best of 3: 26.8 sec per loop
After Patch:
$ ./python.exe -m timeit -s "import urllib.parse; x='a%20' * 100000"
"urllib.parse.unquote(x)"
10 loops, best of 3: 205 msec per loop
|
msg182091 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-02-14 11:59 |
> Serhiy: One question. Is there a need to have -
>
> + append = res.append
>
> And then use 'append'?
This speed up unquote_to_bytes by 15%.
|
msg182206 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2013-02-16 01:54 |
> Serhiy Storchaka added the comment:
>>
>> + append = res.append
>>
>> And then use 'append'?
>
> This speed up unquote_to_bytes by 15%.
Thanks for the response. In fact, writing the whole _hextobyte
verbatim would further increase the speed, but that would be huge
dis-advantage in terms of space occupied by that dict in the code. I
am +1 with the patch, please go ahead with committing it.
|
msg184113 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2013-03-13 21:09 |
Serhiy - Is there any technical issue that is holding up this patch? (I dont see any). If nothing is holding up and you are busy, I shall go ahead with committing this one. /cc flox
|
msg184182 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-03-14 19:43 |
New changeset 4927899bea8d by Serhiy Storchaka in branch '2.7':
Issue #1285086: Get rid of the refcounting hack and speed up urllib.unquote().
http://hg.python.org/cpython/rev/4927899bea8d
New changeset 3cb07925fcb9 by Serhiy Storchaka in branch '3.2':
Issue #1285086: Get rid of the refcounting hack and speed up
http://hg.python.org/cpython/rev/3cb07925fcb9
New changeset 209a9c2de9bd by Serhiy Storchaka in branch '3.3':
Issue #1285086: Get rid of the refcounting hack and speed up
http://hg.python.org/cpython/rev/209a9c2de9bd
New changeset 9367411a261e by Serhiy Storchaka in branch 'default':
Issue #1285086: Get rid of the refcounting hack and speed up
http://hg.python.org/cpython/rev/9367411a261e
|
msg184183 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-03-14 19:46 |
Sorry, I perhaps missed your response, Senthil. Now committed and closed again.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:12 | admin | set | github: 42356 |
2013-03-14 19:46:47 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages:
+ msg184183
stage: patch review -> resolved |
2013-03-14 19:43:28 | python-dev | set | nosy:
+ python-dev messages:
+ msg184182
|
2013-03-13 21:09:05 | orsenthil | set | messages:
+ msg184113 |
2013-02-16 03:03:03 | ezio.melotti | set | nosy:
+ ezio.melotti
|
2013-02-16 01:54:49 | orsenthil | set | messages:
+ msg182206 |
2013-02-14 11:59:35 | serhiy.storchaka | set | messages:
+ msg182091 |
2013-02-14 07:00:12 | orsenthil | set | messages:
+ msg182080 |
2013-02-14 06:51:40 | orsenthil | set | messages:
+ msg182079 |
2013-02-13 21:05:20 | serhiy.storchaka | set | files:
+ urllib_faster_unquote-2.7.patch stage: needs patch -> patch review versions:
+ Python 3.3, Python 3.4 |
2013-02-13 21:04:37 | serhiy.storchaka | set | files:
+ urllib_faster_unquote.patch nosy:
+ serhiy.storchaka messages:
+ msg182061
|
2013-02-13 18:48:08 | serhiy.storchaka | set | status: closed -> open resolution: fixed -> (no value) stage: resolved -> needs patch |
2013-02-13 18:41:20 | fijall | set | nosy:
+ fijall messages:
+ msg182056
|
2010-08-12 22:45:39 | barry | set | status: open -> closed priority: release blocker -> messages:
+ msg113714
|
2010-08-12 21:42:53 | barry | set | status: closed -> open priority: low -> release blocker |
2010-08-12 21:35:53 | barry | set | nosy:
+ barry messages:
+ msg113708
|
2010-05-17 17:46:03 | flox | set | status: open -> closed
messages:
+ msg105921 |
2010-05-17 17:40:17 | orsenthil | set | assignee: orsenthil -> flox resolution: fixed stage: patch review -> resolved |
2010-05-17 17:37:30 | orsenthil | set | messages:
+ msg105919 |
2010-05-17 17:00:49 | flox | set | files:
- issue1285086_using_rstrip_v2.diff |
2010-05-17 13:38:55 | flox | set | messages:
+ msg105903 |
2010-05-16 14:27:35 | flox | set | files:
+ issue1285086_using_rstrip_py3k.diff
messages:
+ msg105869 |
2010-05-16 14:27:03 | flox | set | files:
+ issue1285086_using_rstrip_v2.diff |
2010-05-16 14:25:05 | flox | set | files:
+ quotebench.py
messages:
+ msg105868 |
2010-05-16 14:13:36 | flox | set | files:
- urllib_quote_speed_test.py |
2010-05-16 14:13:20 | flox | set | files:
- issue1285086_using_rstrip.diff |
2010-05-16 14:13:16 | flox | set | files:
- issue1285086_using_translate.diff |
2010-05-16 14:13:06 | flox | set | files:
- issue1285086_fast_quote.diff |
2010-05-11 15:48:24 | flox | set | files:
+ urllib_quote_speed_test.py |
2010-05-11 15:44:09 | flox | set | files:
+ issue1285086_using_rstrip.diff
messages:
+ msg105523 |
2010-05-11 15:43:43 | flox | set | files:
- urllib_quote_speed_test.py |
2010-05-11 14:02:35 | flox | set | files:
+ urllib_quote_speed_test.py
messages:
+ msg105516 |
2010-05-11 13:44:33 | flox | set | files:
+ issue1285086_using_translate.diff
messages:
+ msg105513 |
2010-05-11 08:37:54 | orsenthil | set | assignee: orsenthil messages:
+ msg105487 |
2010-05-11 07:54:28 | flox | set | files:
+ issue1285086_fast_quote.diff |
2010-05-11 07:54:05 | flox | set | files:
- issue1285086_fast_quote.diff |
2010-05-11 07:44:04 | flox | set | files:
+ issue1285086_fast_quote.diff nosy:
+ flox messages:
+ msg105486
|
2010-05-10 17:05:10 | tseaver | set | messages:
+ msg105448 |
2010-05-06 18:53:01 | orsenthil | set | messages:
+ msg105156 |
2010-04-30 14:48:19 | tseaver | set | files:
+ issue1285086-updated.patch
messages:
+ msg104639 |
2010-04-30 14:10:55 | pitrou | set | stage: test needed -> patch review messages:
+ msg104636 versions:
+ Python 2.7, Python 3.2 |
2010-04-29 22:24:40 | tseaver | set | files:
+ urllib_better_speed_test.py
messages:
+ msg104594 |
2009-02-12 19:24:57 | pitrou | set | nosy:
+ pitrou messages:
+ msg81805 |
2009-02-12 18:18:56 | ajaksu2 | set | nosy:
+ orsenthil stage: test needed |
2008-04-08 18:41:24 | ajaksu2 | set | files:
+ urllib.quote.patch nosy:
+ ajaksu2 messages:
+ msg65203 keywords:
+ patch |
2008-03-16 21:06:12 | georg.brandl | set | type: enhancement -> performance |
2005-09-08 16:37:34 | tseaver | create | |