Issue4932
Created on 2009-01-13 17:32 by andrix, last changed 2010-07-14 11:57 by orsenthil. This issue is now closed.
Files
File name | Uploaded | Description
---|---|---
urlparse.py | andrix, 2009-01-13 17:32 | urlparse with changes.
optmizedurlparse.py | andrix, 2009-01-14 16:54 | Optimized version of urlparse
profile_urlparse.py | andrix, 2009-01-14 17:08 | Profile code
urlparse.patch | andrix, 2009-01-14 17:26 | Patch to urlparse
Messages (10)
msg79752 | Author: Andres Moreira (andrix) | Date: 2009-01-13 17:32
Hi, I made a small change to the urlsplit function of the urlparse module, and it improved performance a bit when parsing a lot of URLs. In the best case the improvement was around 20%.

Python version: 2.5.2 (r252:60911, Oct 5 2008, 19:29:17) [GCC 4.3.2]

Here are the benchmarks:

```
#:~/tests$ python profile_urlparse.py
timing urlparse.urlparse():
  [0.28006601333618164, 0.27513313293457031, 0.20408511161804199]
timing myurlparse.urlparse():
  [0.11000704765319824, 0.10729002952575684, 0.10677695274353027]
#:~/tests$ python profile_urlparse2.py
timing urlparse.urlparse():
  [0.28334403038024902, 0.27912592887878418, 0.15959692001342773]
timing myurlparse.urlparse():
  [0.11277103424072266, 0.11163187026977539, 0.11175107955932617]
#:~/tests$ python profile_urlparse2.py
timing urlparse.urlparse():
  [0.28750920295715332, 0.2779538631439209, 0.27816200256347656]
timing myurlparse.urlparse():
  [0.25010085105895996, 0.11236691474914551, 0.11198592185974121]
```

Here is the profiling code. Please save it as profile_urlparse.py, because the timeit setup strings import `urls` from a module with that name:

```python
# profile_urlparse.py (Python 2: uses the urlparse module and the print statement)
urls = [
    "http://www.notonthehighstreet.com/boxwood/product/dotty_picture_frames",
    "http://www.fancylighting.com/acatalog/Petrushka_Bronze.html",
    "http://cgi.ebay.co.uk/3-LITRE-SUNNEX-STAINLESS-STEEL-TEAPOT-COFFEE-POT_W0QQitemZ160230173283QQcategoryZ122942QQcmdZViewItem",
    "http://retail.ictc.co.uk/acatalog/Online_Catalogue__Homegrown_184.html",
    "http://www.amazon.co.uk/Big-Mog-Tape-Judith-Kerr/dp/0001025252?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=0001025252",
    "http://www.dinerstore.co.uk/acatalog/copy_of_Bedford_Oak_Table_and_Six.html",
    "http://www.panik-design.co.uk/acatalog/Iittala_-_A_Citterio_-_Citterio_98_Cutlery_24pcs_.html",
    "http://www.johnlewis.com/230544027/Product.aspx",
    "http://cgi.ebay.co.uk/Damask-Black-Pink-Cream-Large-Modern-Rugs-120x170cm_W0QQitemZ400021540458QQcategoryZ57237QQcmdZViewItem",
    "http://www.amazon.co.uk/Nikon-50Mm-F1-2-Nikkor-Lens/dp/B00009R95Y?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B00009R95Y",
    "http://www.amazon.co.uk/Storeys-Guide-Raising-Llamas-Birutta/dp/1580173284?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=1580173284",
    "http://www.24electric.com/detail.php?ProdID=46519663",
    "http://cgi.ebay.co.uk/Antique-Pine-Midi-Sleeper-Childrens-Bed-VGC_W0QQitemZ280285887713QQcategoryZ122763QQcmdZViewItem",
    "http://www.johnlewis.com/230421907/Product.aspx",
    "http://cgi.ebay.co.uk/WICKER-PLACE-DINNER-MATS-X6-IN-A-WICKER-BASKET-WITH-LID_W0QQitemZ350141392941QQcategoryZ20660QQcmdZViewItem",
    "http://www.trueshopping.co.uk/product/Draper_1_4_Square_Drive_Reversible_Ratchet/3495/43235.html",
    "http://www.amazon.co.uk/Transcend-TS128MIB6986-128MB-Module/dp/B000HCO61K?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B000HCO61K",
    "http://www.unitedinteriors.co.uk/regency-pine-hi-fi-cabinet-4741-p.asp",
    "http://www.amazon.co.uk/BATTERY-CAMCORDER-DCR-DVD602-DCR-DVD602E-DCRDVD602/dp/B0017UM1OU?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B0017UM1OU",
    "http://www.tooled-up.com/Product.asp?PID=145138",
    "http://www.dinerstore.co.uk/acatalog/copy_of_Bedford_Oak_Table_and_Six.html",
    "http://www.panik-design.co.uk/acatalog/Iittala_-_A_Citterio_-_Citterio_98_Cutlery_24pcs_.html",
    "http://www.johnlewis.com/230544027/Product.aspx",
    "http://cgi.ebay.co.uk/Damask-Black-Pink-Cream-Large-Modern-Rugs-120x170cm_W0QQitemZ400021540458QQcategoryZ57237QQcmdZViewItem",
    "http://www.amazon.co.uk/Nikon-50Mm-F1-2-Nikkor-Lens/dp/B00009R95Y?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B00009R95Y",
    "http://www.amazon.co.uk/Storeys-Guide-Raising-Llamas-Birutta/dp/1580173284?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=1580173284",
    "http://www.24electric.com/detail.php?ProdID=46519663",
    "http://cgi.ebay.co.uk/Antique-Pine-Midi-Sleeper-Childrens-Bed-VGC_W0QQitemZ280285887713QQcategoryZ122763QQcmdZViewItem",
    "http://www.johnlewis.com/230421907/Product.aspx",
    "http://cgi.ebay.co.uk/WICKER-PLACE-DINNER-MATS-X6-IN-A-WICKER-BASKET-WITH-LID_W0QQitemZ350141392941QQcategoryZ20660QQcmdZViewItem",
    "http://www.trueshopping.co.uk/product/Draper_1_4_Square_Drive_Reversible_Ratchet/3495/43235.html",
]

if __name__ == '__main__':
    import sys
    import timeit

    # Number of passes over the URL list per repeat (default 1000).
    if len(sys.argv) > 1:
        times = int(sys.argv[1])
    else:
        times = 1000

    # Stock module.
    t = timeit.Timer("[urlparse.urlparse(u) for u in urls]",
                     "from profile_urlparse import urls; import urlparse")
    print "timing urlparse.urlparse():"
    print " ", t.repeat(3, times)

    # Patched copy of the module.
    t = timeit.Timer("[myurlparse.urlparse(u) for u in urls]",
                     "from profile_urlparse import urls; import myurlparse")
    print "timing myurlparse.urlparse():"
    print " ", t.repeat(3, times)
```
msg79763 | Author: STINNER Victor (vstinner) * | Date: 2009-01-13 18:44
It looks like most of your changes are already part of the urlparse module in Python 2.6. Can you port your patch to Python 2.6 and rerun your benchmark on Python 2.6?
msg79860 | Author: Andres Moreira (andrix) | Date: 2009-01-14 16:54
Hi haypo,

OK, I've been testing with Python 2.6; here are the results. optimizedurlparse is the file with my patch.

First, optimizedurlparse runs first and urlparse second:

```
#:/opt/python2.6/release26-maint$ ./python mio/profile_urlparse.py
timing optimizedurlparse.urlparse():
  [0.89634895324707031, 0.61937308311462402, 0.62004208564758301]
timing urlparse.urlparse():
  [0.64083003997802734, 0.6862800121307373, 0.67195010185241699]
#:/opt/python2.6/release26-maint$ ./python mio/profile_urlparse.py 2000
timing optimizedurlparse.urlparse():
  [1.5077390670776367, 1.2391939163208008, 1.2390918731689453]
timing urlparse.urlparse():
  [1.2550511360168457, 1.2493829727172852, 1.2445049285888672]
```

Now with the order of execution reversed, urlparse first and optimizedurlparse second:

```
#:/opt/python2.6/release26-maint$ ./python mio/profile_urlparse.py 2000
timing urlparse.urlparse():
  [1.6836080551147461, 1.3892900943756104, 1.3195438385009766]
timing optimizedurlparse.urlparse():
  [1.4834678173065186, 1.4077410697937012, 1.3824198246002197]
[19647 refs]
#:/opt/python2.6/release26-maint$ ./python mio/profile_urlparse.py 2000
timing urlparse.urlparse():
  [1.4398901462554932, 1.3237769603729248, 1.3057329654693604]
timing optimizedurlparse.urlparse():
  [1.3134419918060303, 1.3127460479736328, 1.2928199768066406]
[19647 refs]
```

Python version: 2.6.1+ (release26-maint:68606, Jan 14 2009, 08:48:41)

These small changes make urlparse.urlparse() and urlsplit() a bit faster. :D
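Raw repeat lists like the ones above are hard to compare by eye. One common way to read them is to take the best (minimum) repeat for each parser, as the timeit documentation suggests, and express the difference as a fraction of the baseline. A minimal sketch using the four Python 2.6 runs from this message; the helper name `relative_speedup` and the min-based comparison are my own choices, not something used in the thread:

```python
# Timings (seconds) copied from the four Python 2.6 runs above, stored as
# (urlparse, optimizedurlparse) pairs of t.repeat(3, times) results.
RUNS = [
    ([0.64083003997802734, 0.6862800121307373, 0.67195010185241699],
     [0.89634895324707031, 0.61937308311462402, 0.62004208564758301]),
    ([1.2550511360168457, 1.2493829727172852, 1.2445049285888672],
     [1.5077390670776367, 1.2391939163208008, 1.2390918731689453]),
    ([1.6836080551147461, 1.3892900943756104, 1.3195438385009766],
     [1.4834678173065186, 1.4077410697937012, 1.3824198246002197]),
    ([1.4398901462554932, 1.3237769603729248, 1.3057329654693604],
     [1.3134419918060303, 1.3127460479736328, 1.2928199768066406]),
]


def relative_speedup(baseline, candidate):
    """Fraction of time saved by `candidate`, comparing the minimum repeat of each.

    Positive means the candidate is faster; negative means it is slower.
    """
    best_base = min(baseline)
    best_cand = min(candidate)
    return (best_base - best_cand) / best_base


for n, (base, cand) in enumerate(RUNS, start=1):
    print("run %d: %+.1f%%" % (n, 100 * relative_speedup(base, cand)))
```

With these numbers it prints +3.3%, +0.4%, -4.8% and +1.0% for the four runs, so the difference is within a few percent either way.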
msg79861 | Author: STINNER Victor (vstinner) * | Date: 2009-01-14 17:03
Please attach patches instead of full files.
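A patch is a unified diff between the original and the modified file, which is usually produced with a diff tool. As a minimal sketch, Python's own `difflib.unified_diff` can produce one. The helper `make_patch`, the `a/`/`b/` header prefixes, and the toy before/after text are illustrative assumptions, not the contents of the attached patch:

```python
import difflib


def make_patch(old_text, new_text, path):
    """Return a unified diff that turns old_text into new_text (hypothetical helper)."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(old_lines, new_lines,
                                fromfile="a/" + path, tofile="b/" + path)
    return "".join(diff)


if __name__ == "__main__":
    old = "def f(x):\n    return x + 1\n"
    new = "def f(x):\n    return x + 2\n"
    print(make_patch(old, new, "Lib/urlparse.py"), end="")
```

Only the changed lines and a little context end up in the output, which is what makes a patch easier to review than a full copy of the module.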
msg79862 | Author: Andres Moreira (andrix) | Date: 2009-01-14 17:08
And here is the profiling code.
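The profiling code is Python 2 (the urlparse module and the print statement). In Python 3 the module was renamed urllib.parse, so a similar harness there might look like the minimal sketch below. It times a callable with `timeit.repeat` instead of setup strings; the helper `time_parser` and the short URL subset are illustrative assumptions, so substitute the full list to reproduce the original workload:

```python
import timeit
from urllib.parse import urlparse, urlsplit

# A small subset of the URL list from the first message.
URLS = [
    "http://www.johnlewis.com/230544027/Product.aspx",
    "http://www.24electric.com/detail.php?ProdID=46519663",
    "http://www.tooled-up.com/Product.asp?PID=145138",
    "http://www.fancylighting.com/acatalog/Petrushka_Bronze.html",
]


def time_parser(parse, urls, number=1000, repeat=3):
    """Return `repeat` timings in seconds, each for parsing every URL `number` times."""
    return timeit.repeat(lambda: [parse(u) for u in urls],
                         number=number, repeat=repeat)


if __name__ == "__main__":
    for parse in (urlparse, urlsplit):
        print("timing urllib.parse.%s():" % parse.__name__)
        print("  ", time_parser(parse, URLS))
```

Passing a callable avoids the setup-string import trick, so the harness no longer depends on the file being saved under a particular module name.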
msg79863 | Author: Andres Moreira (andrix) | Date: 2009-01-14 17:26
Hi haypo, sorry for submitting the whole file; it's my first time here and I'm not very used to this process yet. :) Here is the patch.
msg79864 | Author: STINNER Victor (vstinner) * | Date: 2009-01-14 17:44
> Now I attach the patch.

Your first patch for Python 2.5 was interesting, and looked close to the Python 2.6 version. But your second patch (for 2.6) contains only micro-optimisations:

- inline the one-line clear_cache() function
- replace the "scheme, url = ..." assignment (used for schemes other than http) with a plain assignment (url = ...)

Your benchmark numbers are difficult to read, but I can't see impressive results (I'd guess the gain is smaller than 10%, maybe 5%).
msg79867 | Author: Andres Moreira (andrix) | Date: 2009-01-14 17:57
Yes, they are micro-optimizations, but when I parse a lot of URLs (10,000 or more), a 10%, 8% or even 5% gain is very welcome for me. :) It was a small contribution to the module; anyway, I think there are more optimizations to be done, and I will try to do them later.
msg110106 | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010-07-12 16:12
Andres, do you wish to provide more patches or can we close this issue?
msg110270 | Author: Senthil Kumaran (orsenthil) * | Date: 2010-07-14 11:57
I reviewed the patch and we will not go with it.

- There have been recent improvements in parsing, and the patch does not fit well with them, especially the clear_cache removal.
- There is also a mistake in the patch:

```diff
-scheme, url = url[:i].lower(), url[i+1:]
+url = url[i+1:]
```

This can cause problems for parsing logic further down, because scheme is used later in the code.

Closing the bug as invalid.
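To see why dropping the `scheme` assignment matters, here is a simplified sketch of the generic (non-http) scheme-splitting branch before and after the change. `SCHEME_CHARS` mirrors urlparse's `scheme_chars`, but the two functions are hypothetical reductions, not the real `urlsplit()`, which does considerably more work afterwards:

```python
# Characters allowed in a URL scheme, as in urlparse.scheme_chars.
SCHEME_CHARS = ("abcdefghijklmnopqrstuvwxyz"
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                "0123456789"
                "+-.")


def split_scheme(url, scheme=""):
    """Simplified original behavior: both scheme and url are updated."""
    i = url.find(":")
    if i > 0 and all(c in SCHEME_CHARS for c in url[:i]):
        scheme, url = url[:i].lower(), url[i + 1:]
    return scheme, url


def split_scheme_patched(url, scheme=""):
    """Simplified patched behavior: only url is updated, scheme keeps its default."""
    i = url.find(":")
    if i > 0 and all(c in SCHEME_CHARS for c in url[:i]):
        url = url[i + 1:]
    return scheme, url


print(split_scheme("FTP://example.com/file"))          # ('ftp', '//example.com/file')
print(split_scheme_patched("FTP://example.com/file"))  # ('', '//example.com/file')
```

The patched variant strips the "FTP:" prefix but never records it, so any later code that branches on `scheme` sees the default value instead of "ftp".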
History
Date | User | Action | Args
---|---|---|---
2010-07-14 11:57:28 | orsenthil | set | status: open -> closed; resolution: not a bug; messages: + msg110270; stage: test needed -> resolved
2010-07-12 16:12:34 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg110106
2009-02-13 01:47:06 | ajaksu2 | set | nosy: + jjlee
2009-02-12 18:40:02 | ajaksu2 | set | nosy: + orsenthil; stage: test needed; versions: + Python 2.7, - Python 2.5
2009-01-14 17:57:31 | andrix | set | messages: + msg79867
2009-01-14 17:44:37 | vstinner | set | messages: + msg79864
2009-01-14 17:26:05 | andrix | set | files: + urlparse.patch; keywords: + patch; messages: + msg79863
2009-01-14 17:08:40 | andrix | set | files: + profile_urlparse.py; messages: + msg79862
2009-01-14 17:03:39 | vstinner | set | messages: + msg79861
2009-01-14 16:54:55 | andrix | set | files: + optmizedurlparse.py; messages: + msg79860
2009-01-13 18:44:07 | vstinner | set | nosy: + vstinner; messages: + msg79763
2009-01-13 17:32:35 | andrix | create |