
classification
Title: Little improvement on urlparse module, urlparse function.
Type: performance
Stage: resolved
Components: Library (Lib)
Versions: Python 2.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: BreamoreBoy, andrix, jjlee, orsenthil, vstinner
Priority: normal
Keywords: patch

Created on 2009-01-13 17:32 by andrix, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name            Uploaded                  Description
urlparse.py          andrix, 2009-01-13 17:32  urlparse with changes.
optmizedurlparse.py  andrix, 2009-01-14 16:54  Optimized version of urlparse
profile_urlparse.py  andrix, 2009-01-14 17:08  Profile code
urlparse.patch       andrix, 2009-01-14 17:26  Patch to urlparse
Messages (10)
msg79752 - Author: Andres Moreira (andrix) Date: 2009-01-13 17:32
Hi,
I made a small change to the urlsplit function of the urlparse module, and
performance improved a bit when parsing a lot of URLs.
In the best case the improvement was around 20%.

Python version:
2.5.2 (r252:60911, Oct  5 2008, 19:29:17) 
[GCC 4.3.2]

Here are the benchmarks:
#:~/tests$ python profile_urlparse.py 
timing urlparse.urlparse():
    [0.28006601333618164, 0.27513313293457031, 0.20408511161804199]
timing myurlparse.urlparse():
    [0.11000704765319824, 0.10729002952575684, 0.10677695274353027]
#:~/tests$ python profile_urlparse2.py 
timing urlparse.urlparse():
    [0.28334403038024902, 0.27912592887878418, 0.15959692001342773]
timing myurlparse.urlparse():
    [0.11277103424072266, 0.11163187026977539, 0.11175107955932617]
#:~/tests$ python profile_urlparse2.py 
timing urlparse.urlparse():
    [0.28750920295715332, 0.2779538631439209, 0.27816200256347656]
timing myurlparse.urlparse():
    [0.25010085105895996, 0.11236691474914551, 0.11198592185974121]


#-- Paste here the profiling code -----------
#-- Please rename the file as : profile_urlparse.py

urls = [
    "http://www.notonthehighstreet.com/boxwood/product/dotty_picture_frames",
    "http://www.fancylighting.com/acatalog/Petrushka_Bronze.html",
    "http://cgi.ebay.co.uk/3-LITRE-SUNNEX-STAINLESS-STEEL-TEAPOT-COFFEE-POT_W0QQitemZ160230173283QQcategoryZ122942QQcmdZViewItem",
    "http://retail.ictc.co.uk/acatalog/Online_Catalogue__Homegrown_184.html",
    "http://www.amazon.co.uk/Big-Mog-Tape-Judith-Kerr/dp/0001025252?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=0001025252",
    "http://www.dinerstore.co.uk/acatalog/copy_of_Bedford_Oak_Table_and_Six.html",
    "http://www.panik-design.co.uk/acatalog/Iittala_-_A_Citterio_-_Citterio_98_Cutlery_24pcs_.html",
    "http://www.johnlewis.com/230544027/Product.aspx",
    "http://cgi.ebay.co.uk/Damask-Black-Pink-Cream-Large-Modern-Rugs-120x170cm_W0QQitemZ400021540458QQcategoryZ57237QQcmdZViewItem",
    "http://www.amazon.co.uk/Nikon-50Mm-F1-2-Nikkor-Lens/dp/B00009R95Y?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B00009R95Y",
    "http://www.amazon.co.uk/Storeys-Guide-Raising-Llamas-Birutta/dp/1580173284?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=1580173284",
    "http://www.24electric.com/detail.php?ProdID=46519663",
    "http://cgi.ebay.co.uk/Antique-Pine-Midi-Sleeper-Childrens-Bed-VGC_W0QQitemZ280285887713QQcategoryZ122763QQcmdZViewItem",
    "http://www.johnlewis.com/230421907/Product.aspx",
    "http://cgi.ebay.co.uk/WICKER-PLACE-DINNER-MATS-X6-IN-A-WICKER-BASKET-WITH-LID_W0QQitemZ350141392941QQcategoryZ20660QQcmdZViewItem",
    "http://www.trueshopping.co.uk/product/Draper_1_4_Square_Drive_Reversible_Ratchet/3495/43235.html",
    "http://www.amazon.co.uk/Transcend-TS128MIB6986-128MB-Module/dp/B000HCO61K?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B000HCO61K",
    "http://www.unitedinteriors.co.uk/regency-pine-hi-fi-cabinet-4741-p.asp",
    "http://www.amazon.co.uk/BATTERY-CAMCORDER-DCR-DVD602-DCR-DVD602E-DCRDVD602/dp/B0017UM1OU?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B0017UM1OU",
    "http://www.tooled-up.com/Product.asp?PID=145138",
    "http://www.dinerstore.co.uk/acatalog/copy_of_Bedford_Oak_Table_and_Six.html",
    "http://www.panik-design.co.uk/acatalog/Iittala_-_A_Citterio_-_Citterio_98_Cutlery_24pcs_.html",
    "http://www.johnlewis.com/230544027/Product.aspx",
    "http://cgi.ebay.co.uk/Damask-Black-Pink-Cream-Large-Modern-Rugs-120x170cm_W0QQitemZ400021540458QQcategoryZ57237QQcmdZViewItem",
    "http://www.amazon.co.uk/Nikon-50Mm-F1-2-Nikkor-Lens/dp/B00009R95Y?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B00009R95Y",
    "http://www.amazon.co.uk/Storeys-Guide-Raising-Llamas-Birutta/dp/1580173284?SubscriptionId=0QE3E4F7T4Q5DCSKG202&tag=ws&linkCode=xm2&camp=2025&creative=165953&creativeASIN=1580173284",
    "http://www.24electric.com/detail.php?ProdID=46519663",
    "http://cgi.ebay.co.uk/Antique-Pine-Midi-Sleeper-Childrens-Bed-VGC_W0QQitemZ280285887713QQcategoryZ122763QQcmdZViewItem",
    "http://www.johnlewis.com/230421907/Product.aspx",
    "http://cgi.ebay.co.uk/WICKER-PLACE-DINNER-MATS-X6-IN-A-WICKER-BASKET-WITH-LID_W0QQitemZ350141392941QQcategoryZ20660QQcmdZViewItem",
    "http://www.trueshopping.co.uk/product/Draper_1_4_Square_Drive_Reversible_Ratchet/3495/43235.html",
]

if __name__ == '__main__':
    import sys
    import timeit

    if len(sys.argv) > 1:
        times = int(sys.argv[1])
    else:
        times = 1000

    t = timeit.Timer("[urlparse.urlparse(u) for u in urls]", 
                     "from profile_urlparse import urls; import urlparse")
    print "timing urlparse.urlparse():"
    print "   ", t.repeat(3, times)


    t = timeit.Timer("[myurlparse.urlparse(u) for u in urls]", 
                     "from profile_urlparse import urls; import myurlparse")
    print "timing myurlparse.urlparse():"
    print "   ", t.repeat(3, times)

#--- End of profile code ----------------------------------------
msg79763 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-13 18:44
It looks like most of your changes are already part of the urlparse module
in Python 2.6. Can you port your patch to Python 2.6 and rerun your
benchmark on Python 2.6?
msg79860 - Author: Andres Moreira (andrix) Date: 2009-01-14 16:54
Hi haypo,
OK, I've tested with Python 2.6 and the results are below.
optimizedurlparse is the file with my patch.

First optimizedurlparse, then urlparse:

#:/opt/python2.6/release26-maint$ ./python mio/profile_urlparse.py 
timing optimizedurlparse.urlparse():
    [0.89634895324707031, 0.61937308311462402, 0.62004208564758301]
timing urlparse.urlparse():
    [0.64083003997802734, 0.6862800121307373, 0.67195010185241699]

#:/opt/python2.6/release26-maint$ ./python mio/profile_urlparse.py 2000
timing optimizedurlparse.urlparse():
    [1.5077390670776367, 1.2391939163208008, 1.2390918731689453]
timing urlparse.urlparse():
    [1.2550511360168457, 1.2493829727172852, 1.2445049285888672]

Now I'll change the order of execution: first urlparse, then
optimizedurlparse:

#:/opt/python2.6/release26-maint$ ./python mio/profile_urlparse.py 2000
timing urlparse.urlparse():
    [1.6836080551147461, 1.3892900943756104, 1.3195438385009766]
timing optimizedurlparse.urlparse():
    [1.4834678173065186, 1.4077410697937012, 1.3824198246002197]
[19647 refs]
#:/opt/python2.6/release26-maint$ ./python mio/profile_urlparse.py 2000
timing urlparse.urlparse():
    [1.4398901462554932, 1.3237769603729248, 1.3057329654693604]
timing optimizedurlparse.urlparse():
    [1.3134419918060303, 1.3127460479736328, 1.2928199768066406]
[19647 refs]

Python Version: 
2.6.1+ (release26-maint:68606, Jan 14 2009, 08:48:41)

The small changes optimize the urlparse.urlparse and urlsplit functions a
bit :D.
msg79861 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-14 17:03
Please attach patches instead of the full file.
msg79862 - Author: Andres Moreira (andrix) Date: 2009-01-14 17:08
And here is the profiling code.
msg79863 - Author: Andres Moreira (andrix) Date: 2009-01-14 17:26
Hi haypo,
sorry for submitting the whole file; it's my first time here and I'm not
very used to this process yet. :)
I've now attached the patch.
msg79864 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-14 17:44
> Now I attach the patch.

Your first patch for Python 2.5 was interesting, and looked close to the 
Python 2.6 version. But your second patch (for 2.6) contains only 
micro-optimisations:
 - inline the one-line clear_cache() function
 - replace the "scheme, url = ..." assignment (used for schemes other 
   than http) with a plain assignment (url = ...)

Your benchmark numbers are difficult to read, but I can't see impressive 
results (I guess the gain is smaller than 10%, maybe 5%).
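
To make the numbers above easier to read, here is a small sketch that compares the four Python 2.6 runs quoted earlier in this thread. It takes the minimum of each timeit repeat series, as the timeit documentation suggests, and prints the relative gain of optimizedurlparse over urlparse. The helper name percent_faster is made up for this illustration; the timings are copied from the earlier message.

```python
# Timings copied from the Python 2.6 runs quoted earlier in this thread.
# Each pair is (stock urlparse repeats, optimizedurlparse repeats).
runs = [
    ([0.64083003997802734, 0.6862800121307373, 0.67195010185241699],
     [0.89634895324707031, 0.61937308311462402, 0.62004208564758301]),
    ([1.2550511360168457, 1.2493829727172852, 1.2445049285888672],
     [1.5077390670776367, 1.2391939163208008, 1.2390918731689453]),
    ([1.6836080551147461, 1.3892900943756104, 1.3195438385009766],
     [1.4834678173065186, 1.4077410697937012, 1.3824198246002197]),
    ([1.4398901462554932, 1.3237769603729248, 1.3057329654693604],
     [1.3134419918060303, 1.3127460479736328, 1.2928199768066406]),
]


def percent_faster(stock, patched):
    """Relative gain of the patched code, using the best time of each series.

    Hypothetical helper: a positive result means the patched module was faster.
    """
    best_stock, best_patched = min(stock), min(patched)
    return 100.0 * (best_stock - best_patched) / best_stock


gains = [percent_faster(stock, patched) for stock, patched in runs]
for gain in gains:
    print("%+.1f%%" % gain)
```

On these figures every run stays within about 5% in either direction, and the third run comes out slightly slower for the patched module, which is consistent with the estimate above.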
msg79867 - Author: Andres Moreira (andrix) Date: 2009-01-14 17:57
Yes, they are micro-optimizations, but when I parse a lot of URLs (10,000
or more), that 10% or 8% or 5% is very good for me :). It was a small
contribution to the module; anyway, I think there are more
optimizations to make, and I will try to do them later.
msg110106 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-12 16:12
Andres, do you wish to provide more patches or can we close this issue?
msg110270 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-07-14 11:57
I reviewed the patch and we may not go with it.
- There have been recent improvements in parsing and the patch does not fit well with them, especially the clear_cache removal.

Also there is a mistake in the patch:

-            scheme, url = url[:i].lower(), url[i+1:]
+            url = url[i+1:]

This can create problems for certain parsing logic as scheme is being used later in the code.

Closing the bug as invalid.
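
The problem with the changed line can be illustrated with a minimal sketch. This is not the real urlsplit code: split_scheme, its assign_scheme flag, and the isalpha() check are simplified stand-ins invented for this example. It only shows that dropping the scheme assignment leaves the variable at its default, so any later code that reads it sees the wrong value.

```python
def split_scheme(url, default_scheme="", assign_scheme=True):
    """Hypothetical, heavily simplified stand-in for the scheme step of urlsplit."""
    scheme = default_scheme
    i = url.find(":")
    if i > 0 and url[:i].isalpha():  # simplified scheme check (an assumption)
        if assign_scheme:
            # Original line: record the scheme and strip it from the URL.
            scheme, url = url[:i].lower(), url[i + 1:]
        else:
            # Patched line: strip the prefix but leave `scheme` untouched.
            url = url[i + 1:]
    # Later code reads `scheme`; in this sketch it is simply returned.
    return scheme, url


original = split_scheme("FTP://example.org/pub")
patched = split_scheme("FTP://example.org/pub", assign_scheme=False)
print(original)
print(patched)
```

With the original assignment the sketch returns the scheme "ftp"; with the patched line the scheme comes back as the empty default even though the prefix was stripped from the URL.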
History
Date                 User         Action  Args
2022-04-11 14:56:44  admin        set     github: 49182
2010-07-14 11:57:28  orsenthil    set     status: open -> closed; resolution: not a bug; messages: + msg110270; stage: test needed -> resolved
2010-07-12 16:12:34  BreamoreBoy  set     nosy: + BreamoreBoy; messages: + msg110106
2009-02-13 01:47:06  ajaksu2      set     nosy: + jjlee
2009-02-12 18:40:02  ajaksu2      set     nosy: + orsenthil; stage: test needed; versions: + Python 2.7, - Python 2.5
2009-01-14 17:57:31  andrix       set     messages: + msg79867
2009-01-14 17:44:37  vstinner     set     messages: + msg79864
2009-01-14 17:26:05  andrix       set     files: + urlparse.patch; keywords: + patch; messages: + msg79863
2009-01-14 17:08:40  andrix       set     files: + profile_urlparse.py; messages: + msg79862
2009-01-14 17:03:39  vstinner     set     messages: + msg79861
2009-01-14 16:54:55  andrix       set     files: + optmizedurlparse.py; messages: + msg79860
2009-01-13 18:44:07  vstinner     set     nosy: + vstinner; messages: + msg79763
2009-01-13 17:32:35  andrix       create