classification
Title: recursive urlparse
Type: enhancement
Stage: test needed
Components: Library (Lib)
Versions: Python 3.1, Python 2.7

process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.araujo, georg.brandl, jjlee, orsenthil, r.david.murray, techtonik
Priority: normal
Keywords: easy

Created on 2007-01-24 10:23 by techtonik, last changed 2010-07-01 21:37 by r.david.murray. This issue is now closed.

Messages (9)
msg61268 - Author: anatoly techtonik (techtonik) Date: 2007-01-24 10:23
The urlparse module is incomplete. There is no convenient high-level function to parse a URL down into atomic chunks, URL-decode the query, and turn it into an array (or a dictionary, for that matter), so that you can modify that dictionary and reassemble it into a query again using nothing more than simple array manipulations.

This kind of function is as universal and flexible as the low-level API, but it considerably speeds up development when working with URLs.

I propose a urlparseex(urlstring) function that will dissect the URL into a dictionary of appropriate dictionaries or strings and decode all % entities.

Component    Index  Type
scheme       0      string
netloc       1      dictionary
  username   1.1    string or whatever
  password   1.2    string or whatever
  server     1.3    hostname string
  port       1.4    port integer
path         2      string
params       3      ordered dictionary of path components, kept so they can be reassembled later (sorry, I have little pythons in my head to replace "ordered dictionary" with something more appropriate), where each path part entry is also a dictionary of parameters
query        4      dictionary
fragment     5      string


There must also be a counterpart, urlunparseex(dictionary), to reassemble the URL and re-encode the entities.
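The proposal above could be sketched roughly as follows. This is only an illustration, assuming Python 3, where the urlparse module lives at urllib.parse. The names urlparseex and urlunparseex come from the proposal and do not exist in the standard library; the dictionary keys follow the table above, and the per-segment "params" entry is left out to keep the sketch short.

```python
from urllib.parse import (parse_qs, quote, unquote, urlencode,
                          urlsplit, urlunsplit)


def urlparseex(url):
    """Hypothetical helper: split a URL into nested, %-decoded parts."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "netloc": {
            # urlsplit leaves % escapes in the user info, so decode here.
            "username": unquote(parts.username) if parts.username else None,
            "password": unquote(parts.password) if parts.password else None,
            "server": parts.hostname,   # lowercased by urlsplit
            "port": parts.port,         # int, or None if absent
        },
        "path": unquote(parts.path),
        # parse_qs decodes % escapes and maps each key to a list of values.
        "query": parse_qs(parts.query, keep_blank_values=True),
        "fragment": unquote(parts.fragment),
    }


def urlunparseex(d):
    """Hypothetical counterpart: re-encode the parts and rebuild the URL."""
    net = d["netloc"]
    netloc = net["server"] or ""
    if net["port"] is not None:
        netloc += ":%d" % net["port"]
    if net["username"] is not None:
        userinfo = quote(net["username"], safe="")
        if net["password"] is not None:
            userinfo += ":" + quote(net["password"], safe="")
        netloc = userinfo + "@" + netloc
    return urlunsplit((
        d["scheme"],
        netloc,
        quote(d["path"]),                   # "/" is kept unescaped
        urlencode(d["query"], doseq=True),  # lists become repeated keys
        quote(d["fragment"], safe=""),
    ))


d = urlparseex("http://user:p%40ss@example.com:8080/a%20b/c?x=1&y=2&y=3#frag")
d["query"]["x"] = ["42"]        # plain dictionary manipulation
print(urlunparseex(d))
# http://user:p%40ss@example.com:8080/a%20b/c?x=42&y=2&y=3#frag
```

The round trip is not exact in general: decoding by default loses the difference between "/" and "%2F" in a path, the host name is lowercased, and IPv6 brackets are dropped. A real implementation would have to decide how to handle those cases.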


Reasons behind the decision:
- 90% of the time you need to decode % entities, so this must be done by default (those who need to keep them encoded are in the minority and may use other functions)
- an atomic, recursive format is needed to be able to easily change any URL component and reassemble the URL
- to get a simple Swiss-army knife for high-level (read: logical) URL operations in one module

http://docs.python.org/lib/module-urlparse.html

There is also the proposal below. It is a little different, but it shows that after four years URL handling problems are still relevant.

http://sourceforge.net/tracker/index.php?func=detail&aid=600362&group_id=5470&atid=355470
msg99424 - Author: anatoly techtonik (techtonik) Date: 2010-02-16 17:41
The last SF link is issue 600362
msg108959 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2010-06-30 02:47
This is already handled via the namedtuple returned by urlparse. All parts of the URL are available after parsing.
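For reference, a small sketch of what the parsed result exposes, assuming Python 3, where the module is named urllib.parse. The example URL is made up. The result keeps netloc as a single string and leaves % escapes untouched, but it also offers username, hostname, and port as attributes, and the decoding helpers are separate functions.

```python
from urllib.parse import parse_qs, unquote, urlsplit

parts = urlsplit("http://user:secret@example.com:8080/a%20b?q=x%26y#top")

print(parts.netloc)          # user:secret@example.com:8080 (one string)
print(parts.username)        # user
print(parts.hostname)        # example.com
print(parts.port)            # 8080
print(parts.path)            # /a%20b (escapes are not expanded)

# Decoding is a separate, explicit step.
print(unquote(parts.path))   # /a b
print(parse_qs(parts.query)) # {'q': ['x&y']}
```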
msg108962 - Author: anatoly techtonik (techtonik) Date: 2010-06-30 03:44
Senthil, please read the proposals more carefully. From the urlparse docs at http://docs.python.org/library/urlparse.html:

"The components are not broken up in smaller parts (for example, the network location is a single string), and % escapes are not expanded."
msg108972 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-06-30 10:58
Since no patch has been proposed since 2007, I think it is time to close this feature request for lack of interest.

In any case, I think this functionality would be better situated in a Python 3 URI/IRI parsing module with a full object model for the IRI, which is something complicated enough that it may need some time on PyPI before getting into the standard library.
msg109064 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2010-07-01 18:19
David, is the stage "unit test needed" proper for this, or was it set by mistake?

Anatoly, I thought closing this feature request was fine, because I considered that with namedtuple the desired attributes of URLs were available on the ParsedTuple object (check test_urlsplit_attributes in test_urlparse.py). But as you pointed out, I can see that the docs can be improved further.

Your suggested dictionary approach is a bit different from the way it is currently implemented; a patch might have helped with evaluation.
msg109073 - Author: anatoly techtonik (techtonik) Date: 2010-07-01 20:58
Too bad that requests from users who are not eligible to produce a patch are not accepted by the Python "community". =/
msg109076 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2010-07-01 21:31
Why shouldn't you be eligible to produce patches to Python?

And yes, requests without patches will sometimes take longer, or be
evaluated differently, since we're all volunteers here, and an existing
patch, even if unusable in the submitted form, often makes working on
a request much more straightforward.

Regarding your ironic quoting of the word "community" -- do not forget
that you are part of the community, and what we are doing here is
exactly what a community does as compared to a company: helping each
other, not because of payment, but because we care for what we do.
Please do not subvert that commitment.
msg109078 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-07-01 21:37
Anatoly, when I said I was closing the issue for lack of interest, I meant that you had not produced a candidate patch, and no one else had shown any interest in creating one. If you wish to produce a candidate patch, we can reopen the issue (though I do think a full-blown URI/IRI module would be better).
History
Date                 User            Action  Args
2010-07-01 21:37:29  r.david.murray  set     messages: + msg109078
2010-07-01 21:33:36  eric.araujo     set     nosy: + eric.araujo
2010-07-01 21:31:49  georg.brandl    set     nosy: + georg.brandl
                                             messages: + msg109076
2010-07-01 20:58:53  techtonik       set     messages: + msg109073
2010-07-01 18:19:24  orsenthil       set     messages: + msg109064
2010-06-30 10:58:31  r.david.murray  set     status: open -> closed
                                             nosy: + r.david.murray
                                             messages: + msg108972
                                             resolution: out of date ->
                                             stage: resolved -> test needed
2010-06-30 03:44:34  techtonik       set     status: closed -> open
                                             messages: + msg108962
2010-06-30 02:47:38  orsenthil       set     status: open -> closed
                                             resolution: out of date
                                             messages: + msg108959
                                             stage: test needed -> resolved
2010-02-16 17:41:18  techtonik       set     messages: + msg99424
2009-05-01 11:04:57  orsenthil       set     nosy: + orsenthil
2009-04-22 17:25:38  ajaksu2         set     keywords: + easy
2009-02-13 01:36:42  ajaksu2         set     nosy: + jjlee
                                             stage: test needed
                                             versions: + Python 3.1, Python 2.7, - Python 2.6
2007-01-24 10:23:00  techtonik       create