
Author techtonik
Recipients techtonik
Date 2007-10-26.12:16:26
Message-id <1193400988.05.0.68480909797.issue1333@psf.upfronthosting.co.za>
The purpose is to encapsulate all URL handling functions in one module.
At the moment there are three modules that dissect URLs for various bits
of information: urlparse splits a URL into components, urllib decodes
the split data, and cgi parses the query component.

To outline the API of the proposed module, I'll start with urlparse:
http://docs.python.org/lib/module-urlparse.html

1. There are two nearly identical functions, urlparse and urlsplit, that
perform the same parsing operation (urlparse additionally splits ;params
off the last path segment) but differ in the format of the returned
result. They could be replaced with one function, let's call it
urlsplitex, that returns the result in a class with named attributes:
not a tuple or list subclass, but rather a dictionary subclass, because
positional fields are evil. You always have to look at the reference to
find out the correct order when you read or debug the code.
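
A minimal sketch of the proposed urlsplitex, assuming Python 3's
urllib.parse (where the urlparse functions now live). urlsplitex and the
URLParts class are hypothetical names taken from this proposal, not
existing library API; the sketch only wraps the real urlsplit and stores
its named fields in a dict subclass that also allows attribute access.

```python
from urllib.parse import urlsplit


class URLParts(dict):
    """Hypothetical result type: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Route attribute assignment into the dict, so the result stays mutable.
        self[name] = value


def urlsplitex(url):
    """Hypothetical urlsplitex: split a URL into named parts stored in a dict."""
    # SplitResult is a namedtuple, so _asdict() maps field names to values.
    return URLParts(urlsplit(url)._asdict())


parts = urlsplitex("http://user@example.com:8080/a/b?x=1#top")
print(parts.scheme, parts["netloc"], parts.path)  # http user@example.com:8080 /a/b
```

Fields are reached by name (parts.scheme or parts["scheme"]), never by
position.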

2. The returned object should be mutable. It must be possible to modify
the result to unset unwanted parts (like the fragment) or edit parts as
needed, and then get the target URL via urlunsplitex or a method of the
same class. This makes the "default_scheme" and "allow_fragments"
arguments unnecessary, as well as the urldefrag function.
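
The mutability argument can be illustrated with a plain dict. As above,
urlsplitex and urlunsplitex are hypothetical names from this proposal,
built here on the real urlsplit and urlunsplit from Python 3's
urllib.parse. Dropping the fragment becomes a simple assignment, so no
separate urldefrag call or allow_fragments flag is needed.

```python
from urllib.parse import urlsplit, urlunsplit

# Field order expected by urlunsplit; callers never need to know it.
FIELDS = ("scheme", "netloc", "path", "query", "fragment")


def urlsplitex(url):
    """Hypothetical: return the URL parts as a plain, mutable dict."""
    return dict(urlsplit(url)._asdict())


def urlunsplitex(parts):
    """Hypothetical: reassemble a dict produced by urlsplitex into a URL."""
    return urlunsplit(tuple(parts.get(name, "") for name in FIELDS))


parts = urlsplitex("http://example.com/docs/index.html#section-2")
parts["fragment"] = ""            # drop the fragment instead of calling urldefrag()
parts["path"] = "/docs/api.html"  # edit a part in place
print(urlunsplitex(parts))        # http://example.com/docs/api.html
```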

3. urlparsex, the new library's replacement for the "parsing" function,
should be a high-level function that dissects URL information into a
tree-like structure with atomic leaves. This includes decoding
percent-encoded URL characters and splitting parameters into child
structures. The proposed structure of the URL class attributes is:

scheme       string
netloc       class
       username  string
       password  string
       server    string
       port      integer
path         list with objects of class
       part      string
       param     list with objects of class
           name     string
           value    string
query        list with objects of class
       name      string
       value     string
fragment     string
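
One possible urlparsex that builds the structure above, assuming Python
3's urllib.parse. urlparsex and the dataclasses are hypothetical; path
parameters are assumed to be ;name=value pairs on each segment, and the
leading slash of an absolute path is not stored. Host names come back
lower-cased because that is what urlsplit's hostname attribute returns.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import parse_qsl, unquote, urlsplit


@dataclass
class NameValue:
    name: str
    value: str


@dataclass
class NetLoc:
    username: Optional[str]
    password: Optional[str]
    server: Optional[str]
    port: Optional[int]


@dataclass
class PathPart:
    part: str
    param: List[NameValue] = field(default_factory=list)


@dataclass
class URL:
    scheme: str
    netloc: NetLoc
    path: List[PathPart]
    query: List[NameValue]
    fragment: str


def _decode(text):
    # Percent-decode a component, passing None through unchanged.
    return unquote(text) if text is not None else None


def urlparsex(url: str) -> URL:
    """Hypothetical urlparsex: dissect a URL into a tree with decoded leaves."""
    s = urlsplit(url)
    netloc = NetLoc(
        username=_decode(s.username),
        password=_decode(s.password),
        server=s.hostname,  # lower-cased by urllib.parse
        port=s.port,        # already an int, or None
    )

    # Assumption: the leading "/" of an absolute path is not stored.
    segments = s.path.lstrip("/").split("/") if s.path else []
    path = []
    for segment in segments:
        part, *params = segment.split(";")
        param_list = []
        for p in params:
            name, _, value = p.partition("=")
            param_list.append(NameValue(unquote(name), unquote(value)))
        path.append(PathPart(unquote(part), param_list))

    # parse_qsl decodes both "+" and %XX escapes in names and values.
    query = [NameValue(n, v) for n, v in parse_qsl(s.query, keep_blank_values=True)]
    return URL(s.scheme, netloc, path, query, unquote(s.fragment))


u = urlparsex("http://bob:s%40cret@Example.com:8080"
              "/docs/a%20b;v=2;lang=en/c?q=url+lib&page=3#top")
print(u.netloc)
print(u.path[1])
```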


4. urlunparsex will be provided to reassemble the structure back into a
URL. This will deprecate a series of functions from urllib, such as
quote, unquote and urlencode, as well as parse_qs and parse_qsl from the
cgi module.
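
A matching urlunparsex sketch, assuming Python 3's urllib.parse. To keep
it self-contained it takes the tree as nested dicts and (name, value)
pairs rather than dedicated classes; that shape, like the function name,
is an assumption. It still uses quote and urlencode internally: the idea
is that callers would no longer need to call them directly. IPv6 literal
hosts are not handled.

```python
from urllib.parse import quote, urlencode, urlunsplit


def urlunparsex(tree):
    """Hypothetical urlunparsex: rebuild a URL from a nested-dict tree.

    Assumed shape: {"scheme", "netloc": {"username", "password", "server",
    "port"}, "path": [{"part", "param": [(name, value), ...]}, ...],
    "query": [(name, value), ...], "fragment"}
    """
    n = tree["netloc"]
    netloc = n.get("server") or ""
    if n.get("port") is not None:
        netloc += ":%d" % n["port"]
    if n.get("username") is not None:
        # Encode "@", ":" and "/" so they cannot break the authority part.
        userinfo = quote(n["username"], safe="")
        if n.get("password") is not None:
            userinfo += ":" + quote(n["password"], safe="")
        netloc = userinfo + "@" + netloc

    segments = []
    for seg in tree["path"]:
        text = quote(seg["part"], safe="")
        for name, value in seg.get("param", []):
            text += ";" + quote(name, safe="") + "=" + quote(value, safe="")
        segments.append(text)
    path = "/" + "/".join(segments) if segments else ""

    query = urlencode(tree["query"])  # spaces become "+", others %XX
    fragment = quote(tree["fragment"])
    return urlunsplit((tree["scheme"], netloc, path, query, fragment))


tree = {
    "scheme": "http",
    "netloc": {"username": None, "password": None,
               "server": "example.com", "port": 8080},
    "path": [{"part": "docs", "param": []},
             {"part": "a b", "param": [("v", "2")]}],
    "query": [("q", "url lib"), ("page", "3")],
    "fragment": "top",
}
print(urlunparsex(tree))
# http://example.com:8080/docs/a%20b;v=2?q=url+lib&page=3#top
```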

References:
http://mail.python.org/pipermail/patches/2005-February/016972.html
http://bugs.python.org/issue1722348
http://bugs.python.org/issue1462525