Issue1500504
Created on 2006-06-04 14:50 by ncoghlan, last changed 2009-04-23 02:06 by orsenthil.
| Files | ||
|---|---|---|
| File name | Uploaded | Description |
| urischemes.py | ncoghlan, 2006-06-08 12:11 | v 0.4 of the urischemes module |
| Messages (9) | |||
|---|---|---|---|
| msg50411 - (view) | Author: Nick Coghlan (ncoghlan) | Date: 2006-06-04 14:50 | |
Inspired by (and based on) Paul Jimenez's uriparse module (http://python.org/sf/1462525), urischemes tries to put a cleaner interface in front of the URI parsing engine.

Most of the module works with a URI subclass of tuple that is always a 5-tuple (scheme, authority, path, query, fragment). The authority component is either None, or a URIAuthority subclass of tuple that is always a 4-tuple (user, password, host, port).

The function make_uri creates a URI string from the 5 constituent components of a URI. The components do not need to be strings: if they are not, str() is invoked on them (this allows the URIAuthority tuple subclass to be used transparently instead of a string for the authority component). The result is checked to ensure it is an RFC-compliant URI.

The function split_uri accepts a string and returns a URI object with strings as the individual elements. Invoking str() on this object recreates a URI string using make_uri(). The regex underlying this operation is now broken out and available as module level attributes like URI_PATTERN.

The functions split_authority and make_authority are similar, but work solely on the authority component rather than the whole URI.

The function parse_uri digs into the internal structure of a URI, also parsing the components. This replaces a non-empty URI authority component string with a URIAuthority tuple subclass. Depending on the scheme, it may also replace other components (e.g. for mailto links, the path is replaced with a (user, host) tuple subclass).

The main parsing engine is still URIParser (much the same as Paul's), but the root of the internal parser hierarchy is now SchemeParser. This has two subclasses, URLParser and MailtoParser. The various URL flavours are now different instances of URLParser rather than subclasses. All of the actual parsers are available as module level attributes with the same name as the scheme they parse. Additionally, each parser knows the name of the scheme it is intended to parse.

The parse() methods of the individual parsers are now expected to return a URI object (SchemeParser actually takes care of this). The parse() method also takes a dictionary of defaults, which can override the defaults supplied by the parser instance. The unparse() method is gone. Instead, the scheme parser should ensure that all components returned are either strings or produce the right thing when __str__ is invoked (e.g. see _MailtoURIPath).

The module level 'schemes' attribute is a mapping from scheme names to parsers that is automatically populated with all instances of SchemeParser that are found in the module globals().

urljoin has been renamed to join_uri to match the style of the other names in the module. |
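The split/make pair described in this message can be illustrated with a minimal sketch. This is not the code from urischemes.py: the names URI, make_uri, split_uri and URI_PATTERN are borrowed from the description, but the bodies are assumptions built on the generic regex from RFC 3986, Appendix B, and the RFC-compliance check mentioned above is left out.

```python
import re

# Generic URI regex from RFC 3986, Appendix B (assumed here; the
# module's real URI_PATTERN may differ).
URI_PATTERN = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$"
)


def make_uri(scheme, authority, path, query, fragment):
    """Recombine five components; non-string components go through str()."""
    parts = []
    if scheme is not None:
        parts.append(f"{scheme}:")
    if authority is not None:
        parts.append(f"//{authority}")
    parts.append(str(path))
    if query is not None:
        parts.append(f"?{query}")
    if fragment is not None:
        parts.append(f"#{fragment}")
    return "".join(parts)


class URI(tuple):
    """Always a 5-tuple: (scheme, authority, path, query, fragment)."""

    def __new__(cls, scheme=None, authority=None, path="", query=None,
                fragment=None):
        return super().__new__(cls, (scheme, authority, path, query, fragment))

    def __str__(self):
        # str() on the tuple recreates the URI string via make_uri().
        return make_uri(*self)


def split_uri(text):
    """Split a URI string into a URI 5-tuple of strings (None if absent)."""
    match = URI_PATTERN.match(text)
    if match is None:  # e.g. the text contains a newline
        raise ValueError(f"not a URI: {text!r}")
    return URI(*match.groups())


uri = split_uri("http://user:pw@example.com:8080/a/b?x=1#frag")
print(tuple(uri))
print(str(uri))
```

Absent components come back as None rather than empty strings, which lets str() distinguish "http://host/path?" from "http://host/path" when rebuilding the string.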
|||
| msg50412 - (view) | Author: Nick Coghlan (ncoghlan) | Date: 2006-06-05 13:53 | |
Updated version attached, which addresses some issues raised by Mike Brown in private mail (the difference between a URI and a URI reference, and some major differences between URI paths and POSIX paths). Also settled on split/join for the component separation and recombination operations, and made the join methods all take a tuple so that join_x(split_x(uri)) round trips. Based on the terminology in the RFC, the function to combine a URI reference with a base URI is now called "resolve_uriref". |
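The round-trip property and the reference resolution described here can be demonstrated with today's standard library, used as a stand-in for the attached module. The functions below (urlsplit, urlunsplit, urljoin from urllib.parse) are not the join_x/split_x or resolve_uriref functions from urischemes; the base URI and references are taken from the examples in RFC 3986, section 5.4.1.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Round trip: joining the result of a split gives back the original string.
uri = "http://example.com/a/b;p?q=1#frag"
assert urlunsplit(urlsplit(uri)) == uri

# Resolving URI references against a base URI (RFC 3986, section 5.4.1).
base = "http://a/b/c/d;p?q"
print(urljoin(base, "../g"))   # http://a/b/g
print(urljoin(base, "g?y#s"))  # http://a/b/c/g?y#s
print(urljoin(base, "#s"))     # http://a/b/c/d;p?q#s
```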
|||
| msg50413 - (view) | Author: Nick Coghlan (ncoghlan) | Date: 2006-06-06 15:46 | |
Uploaded version 0.3, which passes all the RFC tests, as well as the failing 4Suite tests Mike sent me based on versions 0.1 and 0.2. The last 4Suite failure went away when I realised those tests expected to operate in strict mode :) |
|||
| msg50414 - (view) | Author: Nick Coghlan (ncoghlan) | Date: 2006-06-08 12:11 | |
Uploaded version 0.4. This version cleans up the logic in resolve_uripath a bit (it uses a separate loop to strip the leading dot segments and adds comments explaining the meaning of the if statements that deal with dot segments). It also exposes EmailPath (along with split_emailpath and join_emailpath) as public objects, rather than treating them as internal to the MailtoSchemeParser. |
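For readers unfamiliar with dot-segment handling, the following is a sketch of the remove_dot_segments procedure from RFC 3986, section 5.2.4, transcribed step by step. It is not the resolve_uripath implementation from the attached file (which, per the message, strips leading dot segments in a separate loop); the function name and structure here are assumptions for illustration.

```python
def remove_dot_segments(path):
    """Resolve '.' and '..' segments, following RFC 3986 section 5.2.4."""
    output = []
    while path:
        # A: drop a leading "../" or "./"
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        # B: "/./" or a final "/." collapses to "/"
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        # C: "/../" or a final "/.." collapses to "/" and removes the
        #    last segment already moved to the output
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        # D: a bare "." or ".." is discarded
        elif path in (".", ".."):
            path = ""
        # E: move the first segment (including its leading "/") to the output
        else:
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


print(remove_dot_segments("/a/b/c/./../../g"))    # /a/g
print(remove_dot_segments("mid/content=5/../6"))  # mid/6
```

Both inputs and their results are the worked examples given in the RFC itself.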
|||
| msg50415 - (view) | Author: Nick Coghlan (ncoghlan) | Date: 2007-02-14 09:11 | |
Removed all versions prior to 0.4 |
|||
| msg83920 - (view) | Author: Daniel Diniz (ajaksu2) | Date: 2009-03-21 03:38 | |
I'll collect open issues that would be solved by this. |
|||
| msg86301 - (view) | Author: Nick Coghlan (ncoghlan) | Date: 2009-04-22 15:38 | |
The code itself is no longer the hard part here (hence the easy tag). The problem is that getting something like this into the standard library is a tough sell on python-dev because it isn't really a field tested module, but once people start downloading things from PyPI, they're more likely to go for something like 4Suite rather than a mere URI parsing module. What the issue really needs is someone to champion the benefits of having this in the standard library. Now that it is available, it would also be worth looking at updating the module to use collections.namedtuple instead of creating its own variant of the same thing. |
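As a rough idea of what the namedtuple suggestion could look like, here is a sketch of the 4-tuple authority type built on collections.namedtuple. The field names follow the (user, password, host, port) layout from the first message; the class body and the __str__ formatting are assumptions, not the module's actual code, and bracketed IPv6 hosts are not given any special handling.

```python
from collections import namedtuple


class URIAuthority(namedtuple("URIAuthority", "user password host port")):
    """4-tuple authority with named fields; str() rebuilds the string form."""

    __slots__ = ()

    def __str__(self):
        userinfo = ""
        if self.user is not None:
            userinfo = str(self.user)
            if self.password is not None:
                userinfo += f":{self.password}"
            userinfo += "@"
        port = "" if self.port is None else f":{self.port}"
        return f"{userinfo}{self.host}{port}"


auth = URIAuthority("alice", None, "example.com", 8080)
print(auth.host, auth[2])  # fields are reachable by name or by index
print(str(auth))           # alice@example.com:8080
```

Because the subclass still behaves as a plain tuple, code that unpacks or indexes the authority keeps working, while str() lets it stand in for a string when a URI is reassembled.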
|||
| msg86308 - (view) | Author: Daniel Diniz (ajaksu2) | Date: 2009-04-22 17:24 | |
ISTM that gathering the issues where this would help is a good start, but I haven't had the time to do it yet. |
|||
| msg86348 - (view) | Author: Senthil Kumaran (orsenthil) | Date: 2009-04-23 02:06 | |
I am willing to review this/work on it. But I wonder if this can be categorized as an easy task. 1) Integration into the standard library will involve compatibility with existing parsing, which will invariably involve certain tweaks (with discussions/buy-in from others). 2) There are other patches which try to achieve this purpose; consolidation is required. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2009-04-23 02:06:20 | orsenthil | set | messages: + msg86348 |
| 2009-04-22 17:24:49 | ajaksu2 | set | dependencies: + URI parsing library; messages: + msg86308 |
| 2009-04-22 15:38:48 | ncoghlan | set | messages: + msg86301 |
| 2009-04-22 13:47:52 | orsenthil | set | nosy: + orsenthil |
| 2009-04-22 12:47:03 | ajaksu2 | set | keywords: + easy |
| 2009-03-21 03:38:58 | ajaksu2 | set | nosy: + ajaksu2; versions: + Python 3.1, Python 2.7; messages: + msg83920; components: + Library (Lib), - None; type: feature request; stage: patch review |
| 2006-06-04 14:50:18 | ncoghlan | create | |