
Author vincentk
Recipients dalke, ijmorlan, jjlee, paulj, skip.montanaro, vincentk
Date 2007-11-21.10:29:36
Message-id <1195640977.01.0.739150978071.issue1462525@psf.upfronthosting.co.za>
In-reply-to
Content
Some more notes. 
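a) RFC 3986 (Appendix B) explicitly states that the regular expression it
   presents, which your parser uses,
   """ is the regular expression for breaking-down a *well-formed* URI
reference into its components. """ (Emphasis added). Since every group in
that expression is optional, it matches any string and so never rejects
malformed input. I am not sure this is a particularly good starting point
for parsing potentially security-critical data.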
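To illustrate the concern, here is a small sketch that applies the Appendix B expression, copied from RFC 3986, with Python's `re` module. Because every group is optional, `match()` succeeds on any input, so malformed URIs are silently split into components instead of being rejected. The helper name `split_uri` is hypothetical and is not the parser's API.

```python
import re

# Regular expression from RFC 3986, Appendix B. It is meant for splitting a
# *well-formed* URI reference; it performs no validation at all.
RFC3986_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Return (scheme, authority, path, query, fragment) per Appendix B."""
    m = RFC3986_SPLIT.match(uri)
    # match() cannot return None here: every group is optional, so even ""
    # matches, and garbage is split into "components" rather than refused.
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)

for bogus in ["ht tp://exa mple.com:notaport/", "http://[::1:88/path", ""]:
    print(repr(bogus), "->", split_uri(bogus))
```

A space in the scheme, a non-numeric port and an unterminated IPv6 bracket all come back as plausible-looking components, so any validation has to happen after this step.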

b) The parser fails on URIs containing numeric IPv6 addresses (e.g.
"http://[::1]:88/path"). Specifically, the following code in
split_authority is broken:

    if hostport and ':' in hostport:
        host, port = hostport.split(':', 1)

Because an IPv6 literal in the host part itself contains ":" characters,
you cannot simply split() the port off at the first ":". For
"[::1]:88" the code above yields host "[" and port ":1]:88".
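A minimal sketch of a bracket-aware host/port split, assuming the authority has already had any userinfo removed. `split_hostport` is a hypothetical name, not part of the code under review; `naive_split` reproduces the quoted split_authority logic for comparison. This only avoids the IPv6 splitting bug; it does not validate the address or the port, so it does not answer the broader concern in (a).

```python
def naive_split(hostport):
    # The split_authority logic quoted above.
    host, port = hostport, None
    if hostport and ':' in hostport:
        host, port = hostport.split(':', 1)
    return host, port

def split_hostport(hostport):
    """Split 'host[:port]', treating a '[...]' IP literal as the host.

    Hypothetical helper, a sketch only: it checks the bracket structure
    but does not validate the address itself or the port digits.
    """
    if hostport.startswith('['):
        end = hostport.find(']')
        if end == -1:
            raise ValueError('unterminated IP literal: %r' % hostport)
        host, rest = hostport[:end + 1], hostport[end + 1:]
        if not rest:
            return host, None
        if not rest.startswith(':'):
            raise ValueError('junk after IP literal: %r' % hostport)
        return host, rest[1:]
    # Outside brackets, a host (reg-name or IPv4) contains no ':'.
    host, sep, port = hostport.partition(':')
    return host, (port if sep else None)

print(naive_split("[::1]:88"))     # ('[', ':1]:88')
print(split_hostport("[::1]:88"))  # ('[::1]', '88')
```

Keeping the brackets on the returned host makes it unambiguous to reassemble; a caller that needs the bare address can strip them afterwards.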

Again, I am afraid I have no simple solution. I hate to sound so negative.
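For reference only: the standard library's `urllib.parse.urlsplit` in present-day Python 3 (which postdates this discussion and is not the module under review) treats a bracketed IPv6 literal as the host and rejects an unbalanced bracket. It still does not fully validate URIs, so the concern in (a) stands.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://[::1]:88/path")
print(parts.hostname, parts.port, parts.path)  # ::1 88 /path

try:
    urlsplit("http://[::1:88/path")  # unbalanced '[' in the authority
except ValueError as exc:
    print("rejected:", exc)
```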

Kind regards,
v.