Author demian.brecht
Recipients berker.peksag, demian.brecht, madison.may, martin.panter, mher, orsenthil
Date 2015-03-19.15:36:22
Message-id <A3D8E167-9A2C-4C46-90C9-AACA508B48FF@gmail.com>
In-reply-to <1426749917.66.0.97983354883.issue18828@psf.upfronthosting.co.za>
Content
> >>> urljoin('mailto:foo@', 'bar.com')
> 'mailto:bar.com'
> 
> which seems fairly sensible to me.

This is where joining arbitrary protocols gets tricky. Does it make sense to merge non-hierarchical protocols such as mailto? My initial reaction is "no"; what should actually happen here is one of two things:

1. The result is a simple concatenation: "mailto:foo@bar.com".
2. An exception is raised indicating that urljoin cannot determine how to handle merging base and url.

The above could happen in cases where the scheme is None for both base and url, or where the scheme to be used is one of those listed in urllib.parse.non_hierarchical.
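
To make the two options concrete, here is a rough sketch of a wrapper around urljoin. The name join_or_refuse and the concat flag are hypothetical and not part of urllib.parse; the scheme list is copied by hand from the undocumented urllib.parse.non_hierarchical, and treating an empty scheme as "None" is my assumption.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: hand-copied from the undocumented urllib.parse.non_hierarchical.
NON_HIERARCHICAL = frozenset([
    'gopher', 'hdl', 'mailto', 'news', 'telnet',
    'wais', 'imap', 'snews', 'sip', 'sips',
])


def join_or_refuse(base, url, concat=False):
    """Hypothetical wrapper illustrating options 1 and 2 above."""
    # A url that carries its own scheme is left to urljoin as usual.
    if urlsplit(url).scheme:
        return urljoin(base, url)

    scheme = urlsplit(base).scheme
    if not scheme or scheme in NON_HIERARCHICAL:
        if concat:
            # Option 1: simple concatenation.
            return base + url
        # Option 2: refuse to guess.
        raise ValueError(
            'cannot determine how to join %r and %r' % (base, url))

    # Hierarchical scheme: normal relative resolution.
    return urljoin(base, url)


print(join_or_refuse('mailto:foo@', 'bar.com', concat=True))
print(join_or_refuse('http://example.com/a/b', 'c'))
```

With concat=False the mailto call raises ValueError instead; which default is preferable is exactly the open question.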

> A more awkward question is if this behaviour of my patch is reasonable:
> 
> >>> urljoin('mailto:person-foo/bar@example.net', 'bar.com')
> 'mailto:person-foo/bar.com'

A couple of thoughts on this: If urllib.parse.non_hierarchical is used to determine merge vs. simple concat (or exception), this specific case won't be an issue. Also, according to RFC 6068, "mailto:person-foo/bar@example.net" is invalid (the "/" should be percent-encoded), but I don't think it should be the job of urljoin to understand the URI structure of each protocol, beyond logically joining base and url.

> Yet another option, similar to my “any_scheme=True” flag, might be to change from the “uses_relative” white-list to a “not_relative” black-list of URL schemes, so that urljoin() works for arbitrary schemes except for ones like “mailto:” that are in the hard-coded list.

This list may already be present in urllib.parse.non_hierarchical. I also think it's worthwhile to do some further research against http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml to ensure the list is up to date.
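
As a starting point for that research, something like the following could show whether the existing lists are even consistent with each other. This is only a sketch: scheme_list_overlap is a hypothetical helper, and both uses_relative and non_hierarchical are undocumented module attributes, so the getattr fallback is there in case either one goes away.

```python
import urllib.parse


def scheme_list_overlap():
    """Hypothetical helper: schemes present in both the relative
    white-list and the non-hierarchical list."""
    # Both attributes are undocumented; fall back to empty lists.
    uses_relative = set(getattr(urllib.parse, 'uses_relative', []))
    non_hierarchical = set(getattr(urllib.parse, 'non_hierarchical', []))
    # The empty string in uses_relative stands for "no scheme"; skip it.
    return sorted((uses_relative & non_hierarchical) - {''})


print(scheme_list_overlap())
```

Any scheme that shows up in the output is both white-listed for relative joins and listed as non-hierarchical, so it would need a decision before non_hierarchical could serve as a black-list.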

If this path is chosen, I would suggest getting sign-off from a couple of core devs prior to investing time in this, as all changes discussed so far are backwards incompatible.
History
Date                 User           Action  Args
2015-03-19 15:36:23  demian.brecht  set     recipients: + demian.brecht, orsenthil, mher, berker.peksag, martin.panter, madison.may
2015-03-19 15:36:23  demian.brecht  link    issue18828 messages
2015-03-19 15:36:22  demian.brecht  create