Message 61030 - Python tracker


This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	dalke
Date	2006-11-05.22:27:28

Content

urlparse implements RFC 1808, which is badly out of
date. The most recent URI specification is RFC 3986,
which obsoletes RFC 1808.
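
RFC 3986, Appendix B, gives a regular expression that splits any URI reference into scheme, authority, path, query and fragment. The minimal Python 3 sketch below (the names `split_uri` and `URIRef` are hypothetical, not part of any library) shows why that matters for the 'file' URI point quoted below: the regex reports an absent component as None, whereas `urllib.parse.urlsplit()` (the Python 3 successor of urlparse) returns an empty `netloc` for both `file:///etc/hosts` and `file:/etc/hosts`.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

# Regular expression from RFC 3986, Appendix B. Optional groups that do
# not participate in the match are None, so "absent" and "empty" differ.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class URIRef(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_uri(uri: str) -> URIRef:
    """Split a URI reference into its five RFC 3986 components."""
    m = _URI_RE.match(uri)  # always matches: every part may be empty
    return URIRef(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


if __name__ == "__main__":
    for uri in ("file:///etc/hosts", "file:/etc/hosts"):
        # urlsplit() gives netloc == "" for both; split_uri() gives "" vs None.
        print(uri, repr(urlsplit(uri).netloc), repr(split_uri(uri).authority))
```

Keeping None distinct from "" is what lets a resolver reproduce an empty authority faithfully instead of dropping or inventing one.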

Here is an excerpt from 4Suite:

    # Reasons to avoid using urllib.basejoin() and urlparse.urljoin():
    # - Both are partial implementations of long-obsolete specs.
    # - Both accept relative URLs as the base, which no spec allows.
    # - urllib.basejoin() mishandles the '' and '..' references.
    # - If the base URL uses a non-hierarchical or relative path,
    #    or if the URL scheme is unrecognized, the result is not
    #    always as expected (partly due to issues in RFC 1808).
    # - If the authority component of a 'file' URI is empty,
    #    the authority component is removed altogether. If it was
    #    not present, an empty authority component is in the result.
    # - '.' and '..' segments are not always collapsed as well as they
    #    should be (partly due to issues in RFC 1808).
    # - Effective Python 2.4, urllib.basejoin() *is* urlparse.urljoin(),
    #    but urlparse.urljoin() is still based on RFC 1808.
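
For comparison, the reference-resolution algorithm of RFC 3986 (section 5.2) fits in a few dozen lines of Python 3. This is a minimal sketch, not 4Suite's code and not a proposed urlparse patch; the names `resolve`, `remove_dot_segments`, `_merge` and `_split` are hypothetical. It rejects a base without a scheme (section 5.1 requires an absolute base), collapses '.' and '..' segments as in section 5.2.4, and keeps an empty authority distinct from an absent one.

```python
import re
from typing import Optional, Tuple

# RFC 3986, Appendix B. Absent components come back as None, so an empty
# authority ("file:///x") stays distinct from a missing one ("file:/x").
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

Parts = Tuple[Optional[str], Optional[str], str, Optional[str], Optional[str]]


def _split(uri: str) -> Parts:
    """Return (scheme, authority, path, query, fragment)."""
    m = _URI_RE.match(uri)  # always matches: every part may be empty
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path: str) -> str:
    """Collapse '.' and '..' segments (RFC 3986, section 5.2.4)."""
    output = []  # each entry is one segment with its leading "/", if any
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]  # "/./x" -> "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]  # "/../x" -> "/x", dropping the last output segment
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (and its leading "/", if any) to output.
            end = path.find("/", 1 if path.startswith("/") else 0)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def _merge(base_authority: Optional[str], base_path: str, ref_path: str) -> str:
    """Merge a relative-path reference with the base path (section 5.2.3)."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base: str, ref: str) -> str:
    """Resolve ref against base (RFC 3986, sections 5.2.2 and 5.3)."""
    b_scheme, b_auth, b_path, b_query, _ = _split(base)
    if b_scheme is None:
        # Section 5.1: the base URI must be absolute.
        raise ValueError("base URI must have a scheme: %r" % base)
    r_scheme, r_auth, r_path, r_query, r_frag = _split(ref)

    if r_scheme is not None:
        scheme, auth = r_scheme, r_auth
        path, query = remove_dot_segments(r_path), r_query
    elif r_auth is not None:
        scheme, auth = b_scheme, r_auth
        path, query = remove_dot_segments(r_path), r_query
    elif r_path == "":
        scheme, auth, path = b_scheme, b_auth, b_path
        query = r_query if r_query is not None else b_query
    else:
        scheme, auth, query = b_scheme, b_auth, r_query
        if r_path.startswith("/"):
            path = remove_dot_segments(r_path)
        else:
            path = remove_dot_segments(_merge(b_auth, b_path, r_path))

    # Recompose (section 5.3): emit only the components that are defined.
    result = scheme + ":"
    if auth is not None:
        result += "//" + auth
    result += path
    if query is not None:
        result += "?" + query
    if r_frag is not None:
        result += "#" + r_frag
    return result
```

With the RFC's own example base `http://a/b/c/d;p?q`, this sketch returns `http://a/g` for `../../../g` and `http://a/b/c/y` for `g;x=1/../y`, matching section 5.4. Resolving `passwd` against `file:///etc/hosts` keeps the empty authority (`file:///etc/passwd`), while `file:/etc/hosts` gives `file:/etc/passwd`.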

See also the past python-dev discussions on "urlparse"
for examples of people wanting a better, more up-to-date
urlparse/urljoin.

History
Date	User	Action	Args
2008-01-20 09:59:05	admin	link	issue1591035 messages
2008-01-20 09:59:05	admin	create