Title: urllib.splitport -- is it official or not?
Type: Stage:
Components: Versions: Python 3.6, Python 3.5, Python 2.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, csabella, gvanrossum, martin.panter, orsenthil, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-07-11 18:02 by gvanrossum, last changed 2017-06-14 23:34 by csabella.

Pull Requests
URL Status Linked Edit
PR 2205 open csabella, 2017-06-14 23:25
Messages (12)
msg270193 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-07-11 18:02
I've seen and written some code that uses urllib.splitport() [1], but it's not in the export list, nor in the docs. However I see no easy other way to perform the same function. Should we make it official, or get rid of it? It's used internally in urllib/ [2]. There's a test for it in [3], but another test [4] also acknowledges that it's "undocumented" (which suggests that the author of that test didn't know what to do with it either).

Same question for the others in that list [4]:
            'splitattr', 'splithost', 'splitnport', 'splitpasswd',
            'splitport', 'splitquery', 'splittag', 'splittype', 'splituser',
            'Quoter', 'ResultBase', 'clear_cache', 'to_bytes', 'unwrap',

msg270200 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-07-11 19:08
splitport() doesn't work with IPv6 ("[::1]", see issue18191), nor with authority (""). Note that there is a almost duplicate function splitnport(). The existence of two similar functions that behave differently in corner cases looks confusing. And seems splitport() and splitnport() not always used correctly internally (see issue20271).
msg270215 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-07-11 23:33
Previous discussion: Issue 1722, Issue 11009.

In Python 2, most of the split- functions _have_ been in urllib.__all__ since revision 5d68afc5227c (2.1). Also, since revision c3656dca65e7 (Issue 1722, 2.7.4), the RST documentation does mention that at least some of them are deprecated in favour of the “urlparse” module. However there are no index entries, and splitport() is not mentioned by name.

In Python 3, these functions wandered into urllib.parse. There is no RST documentation, and the functions are not in __all__ (which was added for Issue 13287 in 3.3).

I think you can use the documented urllib.parse API instead of splitport(), but it is borderline unwieldy:

>>> netloc = "[::1]:80"
>>> urllib.parse.splitport(netloc)  # [Brackets] kept!
('[::1]', '80')
>>> split = urlsplit("//" + netloc); (split.hostname, split.port)
('::1', 80)
>>> split = SplitResult("", netloc, path="", query="", fragment=""); (split.hostname, split.port)
('::1', 80)

I opened Issue 23416 with a suggestion that would make SplitResult a bit simpler to use here. But maybe it makes the implementation too complicated.

I don’t think the non-split-names (Quoter, etc) are in much doubt. They were never in __all__.
msg270218 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-07-12 00:51
Aha. I see you are referring to this note in the 2.7 docs for urllib:

    urllib also exposes certain utility functions like splittype, splithost and
    others parsing URL into various components. But it is recommended to use
    :mod:`urlparse` for parsing URLs rather than using these functions directly.
    Python 3 does not expose these helper functions from :mod:`urllib.parse`

This is somewhat ironic because those functions still exist in urllib.parse.

I've rewritten my code using your suggestions of using urllib.parse.urlparse().

Shall we just close this issue or is there still an action item? (Maybe actually delete those functions whose deletion has been promised so long ago, or at least rename them to _splitport() etc.?)
msg270253 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-07-12 16:03
Probably a rename is good. Question then becomes whether the old names should raise an DeprecationWarning for a release?
msg270536 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2016-07-16 04:05
I think that we use encourage everyone to use the higher level functions like urlparse() or urlsplit() and then get the .port from the named tuple result.

Two things to do.

1. Update that Note the documentation which states a false statement that those helper functions are not exposed in urllib.parse

2. Raise a DeprecationWarning and rename them to _splitport()
msg295962 - (view) Author: Cheryl Sabella (csabella) * Date: 2017-06-13 22:03
Would it be OK for me to work on this?
msg295965 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2017-06-13 22:22
Go for it, Cheryl!
msg295966 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2017-06-13 22:26
Skimming the issue I can't even figure out what the task is -- Cheryl, I suppose you have, could you post a brief summary of your plan here?
msg295967 - (view) Author: Cheryl Sabella (csabella) * Date: 2017-06-13 22:59
Thank you.

From my understanding, urllib didn't officially supported the split* functions (splittype, splithost, splitport, splinport, splituser, splitpasswd, splitattr, splitquery, splitvalue, splittag) even though they were migrated to urllib.parse.  The 2.7 documentation recommended using urlparse and stated that these split* functions would not be exposed in Python 3, but they are.

The proposal would be as Senthil suggested - to raise a DeprecationWarning if the current names are used and to rename them with a single underscore (to _split*).

However, I did have some questions.  
1. With the DeprecationWarning for the current function names, should the return value be a call to the _split* function or should it call urlparse/urlsplit?
2. Some of the return values from these split* functions can be different than urlsplit, as Martin showed in msg270215.  It seems that the return values should remain the same for now, but would the differences need to be documented? 
3. These functions are used in  Should they be changed to use the _split* functions or changed to use urlparse?
msg295983 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-06-14 04:48
I don't think it is worth changing the implementations to be in terms of urlsplit or urlparse. This is proposed for splithost in <>, but I suspect it would change the behaviour in some corner cases. See Issue 22852 for some deficiencies with urlsplit.

3. Change existing usage to the internal _split functions, unless there is a reason (bug or security problem) to make further changes.
msg296049 - (view) Author: Cheryl Sabella (csabella) * Date: 2017-06-14 23:34
Martin, thank you for the information and for pointing out those other related issues.  It makes sense to separate the security or bug issues from this change.
Date User Action Args
2017-06-14 23:34:10csabellasetmessages: + msg296049
2017-06-14 23:25:20csabellasetpull_requests: + pull_request2249
2017-06-14 04:48:57martin.pantersetmessages: + msg295983
2017-06-13 22:59:11csabellasetmessages: + msg295967
2017-06-13 22:26:06gvanrossumsetmessages: + msg295966
2017-06-13 22:22:59brett.cannonsetmessages: + msg295965
2017-06-13 22:03:49csabellasetnosy: + csabella
messages: + msg295962
2016-07-16 04:05:27orsenthilsetnosy: + orsenthil

messages: + msg270536
versions: - Python 3.2, Python 3.3, Python 3.4
2016-07-12 16:03:48brett.cannonsetnosy: + brett.cannon
messages: + msg270253
2016-07-12 00:51:34gvanrossumsetmessages: + msg270218
2016-07-11 23:33:39martin.pantersetnosy: + martin.panter
messages: + msg270215
2016-07-11 19:08:01serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg270200
2016-07-11 18:02:03gvanrossumcreate