This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urllib.parse docstrings incomplete
Type: Stage: patch review
Components: Documentation, Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Ido Michael, docs@python, nanjekyejoannah, orsenthil, syadlapalli, taleinat, zach.ware
Priority: normal Keywords: newcomer friendly, patch

Created on 2019-08-28 16:26 by zach.ware, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 16458 merged Ido Michael, 2019-09-28 10:51
PR 18631 open Ido Michael, 2020-02-23 23:20
Messages (16)
msg350668 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2019-08-28 16:26
For example, urlsplit:

>>> from urllib.parse import urlsplit
>>> help(urlsplit)
Help on function urlsplit in module urllib.parse:

urlsplit(url, scheme='', allow_fragments=True)
    Parse a URL into 5 components:
    <scheme>://<netloc>/<path>?<query>#<fragment>
    Return a 5-tuple: (scheme, netloc, path, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes.


The current docstring does not describe the `scheme` or `allow_fragments` arguments.  Also, the note about not splitting netloc is misleading; the components of netloc (username, password, hostname, and port) are available as extra attributes of the returned SplitResult.

urlparse has similar issues; other functions could stand to be checked.
msg350798 - (view) Author: Joannah Nanjekye (nanjekyejoannah) * (Python committer) Date: 2019-08-29 16:31
> Also, the note about not splitting netloc is misleading; the components > of netloc (username, password, hostname, and port) are available as >extra attributes of the returned SplitResult.

Also, the docs in urllib.parse.rst should also be updated to correct this misleading statement.
msg351116 - (view) Author: sushma (syadlapalli) * Date: 2019-09-04 06:39
hello!

I can see that we might want to add documentation for splitting netloc, but I don't understand why we'd have scheme and netloc, but nothing for path and query. What are you suggesting we add for scheme/allow_fragements? 

Thanks!
msg351135 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2019-09-04 14:44
>I don't understand why we'd have scheme and netloc, but nothing for path and query.

I'm not sure what you mean here; can you please clarify?

> What are you suggesting we add for scheme/allow_fragements?

Just a brief description of what effect the arguments actually have on the returned result.
msg351137 - (view) Author: sushma (syadlapalli) * Date: 2019-09-04 15:44
I guess what I'm wondering is this: 

urlsplit(url, scheme='', allow_fragments=True)
    Parse a URL into 5 components:
    <scheme>://<netloc>/<path>?<query>#<fragment>
    Return a 5-tuple: (scheme, netloc, path, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes.
(END)

We don't have details regarding anything, i.e scheme, netloc, path or query or fragments. So I was curious about why we would have more documentation around netloc and scheme and nothing about path and query

Should we be adding information for all(scheme, netloc, path, query, fragment) of them, including extra attributes of the returned SplitResult? 

p.s - newbie trying to contribute here
msg351145 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2019-09-04 16:40
I see.  I don't think we need to describe each returned component in the docstring; they're some combination of self-explanatory, described in the reference docs, or just industry-standard terms that are easily defined from other sources.  I'm not suggesting to add anything to the docstring about the `scheme` return component, but rather the *scheme* argument (which is the default value for the `scheme` return component when it's not found in the *url*).  The subcomponents of netloc should be mentioned because the docstring currently gives the impression that the user has to parse them out for themselves, which is not true.

Off the top of my head, I'd suggest changing the `urlsplit` docstring to something like:

```
Parse *url* and return a SplitResult.

SplitResult is a named 5-tuple of the following components:
<scheme>://<netloc>/<path>?<query>#<fragment>

The ``username``, ``password``, ``hostname``, and ``port``
sub-components of ``netloc`` can also be accessed as
attributes of the SplitResult object.

The *scheme* argument provides the default value of the
``scheme`` component when no scheme is found in *url*.

If *allow_fragments* is False, no attempt is made to
separate the ``fragment`` component from the previous
component, which can be either ``path`` or ``query``.

Note that % escapes are not expanded.
```
msg351338 - (view) Author: sushma (syadlapalli) * Date: 2019-09-08 19:55
got it - thanks for the detailed explanation! I'll go ahead and create a PR soon
msg353446 - (view) Author: Ido Michael (Ido Michael) * Date: 2019-09-28 10:53
Committed a PR: GH-16458

I've read all of the thread and changed the docstring to the latest suggestion by @zach.ware

Ido
msg361776 - (view) Author: Ido Michael (Ido Michael) * Date: 2020-02-11 03:54
Any update on this? Adding @Tal Einat on the PR
msg361777 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2020-02-11 04:03
Hi Ido, there was a change requested by a core-dev, Zachary , on your PR.

>  Please have a look at PEP 257 for docstring formatting guidelines.

https://github.com/python/cpython/pull/16458/files#r353422155

Please let us know if that is addressed.
msg361779 - (view) Author: Ido Michael (Ido Michael) * Date: 2020-02-11 04:12
Hey Senthil,
Yes the PEP guides was fixed a while ago, also the new comment of adding the same change for the second function were taken care of.
msg362094 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2020-02-16 19:18
New changeset c33bdbb20cf55b3a2aa7a91bd3d91fcb59796fad by idomic in branch 'master':
bpo-37970: update and improve urlparse and urlsplit doc-strings (GH-16458)
https://github.com/python/cpython/commit/c33bdbb20cf55b3a2aa7a91bd3d91fcb59796fad
msg362096 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2020-02-16 19:22
Ido Michael's PR GH-16458, which I've just merged, addresses the issues brought up here regarding the doc-strings.

From the discussion it seems that the documentation of these functions should be updated as well. I'll leave this issue open until that's done as well.
msg362102 - (view) Author: Ido Michael (Ido Michael) * Date: 2020-02-16 20:56
Tal, I can also fix that so we can close this issue.
Are you talking about the Doc/library/urllib.parse.rst file?
msg362536 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2020-02-23 20:06
Indeed.
msg362550 - (view) Author: Ido Michael (Ido Michael) * Date: 2020-02-23 23:22
Created PR, most of the documentation besides the func signature looked alright: GH-18631
History
Date User Action Args
2022-04-11 14:59:19adminsetgithub: 82151
2020-02-23 23:22:55Ido Michaelsetmessages: + msg362550
2020-02-23 23:20:42Ido Michaelsetpull_requests: + pull_request17997
2020-02-23 20:06:32taleinatsetmessages: + msg362536
2020-02-16 20:56:53Ido Michaelsetmessages: + msg362102
2020-02-16 19:22:23taleinatsetmessages: + msg362096
2020-02-16 19:18:07taleinatsetnosy: + taleinat
messages: + msg362094
2020-02-11 04:12:40Ido Michaelsetmessages: + msg361779
2020-02-11 04:03:13orsenthilsetmessages: + msg361777
2020-02-11 03:54:09Ido Michaelsetmessages: + msg361776
2019-09-28 10:53:18Ido Michaelsetnosy: + Ido Michael
messages: + msg353446
2019-09-28 10:51:46Ido Michaelsetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request16039
2019-09-08 19:55:13syadlapallisetmessages: + msg351338
2019-09-04 16:40:10zach.waresetnosy: + orsenthil
messages: + msg351145
2019-09-04 15:44:36syadlapallisetmessages: + msg351137
2019-09-04 14:44:35zach.waresetmessages: + msg351135
2019-09-04 06:39:43syadlapallisetnosy: + syadlapalli
messages: + msg351116
2019-08-29 16:31:00nanjekyejoannahsetnosy: + nanjekyejoannah
messages: + msg350798
2019-08-28 16:26:39zach.warecreate