classification
Title: urllib.parse docstrings incomplete
Type: Stage: patch review
Components: Documentation, Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Ido Michael, docs@python, nanjekyejoannah, orsenthil, syadlapalli, zach.ware
Priority: normal Keywords: newcomer friendly, patch

Created on 2019-08-28 16:26 by zach.ware, last changed 2019-09-28 10:53 by Ido Michael.

Pull Requests
URL Status Linked
PR 16458 open Ido Michael, 2019-09-28 10:51
Messages (8)
msg350668 - Author: Zachary Ware (zach.ware) * (Python committer) Date: 2019-08-28 16:26
For example, urlsplit:

>>> from urllib.parse import urlsplit
>>> help(urlsplit)
Help on function urlsplit in module urllib.parse:

urlsplit(url, scheme='', allow_fragments=True)
    Parse a URL into 5 components:
    <scheme>://<netloc>/<path>?<query>#<fragment>
    Return a 5-tuple: (scheme, netloc, path, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes.


The current docstring does not describe the `scheme` or `allow_fragments` arguments.  Also, the note about not splitting netloc is misleading; the components of netloc (username, password, hostname, and port) are available as extra attributes of the returned SplitResult.
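A quick interpreter check of the netloc point above. This is a sketch, and the URL (including its credentials and port) is a made-up example. `netloc` stays a single string inside the returned 5-tuple, but its sub-components are already exposed as attributes.

```python
from urllib.parse import urlsplit

# A made-up URL whose netloc carries all four sub-components.
url = "https://user:secret@Example.com:8080/path/to?q=1#frag"
parts = urlsplit(url)

# The result is still a named 5-tuple; netloc itself is one unsplit string.
print(tuple(parts))
# ('https', 'user:secret@Example.com:8080', '/path/to', 'q=1', 'frag')

# Its pieces are available as extra attributes, so no manual parsing is needed.
print(parts.username)  # user
print(parts.password)  # secret
print(parts.hostname)  # example.com  (hostname is lowercased)
print(parts.port)      # 8080  (an int, not a string)
```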

urlparse has similar issues; other functions could stand to be checked.
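For comparison, a minimal sketch of the `urlparse` case, again with a made-up URL. It returns a 6-tuple `ParseResult` that additionally splits `;params` off the last path segment for schemes that use them (such as `http`), and it offers the same netloc attributes as `SplitResult`.

```python
from urllib.parse import urlparse, urlsplit

url = "http://www.example.com/path;type=a?x=1#top"

# urlparse returns six components; ";type=a" is split off into ``params``.
parsed = urlparse(url)
print(parsed.path, parsed.params)  # /path type=a

# urlsplit returns five components and leaves ";type=a" inside ``path``.
split = urlsplit(url)
print(split.path)  # /path;type=a

# The netloc attributes behave the same way on ParseResult.
print(parsed.hostname, parsed.port)  # www.example.com None
```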
msg350798 - Author: Joannah Nanjekye (nanjekyejoannah) * (Python committer) Date: 2019-08-29 16:31
> Also, the note about not splitting netloc is misleading; the components of netloc (username, password, hostname, and port) are available as extra attributes of the returned SplitResult.

The docs in urllib.parse.rst should also be updated to correct this misleading statement.
msg351116 - Author: sushma (syadlapalli) * Date: 2019-09-04 06:39
hello!

I can see that we might want to document how netloc is split, but I don't understand why we'd document scheme and netloc but nothing for path and query. What are you suggesting we add for scheme/allow_fragments?

Thanks!
msg351135 - Author: Zachary Ware (zach.ware) * (Python committer) Date: 2019-09-04 14:44
> I don't understand why we'd have scheme and netloc, but nothing for path and query.

I'm not sure what you mean here; can you please clarify?

> What are you suggesting we add for scheme/allow_fragments?

Just a brief description of what effect the arguments actually have on the returned result.
msg351137 - Author: sushma (syadlapalli) * Date: 2019-09-04 15:44
I guess what I'm wondering is this: 

urlsplit(url, scheme='', allow_fragments=True)
    Parse a URL into 5 components:
    <scheme>://<netloc>/<path>?<query>#<fragment>
    Return a 5-tuple: (scheme, netloc, path, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes.

We don't have details regarding any of them, i.e., scheme, netloc, path, query, or fragment. So I was curious why we would add more documentation around netloc and scheme and nothing about path and query.

Should we be adding information for all of them (scheme, netloc, path, query, fragment), including the extra attributes of the returned SplitResult?

P.S. Newbie trying to contribute here.
msg351145 - Author: Zachary Ware (zach.ware) * (Python committer) Date: 2019-09-04 16:40
I see.  I don't think we need to describe each returned component in the docstring; they're some combination of self-explanatory, described in the reference docs, or just industry-standard terms that are easily defined from other sources.  I'm not suggesting to add anything to the docstring about the `scheme` return component, but rather the *scheme* argument (which is the default value for the `scheme` return component when it's not found in the *url*).  The subcomponents of netloc should be mentioned because the docstring currently gives the impression that the user has to parse them out for themselves, which is not true.
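A short illustration of how the *scheme* argument behaves (a sketch; the URLs are made up): it only supplies the value of the `scheme` component when *url* itself has no scheme.

```python
from urllib.parse import urlsplit

# No scheme in the URL, so the *scheme* argument becomes the ``scheme`` component.
implicit = urlsplit("//example.com/index.html", scheme="https")
print(implicit.scheme, implicit.netloc, implicit.path)  # https example.com /index.html

# A scheme present in the URL takes precedence over the argument.
explicit = urlsplit("http://example.com/index.html", scheme="https")
print(explicit.scheme)  # http

# With the default scheme='', a scheme-less URL yields an empty ``scheme``.
print(repr(urlsplit("//example.com/index.html").scheme))  # ''
```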

Off the top of my head, I'd suggest changing the `urlsplit` docstring to something like:

```
Parse *url* and return a SplitResult.

SplitResult is a named 5-tuple of the following components:
<scheme>://<netloc>/<path>?<query>#<fragment>

The ``username``, ``password``, ``hostname``, and ``port``
sub-components of ``netloc`` can also be accessed as
attributes of the SplitResult object.

The *scheme* argument provides the default value of the
``scheme`` component when no scheme is found in *url*.

If *allow_fragments* is False, no attempt is made to
separate the ``fragment`` component from the previous
component, which can be either ``path`` or ``query``.

Note that % escapes are not expanded.
```
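To make the last two paragraphs of that suggested docstring concrete, here is a small sketch with made-up URLs. With `allow_fragments=False` the `#...` part stays attached to whichever component precedes it, and % escapes come back unexpanded; `urllib.parse.unquote` is one way to decode them afterwards.

```python
from urllib.parse import unquote, urlsplit

url = "http://example.com/a%20b?q=1#sec"

# Default: the fragment is separated out.
print(urlsplit(url).fragment)  # sec

# allow_fragments=False: "#sec" stays in the preceding component (here the query).
no_frag = urlsplit(url, allow_fragments=False)
print(no_frag.query, repr(no_frag.fragment))  # q=1#sec ''

# Without a query, the unsplit fragment ends up in the path instead.
print(urlsplit("http://example.com/page#sec", allow_fragments=False).path)  # /page#sec

# % escapes are not expanded; decode them explicitly if needed.
path = urlsplit(url).path
print(path, unquote(path))  # /a%20b /a b
```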
msg351338 - Author: sushma (syadlapalli) * Date: 2019-09-08 19:55
Got it, thanks for the detailed explanation! I'll go ahead and create a PR soon.
msg353446 - Author: Ido Michael (Ido Michael) * Date: 2019-09-28 10:53
Opened a PR: GH-16458

I've read all of the thread and changed the docstring to the latest suggestion by @zach.ware

Ido
History
Date User Action Args
2019-09-28 10:53:18 Ido Michael set nosy: + Ido Michael
messages: + msg353446
2019-09-28 10:51:46 Ido Michael set keywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request16039
2019-09-08 19:55:13 syadlapalli set messages: + msg351338
2019-09-04 16:40:10 zach.ware set nosy: + orsenthil
messages: + msg351145
2019-09-04 15:44:36 syadlapalli set messages: + msg351137
2019-09-04 14:44:35 zach.ware set messages: + msg351135
2019-09-04 06:39:43 syadlapalli set nosy: + syadlapalli
messages: + msg351116
2019-08-29 16:31:00 nanjekyejoannah set nosy: + nanjekyejoannah
messages: + msg350798
2019-08-28 16:26:39 zach.ware create