classification
Title: unclear documentation on Queue.qsize()
Type:
Stage:
Components: Documentation
Versions: Python 3.5
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: Doug Hoskisson, docs@python, r.david.murray, rhettinger, vstinner
Priority: normal
Keywords:

Created on 2016-07-26 13:00 by Doug Hoskisson, last changed 2016-07-29 14:13 by r.david.murray. This issue is now closed.

Messages (16)
msg271362 - (view) Author: Doug Hoskisson (Doug Hoskisson) Date: 2016-07-26 13:00
The documentation for Queue.qsize():

"Return the approximate size of the queue."

"approximate" is unclear. It might suggest some strategy used for approximating, or it might be the exact size at an arbitrary time.
It should be made more clear.
msg271386 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-26 16:01
Since we're talking about multi-threaded operations, the concept of "exact size at an arbitrary time" isn't operationally different from "a strategy used for approximating".  The subsequent text clarifies what "approximately" means operationally.  Specifying it further would be, I think, overspecification.
msg271404 - (view) Author: Doug Hoskisson (Doug Hoskisson) Date: 2016-07-26 18:51
Some strategies for approximating might report a size that the queue has never been and never will be. For example, a strategy could gather data and find the size is increasing at some rate, and approximate based on that rate, but then the rate of increase changes before it reaches the approximated size. That's the kind of thing that "approximate" would suggest to some people.
msg271405 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-26 18:57
What if we just replaced the period with a colon?  That is, the definition of "approximate" is the two rules in the second sentence.
msg271409 - (view) Author: Doug Hoskisson (Doug Hoskisson) Date: 2016-07-26 19:13
The way that this whole page of documentation is written does not suggest that this class is ONLY for use in a multi-threaded setting.

This class can be used without multi-threading, right?

Wouldn't it be useful to know whether this function does give the exact size of the queue in a single-threaded setting?

Right now, it doesn't contain that information.
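
For illustration, a minimal single-threaded sketch. In CPython's `queue` module, `qsize()` returns the current length of the underlying container while holding the queue's lock, so when no other thread touches the queue the value matches the number of items put minus the number taken. This is the observed behavior of that implementation, not something the documentation promises.

```python
import queue

q = queue.Queue()
for item in ("a", "b", "c"):
    q.put(item)

size_after_puts = q.qsize()   # three items were put, none taken

q.get()
size_after_get = q.qsize()    # one item has been taken

print(size_after_puts, size_after_get)
```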
msg271415 - (view) Author: Doug Hoskisson (Doug Hoskisson) Date: 2016-07-26 21:29
One thing that is important to recognize in considering this, is which information is specific to what is being documented, and which information is more general.

Some people may think that documentation should only give information specific to what is being documented. Others may think it is useful to also include general information that can help people learn.

I don't know whether the writers of Python documentation lean to one of these or the other, but this contains a significant amount of information that has nothing to do with Python specifically, nothing to do with this class specifically, and nothing to do with this function specifically. (Again, I'm not saying this is bad. I just think it's important for people to recognize it.)

It's just general multi-threading knowledge. Anyone who knows about multi-threading (in any language) knows that the queue could change between two function calls.

But despite that extra general information, there is some specific information missing. Does it return the size of the queue (at the time the memory is accessed by the function call)? or does it use a more complex strategy for approximating the size of the queue? The reason this information is important is that if it is the former, that would be useful in single-threaded situations.

I am guessing that it is the former, but I don't know because not enough information is given.

Assuming that guess, I think following the model I see in the documentation of the next 2 functions on the page (Queue.empty() and Queue.full()) would be a good idea. That is, that the first sentence should only contain information specific to what is being documented, and more general information (about multi-threading) can be given afterward.

The fact that the size returned is approximate would have nothing to do with this function specifically, and it is just general information about how multi-threading works.

I will put my suggestion for this documentation (again, assuming that my guess about the missing information is correct) in a separate comment, because this comment will be TL;DR for many.

If my guess is incorrect, then something should be clarified to keep people from making that guess. (Maybe this is just projecting, but I think most people would make the same guess that I am making.)
msg271416 - (view) Author: Doug Hoskisson (Doug Hoskisson) Date: 2016-07-26 21:30
My suggestion for this documentation:

"""
Return the number of items in the queue. Note, in multi-threading this mostly just serves as an approximation, and information from this doesn’t guarantee that a subsequent get() or put() will not block.
"""
msg271459 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-07-27 15:12
"""
Return the number of items in the queue. Note, in multi-threading this mostly just serves as an approximation, and information from this doesn’t guarantee that a subsequent get() or put() will not block.
"""

I dislike this description. If I understand correctly, the issue is that someone must not rely on the size to check if the queue is empty or not. If I'm right, the doc must be more explicit. Something like:

"The size must not be used to check if get() or put() will block. Use get_nowait() and put_nowait(), or get() and put() in non-blocking mode (block=False)."

There is even a Wikipedia article on the bug :-)

https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use

(I'm not sure that it's exactly the same class of bug.)
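
A short sketch of the non-blocking pattern suggested above: attempt the operation and handle `queue.Empty` or `queue.Full`, rather than consulting `qsize()` first. The helper names `try_get` and `try_put` are hypothetical and exist only for this example; `get_nowait()`, `put_nowait()`, `queue.Empty` and `queue.Full` are the standard-library names.

```python
import queue

def try_get(q, default=None):
    """Return (True, item) if an item was available, else (False, default)."""
    try:
        return True, q.get_nowait()      # same as q.get(block=False)
    except queue.Empty:
        return False, default

def try_put(q, item):
    """Return True if the item was enqueued, False if the queue was full."""
    try:
        q.put_nowait(item)               # same as q.put(item, block=False)
        return True
    except queue.Full:
        return False

q = queue.Queue(maxsize=1)
first_put = try_put(q, "x")     # the queue has room for one item
second_put = try_put(q, "y")    # the queue is now full
first_get = try_get(q)
second_get = try_get(q)         # nothing left to take
```

Because the check and the action are a single call, there is no window in which another thread can invalidate the result.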
msg271460 - (view) Author: Doug Hoskisson (Doug Hoskisson) Date: 2016-07-27 16:12
More explicit is ok, if that's what people want, but just not in the first sentence, because that stuff has nothing to do with what is being documented specifically (as evidenced by referencing a wikipedia article that doesn't even mention python).

I don't think more explicit is necessary, but if that's what others want, it's not bad.

How much of the python documentation should be dedicated to teaching people stuff that has nothing to do with python specifically?
msg271467 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-27 18:41
The current wording is, IMO, better than the suggested wording, especially if you don't want to be "teaching stuff".  The current wording is a specification of the method's behavior.  I really don't know what you could replace "approximate" with that would improve it without having to get into a description of the behavior of a threaded program.

It seems like you are wanting us to document that the function will return an accurate size if the program is single threaded, but I don't think we want to do that, because that is not part of the specification of the method.
msg271468 - (view) Author: Doug Hoskisson (Doug Hoskisson) Date: 2016-07-27 18:55
It is inconsistent with other documentation right next to it.

Should the documentation for empty() say "Return True if the queue is approximately empty, False otherwise."?
Should the documentation for full() say "Return True if the queue is approximately full, False otherwise."?
msg271469 - (view) Author: Doug Hoskisson (Doug Hoskisson) Date: 2016-07-27 18:59
If the specification of the empty method is to return whether the queue is empty, then the programmers have failed to meet that specification, because by the time you get that return value, it might not be empty anymore.
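
The point about a return value going stale can be sketched deterministically. In the example below, the middle `get_nowait()` call is a stand-in (an assumption made for the sake of a reproducible example) for a second consumer thread that happens to run between the check and the use.

```python
import queue

q = queue.Queue()
q.put("job")

was_empty = q.empty()          # False: correct at the moment of the call

# Stand-in for another consumer thread running between the check and the use.
q.get_nowait()

# Acting on the stale answer: the queue was "not empty", yet nothing is left.
try:
    q.get_nowait()
    got_item = True
except queue.Empty:
    got_item = False
```

The answer from `empty()` was accurate when it was produced; it simply says nothing about the state of the queue by the time the caller acts on it.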
msg271470 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-27 19:04
The subsequent text specifies the behavior.  You *could* delete the 'approximately' from the qsize documentation to be parallel, but I think that would be a disservice to the reader.  You could also use the phrase "at the moment of the call" in all three, which might be acceptable.  Let's see what Raymond has to say, when he has time to respond.
msg271471 - (view) Author: Doug Hoskisson (Doug Hoskisson) Date: 2016-07-27 19:40
My suggestion was not to delete the "approximate" entirely. Just move it out of the first sentence to make it more consistent with the other documentation.

This is the model I'm seeing in empty() and full():

The first sentence is something simple and direct (without nebulous words that will make people wonder what it means), and then after that, comment on the race-condition stuff.

I think that documentation is good, and it would be good to follow the same model for qsize().
msg271610 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-07-29 06:22
Sorry Doug, I don't find any of the suggestions to be an improvement and I concur with David Murray that the docstring for qsize() isn't the place for a tutorial on race conditions and LBYL vs EAFP, which are general threading topics rather than queue-specific topics.

Also, I'm reluctant to change Guido's original wording, which has served well for a decade.  While I'm sure you can invent ways to misread the word "approximate", it does communicate that this method cannot be relied upon to return the exact size.  If we were seeing a recurring source of confusion, there might be a basis for a change, but that is not the case.

Sorry, but I'm going to close this one.  Every person might find a different way to wordsmith this one, but I think we should favor Guido's choice.
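
For readers unfamiliar with the LBYL vs EAFP distinction mentioned above, here is a rough sketch with several consumer threads. The function name `drain` and the constants are hypothetical, chosen only for this example. Each worker uses the EAFP style: it attempts `get_nowait()` and treats `queue.Empty` as the signal to stop. The LBYL alternative, `if not q.empty(): q.get()`, could block forever if another worker takes the last item between the check and the `get()`.

```python
import queue
import threading

def drain(q, counts, index):
    """EAFP consumer: attempt the get and treat queue.Empty as 'done'."""
    while True:
        try:
            q.get_nowait()
        except queue.Empty:
            return
        counts[index] += 1      # each worker writes only to its own slot

N_ITEMS, N_WORKERS = 1000, 4

q = queue.Queue()
for i in range(N_ITEMS):
    q.put(i)

counts = [0] * N_WORKERS
workers = [
    threading.Thread(target=drain, args=(q, counts, i))
    for i in range(N_WORKERS)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

total = sum(counts)             # every item was consumed exactly once
```

How the items are split between workers varies from run to run, but the total does not, and no worker ever blocks on an empty queue.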
msg271626 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-29 14:13
Doug: side note.  Raymond teaches Python, and makes a study of what works and doesn't work in communicating it to students, so he isn't rejecting this lightly.
History
Date                 User            Action  Args
2016-07-29 14:13:06  r.david.murray  set     messages: + msg271626
2016-07-29 06:22:11  rhettinger      set     status: open -> closed; resolution: rejected; messages: + msg271610
2016-07-27 19:54:36  rhettinger      set     assignee: docs@python -> rhettinger
2016-07-27 19:40:41  Doug Hoskisson  set     messages: + msg271471
2016-07-27 19:04:53  r.david.murray  set     messages: + msg271470
2016-07-27 18:59:45  Doug Hoskisson  set     messages: + msg271469
2016-07-27 18:55:24  Doug Hoskisson  set     messages: + msg271468
2016-07-27 18:41:04  r.david.murray  set     messages: + msg271467
2016-07-27 16:12:13  Doug Hoskisson  set     messages: + msg271460
2016-07-27 15:12:26  vstinner        set     nosy: + vstinner; messages: + msg271459
2016-07-26 21:30:03  Doug Hoskisson  set     messages: + msg271416
2016-07-26 21:29:11  Doug Hoskisson  set     messages: + msg271415
2016-07-26 19:13:21  Doug Hoskisson  set     messages: + msg271409
2016-07-26 18:57:33  r.david.murray  set     nosy: + rhettinger; messages: + msg271405
2016-07-26 18:51:45  Doug Hoskisson  set     messages: + msg271404
2016-07-26 16:01:21  r.david.murray  set     nosy: + r.david.murray; messages: + msg271386
2016-07-26 13:00:53  Doug Hoskisson  create