classification
Title: add "start" arg to max and min functions
Type: enhancement
Stage:
Components: Library (Lib)
Versions: Python 3.2, Python 2.7
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: esam, phr, r.david.murray, rhettinger
Priority: normal
Keywords:

Created on 2009-10-16 19:24 by phr, last changed 2009-10-18 19:26 by rhettinger. This issue is now closed.

Messages (7)
msg94145 - Author: paul rubin (phr) Date: 2009-10-16 19:24
Lots of times I want to find the largest element of a list or sequence,
defaulting to 0 if the list or sequence is empty.  max(seq) throws an
exception if seq is empty, so I end up using reduce(max, seq, 0).  That
is a standard functional programming idiom but can be a bit confusing to
imperative-style Python programmers.  

max with multiple args is already overloaded to mean the maximum of the
args, so I think it would be a good fix to add a keyword arg to accept
an optional initial value:  max(seq, start=0).  For symmetry, min should
accept the same arg.

The alternatives to using reduce aren't so attractive.  If seq happens
to be a list, there might be a temptation to conditionalize on
len(seq)==0, but that is poor style since it will break if seq later
changes to an arbitrary sequence.  And trying to test it by calling
.next() and saving the value and/or trapping StopIteration gets awfully
messy.
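
A minimal sketch of the idiom described above, assuming Python 3, where reduce is imported from functools (in Python 2 it is also a builtin). The variable names are illustrative only.

```python
from functools import reduce  # builtin in Python 2; lives in functools in Python 3

empty = []
values = [3, 7, 2]

# max() on an empty input raises ValueError.
try:
    max(empty)
    raised = False
except ValueError:
    raised = True

# reduce() with an initial value of 0 handles the empty case.
largest_of_empty = reduce(max, empty, 0)
largest_of_values = reduce(max, values, 0)

# The same call works for a one-shot iterator, where len() is unavailable.
largest_of_iterator = reduce(max, iter(values), 0)
```

Because the initial value takes part in the comparison, this form also never returns anything below 0.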
msg94147 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-10-16 20:53
In case you don't remember it, this thread from python-ideas is relevant:

http://mail.python.org/pipermail/python-ideas/2009-April/004107.html

I can't tell from rereading the thread whether the support was for the
'initial' version, or the 'default' version that was more-or-less
rejected as being too confusing.  If the support was for the initial
version then that would be support for this proposal.
msg94151 - Author: paul rubin (phr) Date: 2009-10-16 21:44
David, I'm not on that mailing list so hadn't seen the earlier
discussion.  I sympathize with Raymond's YAGNI argument because I'm
comfortable with reduce(max,seq,0); but then I remember there was once a
movement to remove the "reduce" function from builtins, which would have
broken that idiom.  I also understand that not everyone is comfortable
with that style.  I recently had to hand over some code to another
programmer where I had used that idiom, and in the course of adding
comments to the code in preparation for the handover, I found myself
writing quite a few words about why I'd used "reduce" that way, so I
figured that "explicit is better than implicit" suggests adding default
or initial args to the max function, just like "reduce" already has (I
figure that max on a sequence is a special case of reduce).

My proposed python implementation:

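from functools import reduce  # builtin in Python 2; needed in Python 3

def mymax(*args, **kwargs):
  # several positional args: plain multi-argument max
  if len(args) > 1: return max(*args)
  if len(args) == 0: raise TypeError("mymax needs at least one positional arg")
  # one iterable arg: fold with max, seeded by 'initial' when given
  if 'initial' in kwargs: return reduce(max,args[0],kwargs['initial'])
  return reduce(max,args[0])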
msg94164 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-10-17 05:01
Sorry, that "in case you don't remember" was directed at Raymond, not you.

Since Raymond took ownership of this issue I don't think he's dismissing
it (at least not yet :)  I think his YAGNI was for the 'default'
version, which is not what you are proposing.

Note that 'reduce' is _not_ a builtin in Python3, although it is still
available and so the idiom is not broken, just slightly wordier.
msg94193 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-10-17 23:43
Here’s a summary of my research so far (including discussion with other
programmers, a Google code search, discussion on #python-dev on IRC, and
comparing the proposal to other APIs with start-arguments such as sum(),
reduce(), and enumerate()):

1. Showed several examples to other programmers and found that they did
not immediately get what the start argument was trying to do.  Was a
start argument the same as:

 reduce(max, seq, 0)     # zero when empty and never less than zero
 max(seq) if seq else 0  # zero when empty (only works for sequences)
 max(chain([0], seq))    # zero when empty and never less than zero

2. There is an issue of API complexity.  Even if a feature is useful and
clear, it may not be a good idea when the API of a function is already
complex.  In the case of min()/max(), we already have special handling
for one argument (treated as an iterator) versus no arguments (treated
as an error) versus multiple arguments (treated as an input sequence of
values).  We also have a key= argument.  Taken together, the min/max
functions already have a lot of features.

3. Beyond the complexity of having too many features in a function that
should be simple, there is also an issue of how those features would
interact:

 min(iterable, key=f, start=x)  # is the default value x or f(x)?
 min(start=x)          # should this be allowed?
 min(*args, start=x)   # if so, what is behavior when len(args)==0 or 1 or 2?

4. The argument about reduce(max, seq, 0) being confusing to
non-functional programmers isn’t persuasive since perfectly clear
(though multi-line) exception-catching or size-checking imperative forms
can be written, or one can simply factor-out any recurring
expressions if they seem awkward or confusing:

    import functools

    def max_or_zero(iterable):
        'Return zero if the iterable is empty or max(0, max(iterable)) otherwise'
        return functools.reduce(max, iterable, 0)

5. In the case of sequences, it can be clearer to write:

 max(seq) if seq else 0  
 max(seq + [0])     # or use itertools.chain()

6. A code search showed that max() is mostly used in a two-argument
form.  When it does get used with iterables, it doesn't seem common to
trap ValueErrors.
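
The spellings listed in points 1 and 5 agree on empty input but not on all-negative input. The sketch below illustrates that difference; the function names are hypothetical labels for the two readings ("initial" versus "default") and are not part of any proposal in this thread.

```python
from functools import reduce
from itertools import chain

def initial_style(iterable):
    # 0 takes part in the comparison, so the result is never below 0.
    return reduce(max, iterable, 0)

def chain_style(iterable):
    # Same semantics as initial_style, spelled with itertools.chain.
    return max(chain([0], iterable))

def default_style(seq):
    # 0 is used only when seq is empty; relies on truthiness, so sequences only.
    return max(seq) if seq else 0

all_negative = [-2, -5]
initial_result = initial_style(all_negative)   # 0
chain_result = chain_style(all_negative)       # 0
default_result = default_style(all_negative)   # -2

# An iterator is always truthy, so the conditional form still raises on an empty one.
try:
    default_style(iter([]))
    iterator_raised = False
except ValueError:
    iterator_raised = True
```

All three return 0 for an empty list, which is why the distinction is easy to miss when reading a call site.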
msg94196 - Author: paul rubin (phr) Date: 2009-10-18 02:01
1. Yes, the idea is to figure out the best solution and go with it (or
decide to do nothing).  That many possibilities exist just points to the
need for experience and wisdom in identifying the best choice ("one and
preferably only one").  One of the attractive points about Python is it
comes with so many such decisions already wisely made.  The design
wisdom embodied in Python is almost like a useful software library in
its own right.  So the presence of multiple choices should be seen as an
invitation to resolve the issue, not a flag to keep it unresolved.

2. I agree that the multi-arg and iterator API's should have been done
as separate functions (or denoted through a keyword arg), but what we
have isn't too bad, and it's what we have.

3.  min(iterable, key=f, start=x) should obviously default to the same
thing as min([x], key=f).  min(*args, start=x) should only be allowed
when len(args)==1, since the start arg is restricted to the case of
minimum over an iterator.  min(start=x) should not be allowed.

4. I'm in general unpersuaded by the argument that a stdlib function
isn't worth having because the same thing can be done by bloating up the
user code.  Python code should be concise, which means using the stdlib
in preference to writing more user-defined functions that the next
maintainer has to figure out, and which may have their own bugs and
unhandled edge cases.  For stdlib design, it's mostly a matter of
deciding whether a construction occurs often enough to write a re-usable
function for.  In this case it wouldn't have occurred to me to write any
of those suggested versions, instead of writing reduce(max, iterator, 0)
with an explanatory comment.  But even the observation that "reduce"
made me bloat up the comments in the code, rather than the code itself,
was enough to get me to suggest adding the initial arg.

5. I don't think it's good to rely on bool(seq) or len(seq) to be
available if there's a simple construction (like here) that works for
arbitrary iterables.  It's better to handle the general case.  I hadn't
even realized til just now that "sequence" and "iterable" didn't mean
the same thing.

6. Yes, I know it's not common to trap ValueErrors when using max with
iterables.  I wrote code without such traps myself and then got bitten
by unhandled exceptions when some of those iterables turned out to be
empty (hence my "reduce" hack).  It wouldn't surprise me if lots more
such code out there is similarly buggy.  I think it's good to make
bug-avoiding mechanisms obvious and convenient.
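
One possible reading of the semantics in point 3 is that the start value is treated as an extra element of the input, with the key function used only for comparison. The sketch below assumes that reading; min_with_start is a hypothetical helper and not an existing or proposed builtin.

```python
from itertools import chain

def min_with_start(iterable, start, key=None):
    """Hypothetical helper: min() over the iterable with start prepended."""
    candidates = chain([start], iterable)
    if key is None:
        return min(candidates)
    # key is applied to every candidate, including start, for comparison only;
    # the value returned is the original element, never key(element).
    return min(candidates, key=key)

empty_result = min_with_start([], start=-10, key=abs)       # start itself, not abs(start)
start_wins = min_with_start([-3, 4], start=1, key=abs)      # abs(1) is smallest
element_wins = min_with_start([-3, 4], start=-7, key=abs)   # abs(-3) is smallest
no_key = min_with_start([5, 2], start=9)
```

Under this reading the empty case matches min([x], key=f), which returns x rather than f(x).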
msg94220 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-10-18 19:26
Thanks for the thoughtful reply.  To give the idea another chance,
today, I showed a code example to another experienced programmer.

"What does this function return?
   max([-2, 5], start=0)
"

The person had a hard time making any sense of it.  Perhaps the start
argument was an index, or starting position, or a ceiling, or a floor.

I'm rejecting the feature request for several reasons:

* the code sample doesn't have obvious meaning to experienced programmers

* the API of the min/max functions is already too complex -- the
language needs to be easy to learn and remember

* the start keyword argument doesn't interact well with the other
features (such as the key argument or positional arguments).  Mark
questioned whether the key function would apply to the start value (any
answer is arbitrary because either could work).  Also, there was a
question of how it would work with *args (does it change the case with
zero arguments as it does for the iterator case?).  When the choice of
semantics is arbitrary, it is better not to guess and instead have the
coder explicitly say what is supposed to happen.

* it isn't really needed; there are several existing ways to create the
same effect.

* the meaning of "start" is ambiguous and arbitrary (is it a default
value for an empty input or is it like adding an extra value to the
input sequence?).  We could pick one of the two meanings but that doesn't
help a coder or code reviewer remember which meaning was correct.  To
prevent bugs, it is better for the code to explicitly spell-out how the
corner case is to be handled.

* in mathematical notation, I see the corner cases being handled by
piecewise functions (if seq is empty, value is this, else compute
min/max) instead of the notation trying to override the simple meaning
of min/max.

* I haven't found precedents in any other language.  Even if there
were, we would still have the problem of too many features being loaded
onto Python's min/max and the unclear semantics of how those features
would interact.
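
The piecewise handling mentioned above (if the input is empty, use this value, else compute the max) can be spelled out explicitly for arbitrary iterables. This is a minimal sketch; max_or_default and _MISSING are hypothetical names chosen for illustration.

```python
from itertools import chain

_MISSING = object()  # sentinel that cannot collide with a real element

def max_or_default(iterable, default):
    """Hypothetical helper: max(iterable), or default if iterable is empty."""
    it = iter(iterable)
    first = next(it, _MISSING)
    if first is _MISSING:
        # Empty input: the default is returned without being compared to anything.
        return default
    # Non-empty input: ordinary max over the first element plus the rest.
    return max(chain([first], it))

empty_case = max_or_default([], 0)
negative_case = max_or_default([-2, -5], 0)
iterator_case = max_or_default(iter([3, 9, 4]), 0)
generator_case = max_or_default((x for x in ()), 0)
```

Unlike reduce(max, seq, 0), this form returns -2 for [-2, -5], because the default is consulted only when the input is empty.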

Thank you for the feature request.  Sorry, this one didn't pan out.
History
Date                 User            Action  Args
2009-10-18 19:26:09  rhettinger      set     status: open -> closed; resolution: rejected; messages: + msg94220
2009-10-18 02:01:54  phr             set     messages: + msg94196
2009-10-17 23:43:15  rhettinger      set     messages: + msg94193
2009-10-17 05:01:35  r.david.murray  set     messages: + msg94164
2009-10-17 04:46:03  esam            set     nosy: + esam
2009-10-16 21:44:11  phr             set     messages: + msg94151
2009-10-16 20:53:33  r.david.murray  set     priority: normal; nosy: + r.david.murray; messages: + msg94147
2009-10-16 20:13:39  rhettinger      set     assignee: rhettinger; nosy: + rhettinger; versions: + Python 2.7, Python 3.2, - Python 2.5
2009-10-16 19:34:03  phr             set     type: enhancement
2009-10-16 19:24:02  phr             create