classification
Title: regression when passing numpy bools to sorted(..., reverse=r)
Type: Stage:
Components: Library (Lib) Versions: Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Ryan May, ammar2, mark.dickinson, ralf.gommers, rhettinger, seberg, serhiy.storchaka, tcaswell, terry.reedy, tim.peters, vstinner
Priority: normal Keywords:

Created on 2019-08-29 16:39 by tcaswell, last changed 2019-10-11 13:10 by tcaswell.

Files
File name Uploaded Description Edit
numpy_warning_bisect.log tcaswell, 2019-08-29 16:39 bisect log
python_bisect.sh tcaswell, 2019-08-29 16:40 bisect script
test.py tcaswell, 2019-08-29 16:41 test script for bisecting
Messages (12)
msg350800 - (view) Author: Thomas Caswell (tcaswell) * Date: 2019-08-29 16:39
In python37, numpy1.17 the following runs without warning.

import numpy as np
sorted([1, 2], reverse=np.bool_(True))  

with python38 this emits a

DeprecationWarning: In future, it will be an error for 'np.bool_' scalars to be interpreted as an index

I bisected this back to https://github.com/python/cpython/pull/11952 which looks very intentional, however it is tripping deprecation warnings in numpy to prevent things like `[1, 2, 3][np.bool_(False), np.bool_(True)]` from doing surprising things.

I have previously gotten the sense that core would like to know about these sorts of regressions rather than just handling them down-stream.  I will be opening an issue with numpy as well.
msg350803 - (view) Author: Thomas Caswell (tcaswell) * Date: 2019-08-29 16:48
xref numpy issue https://github.com/numpy/numpy/issues/14397
msg350806 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-08-29 17:03
Closely related: #37427. I'm not sure why that issue was closed. My naive expectation would be that in most cases where an API specifies a flag, an arbitrary Python object can be used for that flag (and is then interpreted in boolean context). That's the natural behaviour that you get when handling flags in pure Python code.

Ex: https://github.com/python/cpython/blob/8ba8cc51a9899594f7fe66fe790973d83fd3d657/Lib/pathlib.py#L190
msg350844 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-08-30 02:26
PR-11952 was written by Serhiy and reviewed by Victor (and Mark).  They are better qualified to comment on its intended side effects than I am.

In the case of sorting, the exception for, say, "reverse='a'", says the flag must be an 'integer' since at least 2.7, which likely means forever.  If 'not index' implies 'not integer', we should close as 'not a bug' (unless we make a liberal general rule about boolean flag values).

Mark's link is to "def resolve(self, path, strict=False):" where 'strict' is used in 'if strict:'.  If this is 'normal', then perhaps 'reverse' is treated too strictly.

I have not thought about what is normal, at least for builtins, for boolean flags since, like most people, I usually use explicit True or False.  Among builtins:

compile(..., dont_inherit=False,...)  # '' rejected, integer required
print(..., flush=False)  # '' ok
sorted(..., reverse=False)  # integer required
open(..., closefd=True)  # 'a' rejected, integer required
(1).to_bytes(1, 'big', signed='')  # OK
int.from_bytes(b'\xfc\x00', 'big', signed='')  # OK
b'ab c\n\nde fg\rkl\r\n'.splitlines(keepends='a')  # integer required

So far, 4 to 3 in favor of integer required.  I don't see any particular sense to the choices.  An interesting can of worms.
msg350845 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2019-08-30 03:26
Thanks for the insight Terry.

I think the functions that accept ints as bools are kind of a red herring: Booleans were only formally introduced in Python 2.3 [1], thus any functions that existed before that accepted ints and continued to accept ints for backwards compatibility [2]. When transition to argument clinic, these functions use "bool(accept={int})" which restricts the inputs to either bools or ints [3]. 

print, int.to_bytes and int.from_bytes all use argument clinic's bool converter which accepts any truth-y objects. This is similar to PyArg_ParseTuple's 'p' parameter which also accepts any truthy objects as well as say the print function which manually does a PyObject_IsTrue [4].


[1] https://www.python.org/download/releases/2.3/highlights/

[2] list.sort (sorted) existed since the start of Python: https://github.com/python/cpython/blob/85a5fbbdfea617f6cc8fae82c9e8c2b5c424436d/Objects/listobject.c#L423

splitlines exists in Python 2.0: https://github.com/python/cpython/commit/4c08d554b9009899780a5e003d6bbeb5413906ee

[3] https://docs.python.org/3/howto/clinic.html#using-real-argument-clinic-converters-instead-of-legacy-converters

[4] https://github.com/python/cpython/blob/7fcc2088a50a4ecb80e5644cd195bee209c9f979/Python/bltinmodule.c#L1888
msg350857 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-08-30 07:41
Yes, I think there's a lot of history here that obscures the picture.

We have mechanisms in Python to allow 3rd party objects to be interpreted as floats (via __float__) or as integers (via __index__). 

So np.int64 (for example) doesn't subclass int (on Python 3), but is usable as an integer almost everywhere that that makes sense because it defines __index__ (and __int__, but it's __index__ that really matters here).

If you're writing something bool-like in a 3rd party library, and you want it to be duck-typeable as a boolean in calls that expect a boolean flag, what special method should you implement? It seems clear to me that the answer should be __bool__, but the current error message appears to be telling the NumPy folks that if they want this to work then np.bool_ should (continue to) implement __index__. That feels odd to me: it's a completely legitimate and sensible choice on NumPy's part to decide that np.bool_ objects *aren't* integers, and shouldn't implement __index__.

The NumPy bug report demonstrates that the concern isn't hypothetical: np.bool_ really *is* a boolean type, and it's easy and natural for a NumPy user to end up passing an np.bool_ instance instead of a bool instance where a flag is expected, and then rather surprising when that doesn't work as expected. I've run into this myself on occasion (turns out that PyQt doesn't like np.bool_ objects either, but that's another story).

The one disadvantage that I can see of allowing arbitrary objects to be used as flags is the potential for surprising failure modes when a function is accidentally misused. For example, it could be surprising if something like:

> "aaaXbbbXccc".splitlines("X")

returned ["aaaXbbbXccc"] instead of raising as it currently does. But many flags are keyword-only, which would make it harder to run into this sort of thing accidentally.
 
In short, I think this particular case _is_ a bug that should be fixed.
msg350858 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-08-30 07:43
Since this particular issue is about "sort", I'm adding Tim and Raymond to the nosy list.
msg350860 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-08-30 08:08
See more general issue issue15999.
msg350868 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-08-30 11:02
> DeprecationWarning: In future, it will be an error for 'np.bool_' scalars to be interpreted as an index

That sounds like an issue in numpy: np.bool_ should define the __index__() method, no?

$ python3
Python 3.7.4 (default, Jul  9 2019, 16:32:37) 
>>> import numpy
>>> b=numpy.bool_(True)
>>> b.__int__()
1
>>> b.__index__()
__main__:1: DeprecationWarning: In future, it will be an error for 'np.bool_' scalars to be interpreted as an index
1

The purpose of the DeprecationWarning is to give time to numpy developers to update the bool_() type, before it becomes an error.

I'm not sure of the purpose of this issue, is it just to notice core developers that numpy should be updated? Or do you consider that type.__index__() must continue to fallback on type.__int__() if __index__() is not defined?

In the past, Python was tolerant in term of accepted types. It was hidding some bugs sometimes. We would like to make Python more strict to reduce the risk of bugs. Sadly, it requires to modify your Python code. It's like bytes/str which was strictly separeted in Python 3.0, except that here there is longer and smoother transition period :-)
msg350873 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-08-30 12:16
Victor

> The purpose of the DeprecationWarning is to give time to numpy developers to update the bool_() type, before it becomes an error.

No, that's not what's happening: currently, np.bool_ *does* support __index__. In the future, it won't. The NumPy developers are deliberately removing __index__ from np.bool_, for reasons described in the relevant NumPy issues. The DeprecationWarning is coming from NumPy, not Python. See https://github.com/numpy/numpy/pull/9685
msg350877 - (view) Author: Sebastian Berg (seberg) * Date: 2019-08-30 13:12
I applaud the stricter rules in general, as Mark noted nicely, the issue is that `__index__` is maybe a strange way to achieve that for bools (it is not like `123` is a clean bool)? `__nonzero__` coerces to bools, there is no `__bool__` to convert to bool safely.

Basically: this seems to force numpy to back down from saying that `list[np.True_]` will be invalid in the future. (And we cannot just get rid of our bools unfortunately).
msg354449 - (view) Author: Thomas Caswell (tcaswell) * Date: 2019-10-11 13:10
Any update on resolving this?
History
Date User Action Args
2019-10-11 13:10:42tcaswellsetmessages: + msg354449
2019-09-05 19:08:28Ryan Maysetnosy: + Ryan May
2019-09-03 04:52:17ralf.gommerssetnosy: + ralf.gommers
2019-08-30 13:12:17sebergsetmessages: + msg350877
2019-08-30 12:16:11mark.dickinsonsetmessages: + msg350873
2019-08-30 11:02:01vstinnersetmessages: + msg350868
2019-08-30 08:08:15serhiy.storchakasetmessages: + msg350860
2019-08-30 07:43:26mark.dickinsonsetnosy: + tim.peters, rhettinger
messages: + msg350858
2019-08-30 07:41:18mark.dickinsonsetmessages: + msg350857
2019-08-30 03:26:53ammar2setnosy: + ammar2
messages: + msg350845
2019-08-30 02:26:42terry.reedysetnosy: + vstinner, serhiy.storchaka
messages: + msg350844
2019-08-29 20:06:54tcaswellsetcomponents: + Library (Lib)
versions: + Python 3.9
2019-08-29 20:01:11sebergsetnosy: + seberg
2019-08-29 17:03:17mark.dickinsonsetnosy: + terry.reedy, mark.dickinson
messages: + msg350806
2019-08-29 16:48:11tcaswellsetmessages: + msg350803
2019-08-29 16:41:06tcaswellsetfiles: + test.py
2019-08-29 16:40:21tcaswellsetfiles: + python_bisect.sh
2019-08-29 16:39:02tcaswellcreate