This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: add a convenience C-API function for unpacking iterables
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.5
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, rhettinger, scoder
Priority: normal Keywords:

Created on 2012-02-25 09:12 by scoder, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg154217 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2012-02-25 09:11
In the context of better interfacing of PyPy with Cython, it appears that simple looking things like PyTuple_GET_ITEM() are often rather involved in PyPy's C-API implementation. However, since functions/macros like these are used very frequently, this has an effect on the achievable performance.

It occurred to me that there are cases involving many C-API calls where the intention is simply to unpack a sequence (or iterable) of known length, often just 2 or 3 items. Argument unpacking is one such situation (for which there are appropriate C-API functions); dict item iteration and iteration over enumerate() are other well-known cases (at least in Python space). As the one obvious way to handle the general use case, I propose adding the following convenience function to the C-API:

int PyIter_Unpack(PyObject* iterable, Py_ssize_t min_unpack, Py_ssize_t max_unpack, ...)

As indicated by the name, it's meant to unpack any iterable or iterator: it would fall back to generic iteration if the input is neither a tuple nor a list, the two types for which special handling code makes the most sense. I thought about naming it PySequence_Unpack(), but that would imply that it should reject unordered (or, for safety, any unknown) iterables and non-sequence iterators as input, which IMHO would complicate matters more than it would help. A warning about unordered iterables in the documentation should be enough. I would expect that most users actually know the type of sequence that they are processing.

The "max_unpack" parameter gives the number of varargs that follow, which are each either of type PyObject** or NULL, the latter indicating that the value at that position is not of interest. Non-NULL pointers will receive a new reference to the item at the corresponding index.

The "min_unpack" parameter is made available for error checking. If fewer items are found in the iterable, the function sets a ValueError and returns -1. Assignments may or may not have taken place at this point, but no owned references are passed back in this case. If, on successful unpacking, the number of unpacked items is smaller than "max_unpack", all remaining item pointers will be set to NULL. Users who do not care about the number of items would pass 0 as "min_unpack", and those who know the exact length would pass it as both "min_unpack" and "max_unpack".
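The min/max semantics described above can be sketched without the CPython API at all. The following is a hypothetical, simplified model (the function name unpack_sketch is invented here, and plain C strings stand in for PyObject*, since the proposed function was never implemented):

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical model of the proposed semantics. "items"/"n_items" stand
 * in for the sequence being unpacked; the real function would take a
 * PyObject* iterable and hand back new references. Returns 0 on success
 * and -1 when fewer than min_unpack items are available (where the real
 * function would set a ValueError). */
static int unpack_sketch(const char **items, size_t n_items,
                         size_t min_unpack, size_t max_unpack, ...)
{
    if (n_items < min_unpack)
        return -1;  /* too few items: error, no values passed back */

    va_list args;
    va_start(args, max_unpack);
    for (size_t i = 0; i < max_unpack; i++) {
        const char **slot = va_arg(args, const char **);
        if (slot == NULL)
            continue;  /* NULL pointer: caller ignores this position */
        /* Positions beyond the available items are set to NULL. */
        *slot = (i < n_items) ? items[i] : NULL;
    }
    va_end(args);
    return 0;
}
```

A call on the real API would look analogous, e.g. PyIter_Unpack(pair, 2, 2, &first, &second) for a known 2-tuple, with the caller owning the returned references.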

There is one case I'm not sure about yet, and that's how to handle finding more items than "max_unpack" requests. It's just as convenient in some cases to automatically raise an exception as it is in other cases to simply ignore them. One way to solve this could be to not raise an exception, but to return 0 when all items were processed and 1 when items remain. Users who care could then check the result and, if they consider left-over items an error, clean up the returned references and raise an error manually. Alternatively, the function could return the number of unpacked items, but that may involve more work on the user side to find out what needs to be done. The drawback of a tristate return, with and without errors set, is that the straightforward "if (PyIter_Unpack(...))" check is no longer enough to correctly detect and propagate errors. Also, when passing an iterator, the function would have to eat one more value in order to determine the return code, which may not be what the caller wants.
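The tristate idea can be modeled the same way (again a hypothetical sketch over a plain array, with an invented name; the real function would additionally have to advance an iterator once more to learn whether anything is left):

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tristate variant: -1 on error, 0 when the input held at
 * most max_unpack items, 1 when items were left over. NULL vararg slots
 * are skipped; slots past the available items are set to NULL. */
static int unpack_tristate(const char **items, size_t n_items,
                           size_t min_unpack, size_t max_unpack, ...)
{
    if (n_items < min_unpack)
        return -1;

    va_list args;
    va_start(args, max_unpack);
    for (size_t i = 0; i < max_unpack; i++) {
        const char **slot = va_arg(args, const char **);
        if (slot != NULL)
            *slot = (i < n_items) ? items[i] : NULL;
    }
    va_end(args);
    /* The drawback: "if (ret)" now conflates leftovers with errors. */
    return n_items > max_unpack ? 1 : 0;
}
```

A caller that treats leftovers as an error would check for a return of 1, drop the references it received, and raise manually; plain error propagation becomes "if (ret < 0)" instead of the usual "if (ret)".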

Maybe an additional flag parameter ("check_size") could solve this. If true, the function will check the size of sequences and report longer sequences as errors, and for iterators, will unpack the next item and report it as an error if available. If false, additional values will be ignored for sequences, and no attempt will be made to unpack more items than requested from iterators.

Because of the questions above, and because this addition is somewhat redundant with what's already there (namely the argument and tuple unpacking functions, which do not work on lists or arbitrary iterables and/or raise the wrong exceptions), I'm asking for comments before writing up a patch. Any thoughts on this?
msg222180 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-03 14:01
Apparently not :)
msg222226 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-07-03 22:02
Regarding the redundancy, I don't think the C API should be expanded unnecessarily.  Also, PyTuple_GET_ITEM() and its kin are very old, widely used, and very fast.  I don't see a need to upset that apple cart.
msg222399 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2014-07-06 08:55
Ok. This has been idling long enough to just close it.
History
Date User Action Args
2022-04-11 14:57:27  admin  set  github: 58329
2014-07-06 08:55:37  scoder  set  status: open -> closed
                                  resolution: rejected
                                  messages: + msg222399
2014-07-03 22:02:12  rhettinger  set  nosy: + rhettinger
                                      messages: + msg222226
2014-07-03 14:01:35  BreamoreBoy  set  nosy: + BreamoreBoy
                                       messages: + msg222180
                                       versions: + Python 3.5, - Python 3.3
2012-02-25 09:12:00  scoder  create