This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Refactor simple iterators implementation
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.6
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: rhettinger, serhiy.storchaka, tim.peters
Priority: normal Keywords: patch

Created on 2016-07-02 06:40 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
iterlib.patch serhiy.storchaka, 2016-07-02 06:40 review
Messages (3)
msg269703 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-07-02 06:40
The implementation of iterators for str, bytes, bytearray, tuple and list is virtually the same. Proposed patch moves common code to the template file iterlib.h and parametrizes it with few macros.

The patch decreases the number of source lines by 600.

 Makefile.pre.in                    |    2 
 Objects/bytearrayobject.c          |  165 ------------
 Objects/bytesobject.c              |  167 -------------
 Objects/iterlib.h                  |  168 +++++++++++++
 Objects/listobject.c               |  468 ++++++++++--------------------------
 Objects/tupleobject.c              |  156 ------------
 Objects/unicodeobject.c            |  171 -------------
 PCbuild/pythoncore.vcxproj         |    1 
 PCbuild/pythoncore.vcxproj.filters |    3 
 9 files changed, 349 insertions(+), 942 deletions(-), 10 modifications(!)
msg269704 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-07-02 09:58
Though this gives us a nicely reduced line count, I'm disinclined to do this for several reasons:

The benefit is too small to warrant this kind of atypical macro/include magic (with "magic" being the pattern of making a series of #defines followed by an #inclusion of a .h file that uses those definition to generate code that would normally be in a .c file and then #undefining so that the pattern can be repeated elsewhere).

Anyone trying to maintain this code will have to rediscover what you're doing and mentally unroll all the steps.  I also think it will confuse static tools like CTAGs, making the source harder to trace.

In general, the rule of refactoring is to express every idea only once in the source, so I do understand the motivation for this patch; however, this patch results in a net increase in maintenance burden and will make it more difficult for our newer core developers to understand the code.  Currently listobject.c and tupleobject.c are separately digestible and intelligible -- one can read and make changes to each without learning new macros and without worrying about the effects of tight coupling ("don't tug on that, you never know what it might be attached to").

I don't think we're solving a real problem here.  The code currently works fine and is easy to digest one object type at a time.  

For me, the patch interferes with the way I think about and work with these modules.  I reason about the what the iterator is doing in conjuction with the surrounding code for the rest of the type.  This patch has a conflicting notion which is that sequence iteration is a single concept that should be reasoned about independently of how the type is implemented.  

If someone later wants to do major surgery to one of the types (such as being contemplated now with compacts dicts being applied to dictobject.c), the maintainer would have to first undo this patch.

FWIW, I looked at similar transformations many times and have always concluded that it would make us worse off even though the code size would be smaller.  For example, the itertools module has internal redundancy; however, that is a net benefit because it allows each tool to be reasoned about separately and to be maintained without worrying about the effects on the other tools.

So, please put me down for a -1.  Factoring and less code are generally good, but tricky macro/define/include/undefine magic is bad.  Tight coupling is also bad.

I worry that in you efforts to "refactor" just about everything in the entire code base, you are invalidating the design decisions of every developer who came before you.  In creating many new macros, we're increasing the learning curve for subsequent maintainers (they have to first learn what tricks the macros are doing and what is behind the abstraction before being able to do anything).  Also, it creates was somewhat constipated coding style where everything has been squeezed tightly together and lost the simple fluidity and lucidity that it had before.
msg269730 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-07-02 20:00
Thank you for your detailed explanation Raymond.

In any case, the work on this patch helped to find a bug in bytearray iterator (issue27443).
History
Date User Action Args
2022-04-11 14:58:33adminsetgithub: 71625
2016-07-02 20:00:24serhiy.storchakasetstatus: open -> closed
resolution: rejected
messages: + msg269730

stage: patch review -> resolved
2016-07-02 09:58:17rhettingersetnosy: + tim.peters
2016-07-02 09:58:09rhettingersetnosy: + rhettinger
messages: + msg269704
2016-07-02 06:41:11serhiy.storchakasettitle: Refactot simple iterators implementation -> Refactor simple iterators implementation
2016-07-02 06:40:08serhiy.storchakacreate