classification
Title: Allow keyword argument in str.splitlines()
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: mark.dickinson Nosy List: alex, ezio.melotti, mark.dickinson, meador.inge, python-dev, terry.reedy
Priority: normal Keywords: patch

Created on 2011-09-20 13:44 by mark.dickinson, last changed 2011-09-28 14:40 by ezio.melotti. This issue is now closed.

Files
File name Uploaded Description Edit
issue13012.patch mark.dickinson, 2011-09-20 17:56 review
issue13012-repl.diff ezio.melotti, 2011-09-23 21:03 review
Messages (18)
msg144326 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-09-20 13:44
The docstring of str.splitlines says:

splitlines(...)
    S.splitlines([keepends]) -> list of strings
    
    Return a list of the lines in S, breaking at line boundaries.
    Line breaks are not included in the resulting list unless keepends
    is given and true.

Currently the optional argument can only be specified positionally.  What do people think about also allowing this argument to be specified by keyword?  (And applying the same change to bytes.splitlines.)

The main motivation here is readability:  in

   long_string.splitlines(keepends=True)

it's fairly clear what the boolean argument is doing.  But in

   long_string.splitlines(True)

it's much less clear.  (This came up during a Python training class that I was running recently.)  I'm not advocating making the argument keyword-only;  just allowing those who want to specify it by keyword to do so.

There are many other str methods that don't accept keyword arguments, but it's only this one where I think readability would really benefit.
msg144327 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-09-20 13:50
> (And applying the same change to bytes.splitlines.)

Oh, and bytearray.splitlines, too.
msg144328 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2011-09-20 13:51
Personally, I regard every C function which, for obscure internal details, doesn't take keyword arguments as a sad bug, which should of course be fixed :)
msg144329 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-09-20 13:57
+1; the keyword arg version is much more readable.
msg144343 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-09-20 17:56
Here's a patch.
msg144358 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-09-21 00:14
LGTM
msg144361 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-09-21 01:32
Patch looks good.  I noticed a change in the conventional type for 
'keepends' from 'int' to 'bool'.  Several unit tests were updated to 
match this change.  Perhaps other call sites should be updated too?  A 
little greping shows:

$ grep -Rl 'splitlines(0)' * --include='*.py'
Doc/tools/docutils/writers/newlatex2e/__init__.py
$ grep -Rl 'splitlines(1)' * --include='*.py'
Doc/tools/sphinx/pycode/pgen2/tokenize.py
Doc/tools/docutils/readers/python/moduleparser.py
Lib/test/test_tokenize.py
Lib/difflib.py
Lib/lib2to3/pgen2/tokenize.py
Lib/codecs.py
msg144470 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-09-23 19:22
I agree with Alex. The poorly documented fact that *some* C-coded functions cannot accept arguments identified by keyword rather than position is a bit hole in the function abstraction.
+1 to the patch (and the int to bool change)
msg144474 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-09-23 21:03
The attached patch adds 'keepends=' to a few calls and replaces 0/1 with False/True.  The patch can be applied after Mark's patch.  Doing two separate commits is probably better.
msg144482 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-09-24 02:20
> Doing two separate commits is probably better.

Out of curiosity, what is typically the convention on that?  The 
devguide seems to suggest one changeset per issue:

"""
Just please make sure that you generate a single, condensed patch rather than a series of several changesets.
"""

I think for this case two patches is better.  In general, I am OK with
the git-style series and hg-style patchbombs, but the devguide seems
to say otherwise.  Hmmm, that makes me wonder if we can patchbomb the
tracker :-)
msg144483 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-09-24 02:36
One changeset per issue is a general goal. So is the ability to review patches. Sometime people forget to add a News or Acks entry and have to followup with an addendum. (Mark's patch still lack that, for instance.) Sometimes a patch is so large that a reviewer asks or requires splitting. In this case, each patch is large enough and touches enough files (up to 10 I think) that two is plausible. What would not be appreciated by either a reviewer or commit-list recipients would be one patch file per patched file. That would be a patch bomb indeed ;-)
msg144488 - (view) Author: Roundup Robot (python-dev) Date: 2011-09-24 08:15
New changeset 0b2660384d4e by Mark Dickinson in branch 'default':
Issue #13012: Allow 'keepends' to be passed as a keyword argument in str.splitlines, bytes.splitlines and bytearray.splitlines.
http://hg.python.org/cpython/rev/0b2660384d4e
msg144489 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-09-24 08:17
Thanks for the reviews.

Applied the first patch, plus Misc/NEWS entry, and minus the change to Lib/collections/__init__.py;  I think that change belongs more with the second patch.
msg144497 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-09-24 10:52
Ezio's patch looks fine to me, though as Meador points out in the Rietveld review, there are a couple of omissions that could be fixed.

Personally, I'm not so convinced of the value of upgrading all the existing uses of splitlines to use 'keepends';  I just wanted to be able to do this in *new* code. :-)  I guess I'm +0 on the int -> bool changes (replacing .splitlines(1) with .splitlines(True)), and -0 on adding the extra 'splitlines' keywords throughout existing source.
msg144498 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-09-24 13:21
> Out of curiosity, what is typically the convention on that?

In theory every issue should be about a single problem and therefore have a single patch to fix it.  Sometimes a secondary "problem" that is not strictly related to the first one arises, for example when a cleanup is necessary before applying the patch, or when a new feature is added and a few places get changed to use it.  In these cases is not worth opening another issue, so both the patches are attached to the same issue, but committed separately because they address two different "problems".

> Sometime people forget to add a News or Acks entry and have to
> followup with an addendum. (Mark's patch still lack that, for 
> instance.)

Avoiding to include the NEWS entry in patches is common, because the file gets updated often and it will most likely cause conflicts.  `make patchcheck` remind us to add it -- assuming you don't forget to use it.

> Ezio's patch looks fine to me, though as Meador points out in the 
> Rietveld review, there are a couple of omissions that could be fixed.

Adding keepends there would have made the line too long, required some splitting and probably made the code less readable overall, so I prefer to keep it out.

> Personally, I'm not so convinced of the value of upgrading all the
> existing uses of splitlines to use 'keepends'; I just wanted to be
> able to do this in *new* code. :-)

If we can make old code more readable too and we already have a patch to do it, why not?  (one instance also had the comment # True == keep line ends)
msg144506 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-09-24 20:22
Given that someone is willing to write a patch and someone else to review it, I am +1 on changing 0/1 to False/True anywhere appropriate in the stdlib. Ditto for using a new feature. The "# True == keep line ends" comment illustrates why the original patch is a good idea.
msg144565 - (view) Author: Roundup Robot (python-dev) Date: 2011-09-28 14:38
New changeset 83f43b58c988 by Ezio Melotti in branch 'default':
#13012: use splitlines(keepends=True/False) instead of splitlines(0/1).
http://hg.python.org/cpython/rev/83f43b58c988
msg144566 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-09-28 14:40
I applied my patch, including the changes in Lib/collections/__init__.py, the issue can now be closed.
History
Date User Action Args
2011-09-28 14:40:53ezio.melottisetstatus: open -> closed
2011-09-28 14:40:33ezio.melottisetassignee: mark.dickinson
resolution: fixed
messages: + msg144566
stage: patch review -> resolved
2011-09-28 14:38:08python-devsetmessages: + msg144565
2011-09-24 20:22:34terry.reedysetmessages: + msg144506
2011-09-24 13:21:46ezio.melottisetmessages: + msg144498
2011-09-24 10:52:31mark.dickinsonsetmessages: + msg144497
2011-09-24 08:17:55mark.dickinsonsetmessages: + msg144489
2011-09-24 08:15:44python-devsetnosy: + python-dev
messages: + msg144488
2011-09-24 02:36:44terry.reedysetmessages: + msg144483
2011-09-24 02:20:58meador.ingesetmessages: + msg144482
2011-09-23 21:04:00ezio.melottisetfiles: + issue13012-repl.diff

messages: + msg144474
2011-09-23 19:22:20terry.reedysetnosy: + terry.reedy
messages: + msg144470
2011-09-21 01:32:40meador.ingesetmessages: + msg144361
2011-09-21 00:14:58ezio.melottisetmessages: + msg144358
2011-09-20 17:56:59mark.dickinsonsetstage: needs patch -> patch review
2011-09-20 17:56:45mark.dickinsonsetfiles: + issue13012.patch
keywords: + patch
messages: + msg144343
2011-09-20 13:58:21ezio.melottisetnosy: + ezio.melotti
2011-09-20 13:57:03meador.ingesetnosy: + meador.inge
messages: + msg144329
2011-09-20 13:51:46alexsetnosy: + alex
messages: + msg144328
2011-09-20 13:50:40mark.dickinsonsetmessages: + msg144327
2011-09-20 13:44:06mark.dickinsoncreate