msg144326 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2011-09-20 13:44 |
The docstring of str.splitlines says:
splitlines(...)
S.splitlines([keepends]) -> list of strings
Return a list of the lines in S, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends
is given and true.
Currently the optional argument can only be specified positionally. What do people think about also allowing this argument to be specified by keyword? (And applying the same change to bytes.splitlines.)
The main motivation here is readability: in
long_string.splitlines(keepends=True)
it's fairly clear what the boolean argument is doing. But in
long_string.splitlines(True)
it's much less clear. (This came up during a Python training class that I was running recently.) I'm not advocating making the argument keyword-only; just allowing those who want to specify it by keyword to do so.
There are many other str methods that don't accept keyword arguments, but it's only this one where I think readability would really benefit.
|
msg144327 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2011-09-20 13:50 |
> (And applying the same change to bytes.splitlines.)
Oh, and bytearray.splitlines, too.
|
msg144328 - (view) |
Author: Alex Gaynor (alex) *  |
Date: 2011-09-20 13:51 |
Personally, I regard every C function which, for obscure internal details, doesn't take keyword arguments as a sad bug, which should of course be fixed :)
|
msg144329 - (view) |
Author: Meador Inge (meador.inge) *  |
Date: 2011-09-20 13:57 |
+1; the keyword arg version is much more readable.
|
msg144343 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2011-09-20 17:56 |
Here's a patch.
|
msg144358 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2011-09-21 00:14 |
LGTM
|
msg144361 - (view) |
Author: Meador Inge (meador.inge) *  |
Date: 2011-09-21 01:32 |
Patch looks good. I noticed a change in the conventional type for
'keepends' from 'int' to 'bool'. Several unit tests were updated to
match this change. Perhaps other call sites should be updated too? A
little greping shows:
$ grep -Rl 'splitlines(0)' * --include='*.py'
Doc/tools/docutils/writers/newlatex2e/__init__.py
$ grep -Rl 'splitlines(1)' * --include='*.py'
Doc/tools/sphinx/pycode/pgen2/tokenize.py
Doc/tools/docutils/readers/python/moduleparser.py
Lib/test/test_tokenize.py
Lib/difflib.py
Lib/lib2to3/pgen2/tokenize.py
Lib/codecs.py
|
msg144470 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2011-09-23 19:22 |
I agree with Alex. The poorly documented fact that *some* C-coded functions cannot accept arguments identified by keyword rather than position is a bit hole in the function abstraction.
+1 to the patch (and the int to bool change)
|
msg144474 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2011-09-23 21:03 |
The attached patch adds 'keepends=' to a few calls and replaces 0/1 with False/True. The patch can be applied after Mark's patch. Doing two separate commits is probably better.
|
msg144482 - (view) |
Author: Meador Inge (meador.inge) *  |
Date: 2011-09-24 02:20 |
> Doing two separate commits is probably better.
Out of curiosity, what is typically the convention on that? The
devguide seems to suggest one changeset per issue:
"""
Just please make sure that you generate a single, condensed patch rather than a series of several changesets.
"""
I think for this case two patches is better. In general, I am OK with
the git-style series and hg-style patchbombs, but the devguide seems
to say otherwise. Hmmm, that makes me wonder if we can patchbomb the
tracker :-)
|
msg144483 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2011-09-24 02:36 |
One changeset per issue is a general goal. So is the ability to review patches. Sometime people forget to add a News or Acks entry and have to followup with an addendum. (Mark's patch still lack that, for instance.) Sometimes a patch is so large that a reviewer asks or requires splitting. In this case, each patch is large enough and touches enough files (up to 10 I think) that two is plausible. What would not be appreciated by either a reviewer or commit-list recipients would be one patch file per patched file. That would be a patch bomb indeed ;-)
|
msg144488 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2011-09-24 08:15 |
New changeset 0b2660384d4e by Mark Dickinson in branch 'default':
Issue #13012: Allow 'keepends' to be passed as a keyword argument in str.splitlines, bytes.splitlines and bytearray.splitlines.
http://hg.python.org/cpython/rev/0b2660384d4e
|
msg144489 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2011-09-24 08:17 |
Thanks for the reviews.
Applied the first patch, plus Misc/NEWS entry, and minus the change to Lib/collections/__init__.py; I think that change belongs more with the second patch.
|
msg144497 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2011-09-24 10:52 |
Ezio's patch looks fine to me, though as Meador points out in the Rietveld review, there are a couple of omissions that could be fixed.
Personally, I'm not so convinced of the value of upgrading all the existing uses of splitlines to use 'keepends'; I just wanted to be able to do this in *new* code. :-) I guess I'm +0 on the int -> bool changes (replacing .splitlines(1) with .splitlines(True)), and -0 on adding the extra 'splitlines' keywords throughout existing source.
|
msg144498 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2011-09-24 13:21 |
> Out of curiosity, what is typically the convention on that?
In theory every issue should be about a single problem and therefore have a single patch to fix it. Sometimes a secondary "problem" that is not strictly related to the first one arises, for example when a cleanup is necessary before applying the patch, or when a new feature is added and a few places get changed to use it. In these cases is not worth opening another issue, so both the patches are attached to the same issue, but committed separately because they address two different "problems".
> Sometime people forget to add a News or Acks entry and have to
> followup with an addendum. (Mark's patch still lack that, for
> instance.)
Avoiding to include the NEWS entry in patches is common, because the file gets updated often and it will most likely cause conflicts. `make patchcheck` remind us to add it -- assuming you don't forget to use it.
> Ezio's patch looks fine to me, though as Meador points out in the
> Rietveld review, there are a couple of omissions that could be fixed.
Adding keepends there would have made the line too long, required some splitting and probably made the code less readable overall, so I prefer to keep it out.
> Personally, I'm not so convinced of the value of upgrading all the
> existing uses of splitlines to use 'keepends'; I just wanted to be
> able to do this in *new* code. :-)
If we can make old code more readable too and we already have a patch to do it, why not? (one instance also had the comment # True == keep line ends)
|
msg144506 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2011-09-24 20:22 |
Given that someone is willing to write a patch and someone else to review it, I am +1 on changing 0/1 to False/True anywhere appropriate in the stdlib. Ditto for using a new feature. The "# True == keep line ends" comment illustrates why the original patch is a good idea.
|
msg144565 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2011-09-28 14:38 |
New changeset 83f43b58c988 by Ezio Melotti in branch 'default':
#13012: use splitlines(keepends=True/False) instead of splitlines(0/1).
http://hg.python.org/cpython/rev/83f43b58c988
|
msg144566 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2011-09-28 14:40 |
I applied my patch, including the changes in Lib/collections/__init__.py, the issue can now be closed.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:21 | admin | set | github: 57221 |
2011-09-28 14:40:53 | ezio.melotti | set | status: open -> closed |
2011-09-28 14:40:33 | ezio.melotti | set | assignee: mark.dickinson resolution: fixed messages:
+ msg144566 stage: patch review -> resolved |
2011-09-28 14:38:08 | python-dev | set | messages:
+ msg144565 |
2011-09-24 20:22:34 | terry.reedy | set | messages:
+ msg144506 |
2011-09-24 13:21:46 | ezio.melotti | set | messages:
+ msg144498 |
2011-09-24 10:52:31 | mark.dickinson | set | messages:
+ msg144497 |
2011-09-24 08:17:55 | mark.dickinson | set | messages:
+ msg144489 |
2011-09-24 08:15:44 | python-dev | set | nosy:
+ python-dev messages:
+ msg144488
|
2011-09-24 02:36:44 | terry.reedy | set | messages:
+ msg144483 |
2011-09-24 02:20:58 | meador.inge | set | messages:
+ msg144482 |
2011-09-23 21:04:00 | ezio.melotti | set | files:
+ issue13012-repl.diff
messages:
+ msg144474 |
2011-09-23 19:22:20 | terry.reedy | set | nosy:
+ terry.reedy messages:
+ msg144470
|
2011-09-21 01:32:40 | meador.inge | set | messages:
+ msg144361 |
2011-09-21 00:14:58 | ezio.melotti | set | messages:
+ msg144358 |
2011-09-20 17:56:59 | mark.dickinson | set | stage: needs patch -> patch review |
2011-09-20 17:56:45 | mark.dickinson | set | files:
+ issue13012.patch keywords:
+ patch messages:
+ msg144343
|
2011-09-20 13:58:21 | ezio.melotti | set | nosy:
+ ezio.melotti
|
2011-09-20 13:57:03 | meador.inge | set | nosy:
+ meador.inge messages:
+ msg144329
|
2011-09-20 13:51:46 | alex | set | nosy:
+ alex messages:
+ msg144328
|
2011-09-20 13:50:40 | mark.dickinson | set | messages:
+ msg144327 |
2011-09-20 13:44:06 | mark.dickinson | create | |