classification
Title: reindent.py should not convert newlines
Type: behavior Stage: commit review
Components: Demos and Tools Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jason.coombs Nosy List: eli.bendersky, eric.araujo, jason.coombs, orsenthil, python-dev, r.david.murray, scott.dial
Priority: high Keywords: patch

Created on 2010-12-06 16:36 by jason.coombs, last changed 2011-08-02 12:20 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
reindent-autonewline.patch jason.coombs, 2010-12-06 16:42 review
900df5732f93.diff ezio.melotti, 2011-07-26 12:40 review
Repositories containing patches
https://bitbucket.org/jaraco/cpython-issue10639
Messages (34)
msg123476 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2010-12-06 16:36
When reindent.py runs, it will convert the line endings for each file it converts to the default line ending for the platform on which reindent.py runs. It would be better if reindent.py would retain line endings of the source file. Attached is a patch that addresses this issue.
msg123477 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2010-12-06 16:42
Merged the patch with the latest trunk.
msg123478 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-12-06 16:51
Looks good.
msg124369 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-12-19 23:43
If nobody objects, I will commit this when py3k is unfrozen.
msg125818 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-01-09 01:11
I did a bit of testing on demo files.  The first one I tested with had mixed EOLs, in which case the newlines attribute will be a tuple and open(..., newline=r.newlines) will fail.  For reference: http://docs.python.org/dev/library/io#io.TextIOBase.newlines

What would be the best behavior in such cases?  Refuse the temptation to guess and exit?
msg126915 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-01-24 03:21
Why is reindent.py translating the whole file to platform default newline such a bad thing? Could not it be considered as helpful behavior, especially in the case where the user has mixed EOLs by mistake?
msg127449 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-01-29 17:38
I’m not opposed to reindent normalizing EOLs, given that our VCS hooks will translate them to the right thing (for example, making sure that files specific to the Windows build use CRLF, not LF, even if edited on POSIX).
msg127481 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2011-01-29 20:04
That's great that reindent works in your environment, but that's no help in my environment.

Here's the use case: We have a software package developed in Unix and with Unix LF line endings. The code base may also contain files with CRLF line endings (I can't be certain).

I'm developing in Windows (which includes reindent.py installed by default). I'd like to use that tool to reindent the code base without munging the line endings. I can't do this without first modifying reindent.py or using another tool to convert the line endings afterword (at which point I don't know if the Windows line endings were created by reindent.py or were there natively).

If someone wants the line endings normalized on their code base, they should use another tool (or possibly an option to reindent.py), but there's definitely a case for using reindent.py without altering the line endings.
msg127482 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-01-29 20:09
That is a valid use case, so +1 on not changing behavior and adding a new option in 3.3 (in another bug report).

What is your reply to msg125818?
msg136030 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2011-05-15 14:26
Re msg125818, I agree that the mixed EOLs is also a potential problem, though the proposed solution behaves better than the status quo, where the EOLs get converted without warning.

It would be desirable for the patch to be more robust in that situation (providing a useful error message about why it fails). What might even be better is to retain newlines even if they're mixed. I'll explore a cross-platform technique to consistently retain newlines as encountered.
msg141092 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2011-07-25 13:56
I looked into the possibility of retaining newline characters even for files with mixed newlines, but I've decided that's too intrusive an approach. The current implementation specifically takes measures to strip whitespaces from the ends of lines, so it seems outside the scope of this bug to alter that behavior.

So I've taken another stab at a more robust implementation that does newline detection but raises an error if the file contains mixed newlines.

Furthermore, it adds an option (--newline) to specify which newline character to use, bypassing the mixed-newline error and allowing the user to override newline detection.

Here's a demo run:

    PS C:\cpython-issue10639> python .\Tools\scripts\reindent.py .\foo.py
    .\foo.py: mixed newlines detected; cannot continue without --newline
    PS C:\cpython-issue10639> python .\Tools\scripts\reindent.py --newline CRLF .\foo.py
    PS C:\cpython-issue10639> python .\Tools\scripts\reindent.py --newline LF .\foo.py
    PS C:\cpython-issue10639> python .\Tools\scripts\reindent.py .\foo.py

I've published this change as https://bitbucket.org/jaraco/cpython-issue10639/changeset/900df5732f93.

Please review. If this changeset is acceptable, I will push the revisions to the master repo. Please advise if I may also backport to Python 3.2 and 2.7.
msg141151 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-26 12:33
Thanks for your work.  The generated patch is for some reason unreadable, so I looked at the changeset on Bitbucket.  It looks good, except that a new option would have to be 3.3-only, whereas the bug could be fixed in 2.7 and 3.2 too.  If it’s too bothersome for you to split your current patch, then we can fix in 3.3 only.
msg141153 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2011-07-26 12:41
No problem. I'll rebase and push the current patch as-is, and then backport to 3.2 and 2.7 without the option (just raising the error when mixed newlines are encountered).
msg141154 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-26 12:46
> If this changeset is acceptable, I will push the revisions to the
> master repo.

> I'll rebase and push the current patch as-is,

What repo are you talking about, BTW?  The developers list in the devguide does not contain your name, so I assume you’re not talking about hg.python.org/cpython?

> and then backport to 3.2 and 2.7 without the option
IMO it would be easier to make one bugfix changeset in 3.2, merge it in 3.3, and then add the new option.
msg141159 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2011-07-26 14:16
> I assume you’re not talking about hg.python.org/cpython?

My name appears as jason.coombs. But your note does remind me that I need to review the devguide, since it's been a while since I've pushed something. I may also look into linking/merging my two accounts.

I plan to do as you suggest, splitting the patch into the bugfix and the new capability.
msg141160 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-26 14:24
Okay, I hadn’t seen you in http://docs.python.org/devguide/developers and I don’t recall the URI of the generated file with all names.  I’ve asked that your Roundup profile be updated so that you get the Python icon and the possibility to be assigned bugs.

If you already have clones, the workflow should not be hard: go to a 3.2 checkout, fix things, run patchcheck and tests, commit.  Go to 3.3, pull, merge, test, commit, push.  The only tricky things that can make push hooks reject changesets are trailing whitespace, maybe EOLs, and new named branches.  We also prefer to push one changeset with all changes, not a series of changesets with gradual fixes.
msg141163 - (view) Author: Roundup Robot (python-dev) Date: 2011-07-26 15:59
New changeset 070dc6e359fb by Jason R. Coombs in branch '3.2':
Fixes #10639: reindent.py should not convert newlines
http://hg.python.org/cpython/rev/070dc6e359fb

New changeset 826a0208d143 by Jason R. Coombs in branch 'default':
Merge with 3.2 Issue #10639: reindent.py should not convert newlines.
http://hg.python.org/cpython/rev/826a0208d143

New changeset 4feb889d3bed by Jason R. Coombs in branch 'default':
Issue #10639: reindent.py tool now accepts a --newline option to specify the newline to be used in the output of converted files.
http://hg.python.org/cpython/rev/4feb889d3bed
msg141164 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2011-07-26 16:01
I've stripped the undesirable revisions and updated the bitbucket repo so it now contains three changesets for Python 3.2 and 3.3 and suggested.

I don't believe running the test suite is relevant, as I grepped the test suite for reindent, and there's no reference to the tool script.

It took several rounds of rebasing and stripping and importing patches to get everything right, but I believe these changes meet the desired goals.
msg141165 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-26 16:09
> I've stripped the undesirable revisions and updated the bitbucket repo
> so it now contains three changesets for Python 3.2 and 3.3 and
> suggested.

hg diff combined with hg import can help you flatten a series of changeset into one.

> I don't believe running the test suite is relevant
You’re right, I mentioned that step because I was talking about the workflow in general, but here it does not apply, we only have manual testing for most tools.

> I believe these changes meet the desired goals.
Me too!  Are you comfortable with how to backport this to 2.7?
msg141171 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2011-07-26 16:59
What do you think about this: https://bitbucket.org/jaraco/cpython-issue10639/changeset/ea63310dddce

The patch is a little more intrusive than the Python 3 patch because Python 2.7 doesn't allow specifying the newline to use when writing a file (afaict), but it abstracts that intrusiveness in the NewlineAdapter class.

I've tested the script on a file with mixed newlines, a file with non-native newlines, and a file with content but no newlines, and it performs as expected.
msg141233 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-27 15:24
> The patch is a little more intrusive than the Python 3 patch because
> Python 2.7 doesn't allow specifying the newline to use when writing a
> file (afaict)

Instead of writing a new class, what about using io.open instead of the builtin?  Then you’ll be able to use the same patch than 3.2.
msg141239 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2011-07-27 16:16
> Instead of writing a new class, what about using io.open instead of the
> builtin?  Then you’ll be able to use the same patch than 3.2.

Ooh. Excellent suggestion.
msg141261 - (view) Author: Roundup Robot (python-dev) Date: 2011-07-27 18:41
New changeset f9cf55bbe9b9 by Jason R. Coombs in branch '2.7':
Fixes #10639: reindent.py should not convert newlines
http://hg.python.org/cpython/rev/f9cf55bbe9b9
msg141325 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2011-07-29 03:55
It appears this breaks "make patchcheck" in trunk:

./python ./Tools/scripts/patchcheck.py
Getting the list of files that have been added/changed ... 5 files
Fixing whitespace ... Traceback (most recent call last):
  File "./Tools/scripts/patchcheck.py", line 147, in <module>
    main()
  File "./Tools/scripts/patchcheck.py", line 129, in main
    normalize_whitespace(python_files)
  File "./Tools/scripts/patchcheck.py", line 22, in call_fxn
    result = fxn(*args, **kwargs)
  File "./Tools/scripts/patchcheck.py", line 66, in normalize_whitespace
    fixed = [path for path in file_paths if path.endswith('.py') and
  File "./Tools/scripts/patchcheck.py", line 67, in <listcomp>
    reindent.check(path)]
  File "/home/eliben/python-src/33/Tools/scripts/reindent.py", line 129, in check
    newline = spec_newline if spec_newline else r.newlines
NameError: global name 'spec_newline' is not defined
make: *** [patchcheck] Error 1
msg141328 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2011-07-29 04:34
This is a good example of why passing parameters into functions by means of globals sucks. In reindent.py, main() sets the spec_newline global which check() uses, but this was forgotten in patchcheck.py which also uses reindent.check()

IMHO reindent.py should be rewritten to ditch all globals
msg141361 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-29 12:42
I agree that using globals instead function arguments sucks, but let’s fix this bug without rewriting the whole file (unless this affects only the code new in 3.3).
msg141370 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2011-07-29 13:16
The 'spec_newline' in particular is only in the trunk (not in the backports), as it's part of the new --newline option.
msg141371 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2011-07-29 13:18
Jason, one way or another, a prompt fix for trunk is required, since `make patchcheck` is an important step for committing patches.
msg141373 - (view) Author: Roundup Robot (python-dev) Date: 2011-07-29 13:33
New changeset 2547f7965733 by Jason R. Coombs in branch 'default':
Issue #10639: spec_newline wasn't defined globally unless main() was called; now spec_newline is set at module import/execution
http://hg.python.org/cpython/rev/2547f7965733
msg141375 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2011-07-29 13:36
I hadn't realized that the other global variables were defined at the module level or I would have implemented this originally. Please let me know if this patch doesn't correct the issue.
msg141418 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2011-07-30 03:52
"make patchcheck" is working again.
Thanks
msg141547 - (view) Author: Scott Dial (scott.dial) Date: 2011-08-02 07:08
I haven't seen anyone use a side-effect-less statement (a string) as a comment before, but I doubt that is an approved style for the CPython codebase. Please change the string preceeding the spec_line definition into a proper comment.
msg141558 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-08-02 10:58
Well, this is actually blessed by http://www.python.org/dev/peps/pep-0257/, but if that convention were actually followed the docstring would go *after* the assignment.  But I agree that it is rarely used, and as far as I know is not used in the stdlib at all.
msg141563 - (view) Author: Roundup Robot (python-dev) Date: 2011-08-02 12:20
New changeset dc96af0e7f60 by Jason R. Coombs in branch 'default':
Corrected attribute docstring per pep-257 (reference #10639)
http://hg.python.org/cpython/rev/dc96af0e7f60
History
Date User Action Args
2011-08-02 12:20:52python-devsetmessages: + msg141563
2011-08-02 10:58:34r.david.murraysetnosy: + r.david.murray
messages: + msg141558
2011-08-02 07:08:19scott.dialsetnosy: + scott.dial
messages: + msg141547
2011-07-30 13:32:29eric.araujosetfiles: - unnamed
2011-07-30 03:52:29eli.benderskysetfiles: + unnamed

messages: + msg141418
2011-07-29 13:38:51jason.coombssetstatus: open -> closed
stage: needs patch -> commit review
2011-07-29 13:36:52jason.coombssetmessages: + msg141375
2011-07-29 13:33:49python-devsetmessages: + msg141373
2011-07-29 13:18:54eli.benderskysetmessages: + msg141371
2011-07-29 13:16:56jason.coombssetmessages: + msg141370
2011-07-29 12:42:43eric.araujosetmessages: + msg141361
2011-07-29 04:34:36eli.benderskysetmessages: + msg141328
2011-07-29 04:31:42eli.benderskysetpriority: normal -> high
stage: resolved -> needs patch
2011-07-29 03:55:43eli.benderskysetstatus: closed -> open
nosy: + eli.bendersky
messages: + msg141325

2011-07-28 13:44:21eric.araujosetstage: commit review -> resolved
2011-07-27 18:42:24jason.coombssetstatus: open -> closed
resolution: fixed
2011-07-27 18:41:10python-devsetmessages: + msg141261
2011-07-27 16:16:28jason.coombssetmessages: + msg141239
2011-07-27 15:24:21eric.araujosetmessages: + msg141233
2011-07-26 20:34:12pitrousetnosy: - pitrou
2011-07-26 16:59:41jason.coombssetmessages: + msg141171
2011-07-26 16:09:44eric.araujosetmessages: + msg141165
2011-07-26 16:01:07jason.coombssetmessages: + msg141164
2011-07-26 15:59:25python-devsetnosy: + python-dev
messages: + msg141163
2011-07-26 15:41:24eric.araujosetassignee: eric.araujo -> jason.coombs
stage: patch review -> commit review
2011-07-26 14:24:59eric.araujosetmessages: + msg141160
2011-07-26 14:16:54jason.coombssetmessages: + msg141159
2011-07-26 12:46:37eric.araujosetmessages: + msg141154
2011-07-26 12:41:58jason.coombssetmessages: + msg141153
2011-07-26 12:40:34ezio.melottisetfiles: + 900df5732f93.diff
2011-07-26 12:40:09ezio.melottisetfiles: - 900df5732f93.diff
2011-07-26 12:33:43eric.araujosettitle: reindent.py converts newlines to platform default -> reindent.py should not convert newlines
messages: + msg141151
versions: + Python 3.3, - Python 3.1
2011-07-26 12:26:38eric.araujosetfiles: + 900df5732f93.diff
2011-07-25 13:56:41jason.coombssethgrepos: + hgrepo45
messages: + msg141092
2011-05-15 14:26:37jason.coombssetmessages: + msg136030
2011-01-29 20:09:15eric.araujosetnosy: jason.coombs, orsenthil, pitrou, eric.araujo
messages: + msg127482
2011-01-29 20:04:40jason.coombssetnosy: jason.coombs, orsenthil, pitrou, eric.araujo
messages: + msg127481
2011-01-29 17:38:11eric.araujosetnosy: jason.coombs, orsenthil, pitrou, eric.araujo
messages: + msg127449
2011-01-24 03:21:48orsenthilsetnosy: + orsenthil
messages: + msg126915
2011-01-09 01:11:17eric.araujosetstatus: pending -> open
nosy: + pitrou
messages: + msg125818

2010-12-19 23:43:03eric.araujosetstatus: open -> pending

messages: + msg124369
assignee: eric.araujo
2010-12-06 16:51:03eric.araujosetversions: + Python 2.7
nosy: + eric.araujo

messages: + msg123478

type: behavior
stage: patch review
2010-12-06 16:42:42jason.coombssetfiles: + reindent-autonewline.patch

messages: + msg123477
2010-12-06 16:40:38jason.coombssetfiles: - reindent-autonewline.patch
2010-12-06 16:36:51jason.coombscreate