New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reindent.py should not convert newlines #54848
Comments
When reindent.py runs, it will convert the line endings for each file it converts to the default line ending for the platform on which reindent.py runs. It would be better if reindent.py would retain line endings of the source file. Attached is a patch that addresses this issue. |
Merged the patch with the latest trunk. |
Looks good. |
If nobody objects, I will commit this when py3k is unfrozen. |
I did a bit of testing on demo files. The first one I tested with had mixed EOLs, in which case the newlines attribute will be a tuple and open(..., newline=r.newlines) will fail. For reference: http://docs.python.org/dev/library/io#io.TextIOBase.newlines What would be the best behavior in such cases? Refuse the temptation to guess and exit? |
Why is reindent.py translating the whole file to platform default newline such a bad thing? Could not it be considered as helpful behavior, especially in the case where the user has mixed EOLs by mistake? |
I’m not opposed to reindent normalizing EOLs, given that our VCS hooks will translate them to the right thing (for example, making sure that files specific to the Windows build use CRLF, not LF, even if edited on POSIX). |
That's great that reindent works in your environment, but that's no help in my environment. Here's the use case: We have a software package developed in Unix and with Unix LF line endings. The code base may also contain files with CRLF line endings (I can't be certain). I'm developing in Windows (which includes reindent.py installed by default). I'd like to use that tool to reindent the code base without munging the line endings. I can't do this without first modifying reindent.py or using another tool to convert the line endings afterword (at which point I don't know if the Windows line endings were created by reindent.py or were there natively). If someone wants the line endings normalized on their code base, they should use another tool (or possibly an option to reindent.py), but there's definitely a case for using reindent.py without altering the line endings. |
That is a valid use case, so +1 on not changing behavior and adding a new option in 3.3 (in another bug report). What is your reply to msg125818? |
Re msg125818, I agree that the mixed EOLs is also a potential problem, though the proposed solution behaves better than the status quo, where the EOLs get converted without warning. It would be desirable for the patch to be more robust in that situation (providing a useful error message about why it fails). What might even be better is to retain newlines even if they're mixed. I'll explore a cross-platform technique to consistently retain newlines as encountered. |
I looked into the possibility of retaining newline characters even for files with mixed newlines, but I've decided that's too intrusive an approach. The current implementation specifically takes measures to strip whitespaces from the ends of lines, so it seems outside the scope of this bug to alter that behavior. So I've taken another stab at a more robust implementation that does newline detection but raises an error if the file contains mixed newlines. Furthermore, it adds an option (--newline) to specify which newline character to use, bypassing the mixed-newline error and allowing the user to override newline detection. Here's a demo run:
I've published this change as https://bitbucket.org/jaraco/cpython-issue10639/changeset/900df5732f93. Please review. If this changeset is acceptable, I will push the revisions to the master repo. Please advise if I may also backport to Python 3.2 and 2.7. |
Thanks for your work. The generated patch is for some reason unreadable, so I looked at the changeset on Bitbucket. It looks good, except that a new option would have to be 3.3-only, whereas the bug could be fixed in 2.7 and 3.2 too. If it’s too bothersome for you to split your current patch, then we can fix in 3.3 only. |
No problem. I'll rebase and push the current patch as-is, and then backport to 3.2 and 2.7 without the option (just raising the error when mixed newlines are encountered). |
What repo are you talking about, BTW? The developers list in the devguide does not contain your name, so I assume you’re not talking about hg.python.org/cpython?
|
My name appears as jason.coombs. But your note does remind me that I need to review the devguide, since it's been a while since I've pushed something. I may also look into linking/merging my two accounts. I plan to do as you suggest, splitting the patch into the bugfix and the new capability. |
Okay, I hadn’t seen you in http://docs.python.org/devguide/developers and I don’t recall the URI of the generated file with all names. I’ve asked that your Roundup profile be updated so that you get the Python icon and the possibility to be assigned bugs. If you already have clones, the workflow should not be hard: go to a 3.2 checkout, fix things, run patchcheck and tests, commit. Go to 3.3, pull, merge, test, commit, push. The only tricky things that can make push hooks reject changesets are trailing whitespace, maybe EOLs, and new named branches. We also prefer to push one changeset with all changes, not a series of changesets with gradual fixes. |
New changeset 070dc6e359fb by Jason R. Coombs in branch '3.2': New changeset 826a0208d143 by Jason R. Coombs in branch 'default': New changeset 4feb889d3bed by Jason R. Coombs in branch 'default': |
I've stripped the undesirable revisions and updated the bitbucket repo so it now contains three changesets for Python 3.2 and 3.3 and suggested. I don't believe running the test suite is relevant, as I grepped the test suite for reindent, and there's no reference to the tool script. It took several rounds of rebasing and stripping and importing patches to get everything right, but I believe these changes meet the desired goals. |
hg diff combined with hg import can help you flatten a series of changeset into one.
|
What do you think about this: https://bitbucket.org/jaraco/cpython-issue10639/changeset/ea63310dddce The patch is a little more intrusive than the Python 3 patch because Python 2.7 doesn't allow specifying the newline to use when writing a file (afaict), but it abstracts that intrusiveness in the NewlineAdapter class. I've tested the script on a file with mixed newlines, a file with non-native newlines, and a file with content but no newlines, and it performs as expected. |
Instead of writing a new class, what about using io.open instead of the builtin? Then you’ll be able to use the same patch than 3.2. |
Ooh. Excellent suggestion. |
New changeset f9cf55bbe9b9 by Jason R. Coombs in branch '2.7': |
It appears this breaks "make patchcheck" in trunk: ./python ./Tools/scripts/patchcheck.py
Getting the list of files that have been added/changed ... 5 files
Fixing whitespace ... Traceback (most recent call last):
File "./Tools/scripts/patchcheck.py", line 147, in <module>
main()
File "./Tools/scripts/patchcheck.py", line 129, in main
normalize_whitespace(python_files)
File "./Tools/scripts/patchcheck.py", line 22, in call_fxn
result = fxn(*args, **kwargs)
File "./Tools/scripts/patchcheck.py", line 66, in normalize_whitespace
fixed = [path for path in file_paths if path.endswith('.py') and
File "./Tools/scripts/patchcheck.py", line 67, in <listcomp>
reindent.check(path)]
File "/home/eliben/python-src/33/Tools/scripts/reindent.py", line 129, in check
newline = spec_newline if spec_newline else r.newlines
NameError: global name 'spec_newline' is not defined
make: *** [patchcheck] Error 1 |
This is a good example of why passing parameters into functions by means of globals sucks. In reindent.py, main() sets the spec_newline global which check() uses, but this was forgotten in patchcheck.py which also uses reindent.check() IMHO reindent.py should be rewritten to ditch all globals |
I agree that using globals instead function arguments sucks, but let’s fix this bug without rewriting the whole file (unless this affects only the code new in 3.3). |
The 'spec_newline' in particular is only in the trunk (not in the backports), as it's part of the new --newline option. |
Jason, one way or another, a prompt fix for trunk is required, since |
New changeset 2547f7965733 by Jason R. Coombs in branch 'default': |
I hadn't realized that the other global variables were defined at the module level or I would have implemented this originally. Please let me know if this patch doesn't correct the issue. |
"make patchcheck" is working again. |
I haven't seen anyone use a side-effect-less statement (a string) as a comment before, but I doubt that is an approved style for the CPython codebase. Please change the string preceeding the spec_line definition into a proper comment. |
Well, this is actually blessed by http://www.python.org/dev/peps/pep-0257/, but if that convention were actually followed the docstring would go *after* the assignment. But I agree that it is rarely used, and as far as I know is not used in the stdlib at all. |
New changeset dc96af0e7f60 by Jason R. Coombs in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: