classification
Title: `make patchcheck` should check the whitespace of .c/.h files
Type: enhancement Stage: patch review
Components: Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: open Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, brett.cannon, chris.jerdonek, dmalcolm, eric.araujo, ezio.melotti, jcea, mark.dickinson, pitrou, sandro.tosi, serhiy.storchaka, vstinner
Priority: low Keywords: needs review, patch

Created on 2010-06-06 03:24 by brett.cannon, last changed 2012-10-06 03:35 by ezio.melotti.

Files
File name Uploaded Description Edit
refactor.diff eric.araujo, 2010-06-22 10:56 Simple refactoring to support upcoming improvements
Messages (29)
msg107178 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-06-06 03:24
This would be especially useful now that only spaces are used in .c/.h files and not tabs.
msg107754 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-13 21:33
Here’s a quick crack at it. Conservative approach: Don’t reindent, just complain.
msg107762 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-13 22:40
Updating patch to check header files too. Untested.
msg107817 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-06-14 22:12
Thanks for the patch, Éric. I will get to it when I can.
msg107824 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-14 22:31
I may add that although there is no formal test of any kind, I added tabs in two places in one .c file and in one place in another place, and it correctly reported two files. It’s a small function mostly copied from the one above; the only part where I used my brain is the counter (Python did the rest), so it should be ok.
msg107831 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-14 22:48
Éric,

It will be helpful if your code could also check for lines longer than 79 characters.
msg107832 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-14 22:49
.. and trailing white space.
msg107844 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-14 23:53
Replacing VCS hooks, are we? :) Updated and attached.

Ideas for other feature requests:
- Printing the name of files with issues;
- Fail-fast mode or full report.
msg107849 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-15 07:26
Nitpick:  shouldn't that

    len(line) > 79

be

    len(line) > 80

?  Either that, or strip the line ending from the line before computing its length.
msg107850 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-15 07:36
Similarly, isn't

  line[-1].isspace()

always going to be true (well, except possibly at the end of the file), thanks to the line-ending character?
msg107853 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-15 08:43
Forgot readling kept the line end, thanks for catching. Updated (yay for Mercurial Queues!)
msg107854 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-15 08:45
s/readling/readline/
s/readline/iterating over the file/
msg107856 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-15 09:16
Sorry to be nitpicky, but:

(a) the line[-2] produces "IndexError: string index out of range" on empty
lines, and

(b) this won't detect trailing whitespace on the last line of a file, if there's no newline at the end of the file.  Actually, it might also be worth checking that there *is* a newline at the end of the file;  this is actually required by the C standards.  (E.g., C99 5.1.1.2p2: "A source file that is not empty shall end in a new-line character".)
msg107857 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-15 09:18
s/actually//g
msg107861 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-15 10:08
Thanks for the helpful reviews. I have fixed the trailing whitespace check with “line[-2:-1].isspace()”, but I have a bug with my file counting. Before I go further, I’d like feedback from people using patchcheck:

1) Reindenting Python is a task best handled by reindent.py, but checking for tabs, line length and trailing whitespace is basically a reimplementation of grep and wc. Is it an explicit and important goal that patchcheck.py be independent of external tools?

2) Do you like the current report format? (“checking for one thing: n files”) It requires you to use grep of your editor’s search after checking. What about printing out file names and line numbers?

3) What about a function to strip trailing spaces and add a final newline where needed instead of just complaining? Is it okay to replace tabs with spaces in C too?
msg107862 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-15 10:12
Another one: I’d like to move the file name filtering from the worker functions into the main function. With that change, looping over all file names to get only the C files would be written and run only once, even if there are four functions that operate on these files. There is not backward compatibility guarantee for Tools/*, right?
msg107864 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-15 10:24
Answering as a rather infrequent user of 'make patchcheck', but someone who vows to use it more often in future... :)

(1) Well, it would be awkward to use grep or wc on Windows, so it's convenient not to need external tools.

(2) +1 to a more detailed report, at least giving the file name and what was wrong with it ("trailing whitespace in yourfile.c");  I could imagine that people might care about finding trailing whitespace or tabs, but not about line length, for example.  (Many of the existing C files already have lines >= 80 characters.)

(3) Don't really care:  emacs already does both these things for me nicely.  I'd rather just have patchcheck do the reporting.  And automatic replacement of tabs with spaces in C files might not guess correctly what the author intended.
msg107865 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-15 10:46
and your proposed refactoring sounds fine to me.  I can't really see any backwards compatibility concerns.
msg108362 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-22 10:56
1) Okay, so I’ll refactor some code in patchcheck into a grep-like function to reduce duplication and support adding new checks.

2) I’ll add more useful report. Perhaps I’ll ask first on python-dev to see if I should add a command-line option or just improve it. Also, whether I should use logging or just output file names and lines.

3) Well, if your editor strips trailing spaces and adds final newlines, you won’t ever see make patchcheck complain, will you? Since the report will be helpful only to people with less smart editors, I think I’ll go with replacing instead of just reporting. Tabs will not be replaced, though, just reported.

4) See attached patch for py3k (no idea if this will go into 2.7 too).
msg108367 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-22 11:21
> I think I’ll go with replacing instead of just reporting.

How about a --report-only (insert better name here) command line option to patchcheck, then?

My editor doesn't delete trailing whitespace *automatically*, but it's trivial to remove trailing whitespace from a file if you know it's there (M-x delete-trailing-whitespace, I believe);  some people (like me) might prefer to make all changes within the editor rather than having an external tool do it for them.  This is especially true if you've got the editor open for the relevant file at the time, since you can end up with two different conflicting versions of the file---one in the editor and one on disk;  saving in the editor can undo the changes that patchcheck made on disk.
msg108369 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-06-22 11:23
Agreed. reindent.py has a dry-run mode too, so adding such a global flag and making it the default is okay with me.
msg109269 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-07-04 22:06
Committed in r82567. I also cleaned up the output slightly so that "N file(s)" pluralizes "file" properly based on N. Also now list what files have their whitespace fixed.
msg109296 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-05 04:34
Hmm, it looks like patchcheck does not fix trailing whitespace in c files:


$ grep ' $' Modules/datetimemodule.c| cat -ve
    $
                   GET_TD_DAYS(offset1) * 86400 - $
                Py_DECREF(offset);                $
	    if ((offset1 != offset2) && $
 $
$ make patchcheck
-----------------------------------------------
Modules/Setup.dist is newer than Modules/Setup;
check to make sure you have all the updates you
need in your Modules/Setup file.
Usually, copying Modules/Setup.dist to Modules/Setup will work.
-----------------------------------------------
./python.exe ./Tools/scripts/patchcheck.py
Getting the list of files that have been added/changed ... 1 file
Fixing whitespace ... 0 files
Docs modified ... NO
Misc/ACKS updated ... NO
Misc/NEWS updated ... NO

Did you run the test suite?
msg109302 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-07-05 05:40
Sorry if I was unclear. Out of the four points I listed, this patch only did the refactoring (4). I have another patch to add and use a check_pep7 function, I’ll refresh it and post it.
msg109312 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-07-05 07:38
On Sun, Jul 4, 2010 at 22:40, Éric Araujo <report@bugs.python.org> wrote:
>
> Éric Araujo <merwok@netwok.org> added the comment:
>
> Sorry if I was unclear. Out of the four points I listed, this patch only did the refactoring (4). I have another patch to add and use a check_pep7 function, I’ll refresh it and post it.

You were clear, I just prematurely closed it by accident.
msg113847 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-08-13 22:47
I wonder how to have sane code reuse between patchcheck and Mercurial hooks. Mark seems to have started seconding Dirkjan with hooks, see e.g. 
http://hg.python.org/hooks/rev/0344575ad60e, so I wonder if we’re going to use only hooks (pro: they will be invoked on every push, whereas patchcheck is opt-in, con: they would require that contributors clone the hooks repo and set them up) or still maintain patchcheck, which has pros (no setup, does not require a VCS) and cons (not automatic, not widely used, whereas hooks can be part of the new python-dev policy).
msg113848 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-08-13 22:56
Note directly related to this issue, but untabify.py fails on files that contain non-ascii characters.  For example:

$ ./python.exe Tools/scripts/untabify.py Modules/_heapqmodule.c
Traceback (most recent call last):
    ...
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 173: invalid continuation byte


I am not sure what relevant C standard has to say about using non-ascii characters in comments, but the checking tool should not fail with a traceback in such situation.
msg113851 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-08-13 23:29
I opened issue9598 for the untabify bug.  For the purposes of source checking, I believe non-ascii characters should be disallowed in python C source code.  According to my understanding of C89 standard, interpretation of characters outside of the basic character set is implementation and locale dependent.
msg169585 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-31 21:19
The following commit from today is related to this issue: 815b88454e3e

"Remove trailing whitespace in order to silence warnings on HP-UX."
History
Date User Action Args
2012-10-06 03:35:48ezio.melottisetnosy: + ezio.melotti

versions: + Python 3.3, Python 3.4
2012-10-04 10:51:23jceasetnosy: + jcea
2012-08-31 21:19:36chris.jerdoneksetnosy: + chris.jerdonek
messages: + msg169585
2012-08-08 08:58:32ned.deilylinkissue15550 superseder
2012-08-06 07:33:43serhiy.storchakasetnosy: + serhiy.storchaka
2011-07-16 22:27:13sandro.tosisetnosy: + sandro.tosi
2011-06-27 23:16:07vstinnersetnosy: + vstinner
2011-06-27 00:35:45brett.cannonsetassignee: brett.cannon ->
2010-10-21 12:50:05georg.brandlunlinkissue9095 dependencies
2010-08-13 23:29:23belopolskysetmessages: + msg113851
2010-08-13 22:56:30belopolskysetmessages: + msg113848
2010-08-13 22:47:13eric.araujosetmessages: + msg113847
2010-07-21 15:11:40dmalcolmsetnosy: + dmalcolm
2010-07-05 07:39:00brett.cannonsetmessages: + msg109312
2010-07-05 05:40:19eric.araujosetmessages: + msg109302
2010-07-05 04:34:58belopolskysetstatus: closed -> open

messages: + msg109296
2010-07-04 22:06:37brett.cannonsetstatus: open -> closed
2010-07-04 22:06:24brett.cannonsetmessages: + msg109269
2010-07-04 18:42:45eric.araujolinkissue9095 dependencies
2010-06-22 11:23:44eric.araujosetmessages: + msg108369
2010-06-22 11:21:17mark.dickinsonsetmessages: + msg108367
2010-06-22 10:58:31eric.araujosetkeywords: + needs review
resolution: accepted
2010-06-22 10:56:12eric.araujosetfiles: + refactor.diff

messages: + msg108362
2010-06-15 10:46:33mark.dickinsonsetmessages: + msg107865
2010-06-15 10:24:10mark.dickinsonsetmessages: + msg107864
2010-06-15 10:12:10eric.araujosetmessages: + msg107862
2010-06-15 10:08:16eric.araujosetmessages: + msg107861
2010-06-15 09:39:08eric.araujosetfiles: - patchcheck-pep7.diff
2010-06-15 09:18:29mark.dickinsonsetmessages: + msg107857
2010-06-15 09:16:55mark.dickinsonsetmessages: + msg107856
2010-06-15 08:45:37eric.araujosetmessages: + msg107854
2010-06-15 08:43:43eric.araujosetfiles: + patchcheck-pep7.diff

messages: + msg107853
2010-06-15 08:29:11eric.araujosetfiles: - patchcheck-pep7.diff
2010-06-15 07:36:57mark.dickinsonsetmessages: + msg107850
2010-06-15 07:26:49mark.dickinsonsetnosy: + mark.dickinson
messages: + msg107849
2010-06-14 23:53:52eric.araujosetfiles: - check-tabs.diff
2010-06-14 23:53:49eric.araujosetfiles: - check-tabs.diff
2010-06-14 23:53:42eric.araujosetfiles: + patchcheck-pep7.diff

messages: + msg107844
2010-06-14 22:49:11belopolskysetmessages: + msg107832
2010-06-14 22:48:38belopolskysetmessages: + msg107831
2010-06-14 22:31:17eric.araujosetmessages: + msg107824
2010-06-14 22:12:34brett.cannonsetassignee: brett.cannon
messages: + msg107817
2010-06-13 22:40:44eric.araujosetfiles: + check-tabs.diff

messages: + msg107762
2010-06-13 22:30:18eric.araujosetnosy: + pitrou
2010-06-13 21:33:52eric.araujosetfiles: + check-tabs.diff

versions: + Python 2.7, Python 3.2
keywords: + patch
nosy: + eric.araujo

messages: + msg107754
stage: patch review
2010-06-12 02:44:13belopolskysetnosy: + belopolsky
2010-06-06 03:24:37brett.cannoncreate