classification
Title: Deprecate suspicious.py?
Type: Stage: patch review
Components: Documentation Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: mdk Nosy List: erlendaasland, lukasz.langa, mdk, ned.deily, pablogsal, rhettinger, vstinner
Priority: normal Keywords: patch

Created on 2020-11-01 23:49 by mdk, last changed 2021-07-19 14:34 by lukasz.langa.

Pull Requests
URL Status Linked Edit
PR 23313 merged mdk, 2020-11-16 08:08
PR 23802 merged mdk, 2020-12-16 14:20
PR 26575 merged mdk, 2021-06-07 08:58
PR 26966 merged mdk, 2021-06-30 09:36
PR 27238 merged mdk, 2021-07-19 12:36
Messages (27)
msg380167 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2020-11-01 23:49
I was not here 21 years ago when it was introduced [1], but according to the commit message it was introduced to find leftover Latex mardown.

It tries to find 4 patterns in Sphinx node text (not in raw rst files):

::(?=[^=])|            # two :: (but NOT ::=)

This one has ~100 false positive in susp-ignored.csv (pypi classifiers, slices, ipv6, ...) 

:[a-zA-Z][a-zA-Z0-9]+| # :foo

This one has ~300 false positive in susp-ignored.csv (slices, C:\, ipv6, ...)


`|                     # ` (seldom used by itself)

This one has ~20 false positive in susp-ignored.csv (mostly reStructuredText in code-blocks)

(?<!\.)\.\.[ \t]*\w+:  # .. foo: (but NOT ... else:)

This one does not have false positives.

The script, on my laptop (with a core i9), is slow (4mn20s), and it's probably way slower on the CI.

I tried to search for `suspicious is:pr in:comments` on github to see if it's usefull:

- 2 contributor had an issue with the script (gh-9748, gh-21940)
- 5 had to add false positive to susp-ignored.csv (gh-20556, gh-13772, gh-11481, gh-9317, gh-6915)
- 4 had to update susp-ignored.csv (gh-11769, gh-5552, gh-3694, gh-2719)
- 1 did not addedd to susp-ignored but changed to avoid a false positive (gh-18939)

Case where it actually helped:

- Finding an error: (gh-12562 .. literalinclude: instead of .. literalinclude::)
- Finding refs in code block (gh-7413)
- Writing plaintext in Misc/NEWS (gh-1339)

I'd go for enhancing rstlint.py (which is fast, ~1s on my laptop) a bit to try to handle the `.. literalinclude:` missing a `:` errors, and dropping suspicious.

So I'd appreciate feedback on this script, did it helped you recently?

1: https://github.com/python/cpython/commit/700cf28f410521066f40671f1da7db0302d753fd
msg380169 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-11-02 00:07
I would love to see this disappear.  For me, it has been a recurring time waster.
msg380171 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-11-02 00:51
If it still does useful checks, we could just limit it to running by release managers during the release manufacturing process; it is already run then. It would still allow for problems to be caught and fixed by the RM prior to release tagging.  But I don't have a strong opinion about its overall usefulness. I recall it catching at least one problem in a release I was involved with and that was probably before we had the CI checks.
msg381075 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2020-11-16 07:50
Good idea Ned!

So proposed plan:

- Drop it from docs build and the CI to avoid time loss.
- Add it as a step of PEP 101, for a few release, for good measures
- I'll check it from time to time between releases, just to ensure it does not accumulate tons of things to do on the the release day.

If during a few release, we notice this tool is no loger usefull we'll be able to drop it.

If we spot some errors that can be migrated to the rstlint.py we'll do.

If this tool is in fact usefull, we'll have to think about it again.
msg381802 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2020-11-25 09:18
New changeset c9c6e9f89aa68ce8094393a1a5575b67d26bc8c8 by Julien Palard in branch 'master':
bpo-42238: Doc: Remove make suspicious from the CI and docs builds. (GH-23313)
https://github.com/python/cpython/commit/c9c6e9f89aa68ce8094393a1a5575b67d26bc8c8
msg382219 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2020-12-01 08:21
I'll try to track this closely, the drop of `make suspicious` has been merged in master, but not backported yet, one week ago.

Up to today, no commit has introduced suspicious warnings (0 false positive, 0 false negative).

I think I'll soon backport the PR, if no one objects, because the current situation is a bit risky: some PR could get merged to master but fail the tests during the backport.
msg383149 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2020-12-16 11:06
Today I spotted one error that suspicious could have spotted:

    -both ``Callable``s no longer validate their ``argtypes``, in 
    +both ``Callable``\ s no longer validate their ``argtypes``, in 

I'm fixing it in https://github.com/python/cpython/pull/23800 

`make suspicious` found it as:

    [whatsnew/changelog:23] "`" found in "Callable``s no longer validate their ``argtypes"

So I'll doodle around this to see if tools/rstlint.py or Sphinx itself can see the error.
msg383171 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2020-12-16 14:29
I created https://github.com/python/cpython/pull/23802 with a port of the "wrong backtick detector" (at least, one aspect from it) from suspicious to rstlint.

I dropped the suspicious check three weeks ago, and found a single committed issue since then, which was fixable (and is fixed) in rstlint, so I'll continue this experience and see where it goes.
msg383292 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2020-12-18 09:48
New changeset b9735420aa8ecd2555fe3dc000863bc50487334b by Julien Palard in branch 'master':
bpo-42238: Check Misc/NEWS.d/next/ for reStructuredText issues. (GH-23802)
https://github.com/python/cpython/commit/b9735420aa8ecd2555fe3dc000863bc50487334b
msg390898 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-12 20:21
Pablo reenabled the check:

commit 20ac34772aa9805ccbf082e700f2b033291ff5d2 (upstream/master, master)
Author: Pablo Galindo <Pablogsal@gmail.com>
Date:   Mon Apr 12 20:53:15 2021 +0100

    Fix Sphinx errors in the documentation and re-activate the suspicious check (GH-25368)
    
    The suspicious check is still executed as part of the release process and release managers have been
    lately fixing some actual errors that the suspicious target can find. For this reason, reactivate the suspicious
    until we decide what to do in a coordinated fashion.
msg390904 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-04-12 20:47
Background on why I reactivated it:

I every single alpha release I had to fix actual documentation errors (no false positives) like the one fixed here:

https://github.com/python/cpython/commit/20ac34772aa9805ccbf082e700f2b033291ff5d2

I don't like the check, but the release management team don't like to have to restart the release process just fo fix a bunch of docs issues that should have been detected by the CI.

If we remove the fix, we need to coordinate all together because currently the status quo is that the RM team now is responsable of fixing the documentation errors as there are several release tools (and other ones) that depend on this target passing.
msg391163 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-04-16 03:32
If reactivated, the tool needs to be substantially improved.  It is NOT smart.  The false positives for slicing and logging examples are unnecessarily annoying.  It creates a barrier for people submitting documentation patches.  Each of the 367 entries in the susp-ignored file represents wasted time for contributors.

Also the CSV format is arcane, hard-to-read, and hard-to-edit.  It looks like it was quickly thrown together by someone who didn't care about usability.  Perhaps there should be a simpler tools that says, "take this current failure and mark it as a false positive" without trying to be over specific.

Mandatory checks with a high false positive rate are an anti-pattern for CI systems.  Already we've had one case of a contributor (me) who abandoned a doc patch rather than fight this tooling.
msg391164 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-04-16 03:56
I agree with all you said, but I am against forcing the release management team to absorb the responsibility of fixing all the errors that the CI didn't catch, real or not. Releasing is already a multi-hour process and having many docs errors and formating mistakes to fix is a toll that should be paid at release time.
msg391177 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-16 09:40
Pablo: "I every single alpha release I had to fix actual documentation errors (no false positives)"

I don't get it. Do you mean that the Python release process is blocked by typos in the documentation? Why did you catch them in the release process, and not in the regular Python development workflow?

Can't we get the same errors in the regular workflow?

I understand that the release process requires no warning from "make suspicious", but as explained in other comments, this tool is not reliable. While it catchs some real bugs, the cost of false positives is too high.

Doc/tools/susp-ignored.csv contains 368 lines, IMO it's a sign that this tool either requires a serious reworking, or must be disabled/removed.

Doc/tools/extensions/suspicious.py is a CPython specific extension, it's not part of Sphinx. Maybe someone should open a feature request on Sphinx to get a similar feature without false positives (or at least less false positives).
msg391182 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-04-16 10:27
> Why did you catch them in the release process, and not in the regular Python development workflow?

Because our release tools run suspicious.py and it fails and then we need to manually inspect every single failure and fix it as almost all times there are real positives among them

> Can't we get the same errors in the regular workflow?

That's what I did reactivating suspicious.py


> I understand that the release process requires no warning from "make suspicious", but as explained in other comments, this tool is not reliable. While it catchs some real bugs, the cost of false positives is too high


Sure, we all agree here. Where it seems that we don't agree is how to proceed. I would be super happy to remove the tool if is agreed everywhere, including with the rest of release managers.
msg391185 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-16 11:42
Julien Palard is working actively on the Python documentation. He is the one who removed suspicious "from the CI and docs builds". I trust his judgment here. I suggest to remove it again from there *and* from the release scripts.
msg391209 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-16 15:00
I wrote a PR to enhance -W option documentation. The suspicious job complains whereas my Sphinx format is valid:

   The full form of argument is::

       action:message:category:module:lineno

Example of suspicious error:

[using/cmdline:438] ":message" found in "action:message:category:module:lineno"

I see that Doc/tools/suspicious.csv already contains an ignore rule for Doc/using/cmdline.rst.
msg391212 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-16 15:09
> Doc/tools/extensions/suspicious.py is a CPython specific extension, it's not part of Sphinx. Maybe someone should open a feature request on Sphinx to get a similar feature without false positives (or at least less false positives).

This extension was written by Gabriel Genellina in 2009 when he discovered that:

   Several documents contain invalid reST markup that "leaks"
   into the html output (missing ``, incorrect indentation, etc.)

Extension added in bpo-4811 with:

commit 700cf28f410521066f40671f1da7db0302d753fd
Author: Georg Brandl <georg@python.org>
Date:   Sun Jan 4 10:23:49 2009 +0000

    Add "suspicious" builder which finds leftover markup in the HTML files.
    
    Patch by Gabriel Genellina.


The code didn't evolved much since it was added in 2009. Main changes.

commit 144c269cc8b82ab3b877797e1c7c26b0849e2b56
Author: Ezio Melotti <ezio.melotti@gmail.com>
Date:   Thu Mar 28 18:01:11 2013 +0200

    Update the suspicious builder to detect unused rules, and remove currently unusued rules.

bpo-22537:

commit 14b5a4da2732d3464c6b40527458005ccf19f95c
Author: Georg Brandl <georg@python.org>
Date:   Thu Oct 2 08:26:26 2014 +0200

    Closes #22537: Make sphinx extensions compatible with Python 2 or 3, like in the 3.x branches

Filename renamed:

* master: Doc/tools/extensions/suspicious.py
* Commit 160cbce92adc3ccbe4ae6e231ea27fb5ff28dca9 renames Doc/tools/suspicious.py to Doc/tools/extensions/suspicious.py
* Commit f16fbf940fdaec594eb1f4c5f9c61e926db53c5d renames Doc/tools/sphinxext/suspicious.py to  Doc/tools/suspicious.py
msg392165 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2021-04-28 06:55
I should have monitored this more closely, I started monitoring it weekly, then life got over until today when I'm even surprised to see activity on the issue, sry!

(Surprise leading me to investigate why I had not received notifications from bpo, leading me to a bug in my sieve filter...)

I counted, during your releases `make suspicious` spotted:

- 1 true positive in 20ac34772aa9805ccbf082e700f2b033291ff5d2
- 1 false positive in 20ac34772aa9805ccbf082e700f2b033291ff5d2
- 1 true positive in 57f21db3f629649dbd7c4531078b6a2104896411
- 1 false positive in de833b601319da15d90c8f3cd3c44d239d6d5924

if I missed none, the success ratio is not good (which is already known).

What I'm aiming to do in this issue is to list the true positives over time, by passing the tool manually from time to time, to try to see what kind of rule is usefull in such a tool, and ultimately try to spot those errors in a reliable way.

I'd go for removing it from both release process and the CI, this would lower pressure on RMs and contributors, while easing the study of the tool in this issue.

I even consider removing it from the Makefile / tools hierarchy to ensure nobody runs it, because if someone run it locally during its contribution process it may hide true positives from me (leading me to think there's no true positives and the tool is useless).
msg392177 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-28 10:03
At least, I would suggest to remove it from the release process.

If some people working on the documentation want to keep the tool, maybe it can be an optional CI job? IMO the problem is that currently, it's part of a single "Documentation" job. I don't recall if it's mandatory or not. Also, we need maybe more explanation in the CI job result how to mark false positives.
msg395311 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2021-06-08 07:23
New changeset 227a09325e7bf82ecd303b4696c054a086b29a00 by Julien Palard in branch 'main':
bpo-42238: Doc CI: Disable suspicious checks. (GH-26575)
https://github.com/python/cpython/commit/227a09325e7bf82ecd303b4696c054a086b29a00
msg396567 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2021-06-27 08:34
Spotted a true positive in b19f45533942e4ad7ddf9d2d94f8b87c6f746bce:

    :const:``None``

(I'm trying to build a true positive list, to have usefull cases where suspicious is usefull, so in the long term I can maybe implement those cases in rslint instead.)
msg396568 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2021-06-27 08:40
The previous one could probably be implemented in rstlint using (an equivalent of):

    git grep ':[a-z]\+:``[^:` ]+``' Doc/

Maybe specialized to known roles, like the script specializes to known directives.
msg396774 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2021-06-30 09:01
Another true positive:

    ... versionchanged:: 3.11

from : 6cb145d23f5cf69b6d7414877d142747cd3d134c / https://github.com/python/cpython/pull/26820

I also think it can be implemented in rstlint.
msg396891 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2021-07-03 08:35
New changeset 01331f1a3cf86fd308e9a134bb867bf01fb191f5 by Julien Palard in branch 'main':
bpo-42238: rstlint: Add two new checks. (GH-26966)
https://github.com/python/cpython/commit/01331f1a3cf86fd308e9a134bb867bf01fb191f5
msg397792 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2021-07-19 12:43
Another one \o/

Fix in: https://github.com/python/cpython/pull/27238

It is:

    :func:pdb.main

Detected by make suspicious as:

    WARNING: [whatsnew/changelog:320] ":func" found in ": Refactor argument processing in :func:pdb.main to simplify"
    WARNING: [whatsnew/changelog:320] ":pdb" found in ": Refactor argument processing in :func:pdb.main to simplify"

I added a test in the same PR to spot this one in rstlint.
msg397802 - (view) Author: Ɓukasz Langa (lukasz.langa) * (Python committer) Date: 2021-07-19 14:34
New changeset fbf10080bb369cfef1f230c2cd5135a558b242d5 by Julien Palard in branch 'main':
bpo-42238: Fix small rst issue in NEWS.d/. (#27238)
https://github.com/python/cpython/commit/fbf10080bb369cfef1f230c2cd5135a558b242d5
History
Date User Action Args
2021-07-19 14:34:56lukasz.langasetnosy: + lukasz.langa
messages: + msg397802
2021-07-19 12:44:00mdksetmessages: + msg397792
2021-07-19 12:36:07mdksetpull_requests: + pull_request25787
2021-07-03 08:35:30mdksetmessages: + msg396891
2021-06-30 09:36:44mdksetpull_requests: + pull_request25530
2021-06-30 09:01:13mdksetmessages: + msg396774
2021-06-27 16:14:09erlendaaslandsetnosy: + erlendaasland
2021-06-27 08:40:29mdksetmessages: + msg396568
2021-06-27 08:34:32mdksetmessages: + msg396567
2021-06-08 07:23:06mdksetmessages: + msg395311
2021-06-07 08:58:54mdksetpull_requests: + pull_request25163
2021-04-28 10:03:10vstinnersetmessages: + msg392177
2021-04-28 06:55:13mdksetmessages: + msg392165
2021-04-16 15:09:54vstinnersetmessages: + msg391212
2021-04-16 15:00:49vstinnersetmessages: + msg391209
2021-04-16 11:42:21vstinnersetmessages: + msg391185
2021-04-16 10:27:07pablogsalsetmessages: + msg391182
2021-04-16 09:40:35vstinnersetmessages: + msg391177
2021-04-16 03:56:11pablogsalsetmessages: + msg391164
2021-04-16 03:32:18rhettingersetmessages: + msg391163
2021-04-12 20:47:32pablogsalsetmessages: + msg390904
2021-04-12 20:21:37vstinnersetnosy: + pablogsal, vstinner
messages: + msg390898
2020-12-18 09:48:14mdksetmessages: + msg383292
2020-12-16 14:29:48mdksetmessages: + msg383171
2020-12-16 14:20:05mdksetpull_requests: + pull_request22662
2020-12-16 11:06:41mdksetmessages: + msg383149
2020-12-01 08:21:25mdksetmessages: + msg382219
2020-11-25 09:18:14mdksetmessages: + msg381802
2020-11-16 08:08:04mdksetkeywords: + patch
stage: patch review
pull_requests: + pull_request22204
2020-11-16 07:50:08mdksetmessages: + msg381075
2020-11-02 00:51:19ned.deilysetnosy: + ned.deily
messages: + msg380171
2020-11-02 00:07:26rhettingersetnosy: + rhettinger
messages: + msg380169
2020-11-01 23:49:23mdkcreate