Author mdk
Recipients mdk
Date 2020-11-01.23:49:23
I was not here 21 years ago when it was introduced [1], but according to the commit message its purpose was to find leftover LaTeX markup.

It tries to find 4 patterns in Sphinx node text (not in raw rst files):

::(?=[^=])|            # two :: (but NOT ::=)

This one has ~100 false positives in susp-ignored.csv (PyPI classifiers, slices, IPv6, ...).

:[a-zA-Z][a-zA-Z0-9]+| # :foo

This one has ~300 false positives in susp-ignored.csv (slices, C:\, IPv6, ...).

`|                     # ` (seldom used by itself)

This one has ~20 false positives in susp-ignored.csv (mostly reStructuredText in code-blocks).

(?<!\.)\.\.[ \t]*\w+:  # .. foo: (but NOT ... else:)

This one does not have false positives.
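
To make the four patterns easier to experiment with, here is a minimal sketch that joins them into a single verbose regex and runs it over plain strings. The names `SUSPICIOUS` and `find_suspicious` are mine, chosen for illustration, and the sketch skips the Sphinx node traversal and the susp-ignored.csv filtering entirely.

```python
import re

# The four alternatives quoted above, combined into one verbose regex.
# SUSPICIOUS and find_suspicious are illustrative names (assumptions).
SUSPICIOUS = re.compile(r"""
    ::(?=[^=])|            # two :: (but NOT ::=)
    :[a-zA-Z][a-zA-Z0-9]+| # :foo
    `|                     # ` (seldom used by itself)
    (?<!\.)\.\.[ \t]*\w+:  # .. foo: (but NOT ... else:)
    """, re.VERBOSE)


def find_suspicious(text):
    """Return every substring of `text` matched by one of the four patterns."""
    return [m.group() for m in SUSPICIOUS.finditer(text)]
```

For example, `find_suspicious("data[1:end]")` returns `[":end"]`, which is the kind of slice false positive mentioned above, while `find_suspicious("... else:")` returns `[]`. Run on raw rst, the last pattern would also match a correct `.. literalinclude::` line, which is presumably why the check is applied to node text instead.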

The script, on my laptop (with a Core i9), is slow (4 min 20 s), and it's probably much slower on the CI.

I searched for `suspicious is:pr in:comments` on GitHub to see whether it's useful:

- 2 contributors had an issue with the script (gh-9748, gh-21940)
- 5 had to add false positives to susp-ignored.csv (gh-20556, gh-13772, gh-11481, gh-9317, gh-6915)
- 4 had to update susp-ignored.csv (gh-11769, gh-5552, gh-3694, gh-2719)
- 1 did not add an entry to susp-ignored.csv but changed the text to avoid a false positive (gh-18939)

Cases where it actually helped:

- Finding an error (gh-12562: `.. literalinclude:` instead of `.. literalinclude::`)
- Finding refs in code block (gh-7413)
- Writing plaintext in Misc/NEWS (gh-1339)

I'd go for enhancing (which is fast, ~1s on my laptop) a bit to try to handle the `.. literalinclude:` missing a `:` errors, and dropping suspicious.
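
As a rough idea of what such a check on raw rst lines could look like, here is a hypothetical sketch. It is not taken from any existing tool; the names `SINGLE_COLON_DIRECTIVE` and `find_single_colon_directives` are made up for illustration, and the rule itself (a directive-like name followed by exactly one colon) is my assumption about what "missing a `:`" should mean.

```python
import re

# Hypothetical check, not taken from any existing tool: a directive-like
# line whose name is followed by a single ':' instead of '::'.
# Names starting with '_' are skipped because `.. _target:` is a valid
# hyperlink target.
SINGLE_COLON_DIRECTIVE = re.compile(r"^\s*\.\.[ \t]+([a-zA-Z][\w-]*):(?!:)")


def find_single_colon_directives(lines):
    """Yield (line_number, directive_name) for suspected missing colons."""
    for lineno, line in enumerate(lines, start=1):
        match = SINGLE_COLON_DIRECTIVE.match(line)
        if match:
            yield lineno, match.group(1)
```

A comment such as `.. XXX: fix this` would still be flagged, so a real check would probably need a list of known directive names or an ignore mechanism.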

So I'd appreciate feedback on this script: did it help you recently?

Date User Action Args
2020-11-01 23:49:23 mdk set recipients: + mdk
2020-11-01 23:49:23 mdk set messageid: <>
2020-11-01 23:49:23 mdk link issue42238 messages
2020-11-01 23:49:23 mdk create