Message 380167 - Python tracker


Author	mdk
Recipients	mdk
Date	2020-11-01.23:49:23
Message-id	<1604274563.95.0.828690163802.issue42238@roundup.psfhosted.org>
In-reply-to

Content

I was not here 21 years ago when it was introduced [1], but according to the commit message it was introduced to find leftover LaTeX markup.

It tries to find 4 patterns in Sphinx node text (not in raw rst files):

::(?=[^=])|            # two :: (but NOT ::=)

This one has ~100 false positives in susp-ignored.csv (PyPI classifiers, slices, IPv6, ...).

:[a-zA-Z][a-zA-Z0-9]+| # :foo

This one has ~300 false positives in susp-ignored.csv (slices, C:\, IPv6, ...).


`|                     # ` (seldom used by itself)

This one has ~20 false positives in susp-ignored.csv (mostly reStructuredText in code-blocks).

(?<!\.)\.\.[ \t]*\w+:  # .. foo: (but NOT ... else:)

This one does not have false positives.
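As a rough illustration of why the first three patterns are so noisy, here is a minimal sketch that combines the four alternatives quoted above into one verbose regex and runs it on plain strings. The name `find_suspicious` and the sample strings are made up for this example, and the real script works on Sphinx node text rather than on raw strings like these.

```python
import re

# The four alternatives quoted above, combined into one verbose regex.
# In verbose mode everything after "#" on a line is a comment.
SUSPICIOUS = re.compile(r"""
    ::(?=[^=])|            # two :: (but NOT ::=)
    :[a-zA-Z][a-zA-Z0-9]+| # :foo
    `|                     # ` (seldom used by itself)
    (?<!\.)\.\.[ \t]*\w+:  # .. foo: (but NOT ... else:)
    """, re.VERBOSE)


def find_suspicious(text):
    """Return every substring of `text` matched by one of the four patterns.

    Hypothetical helper, for illustration only.
    """
    return [match.group(0) for match in SUSPICIOUS.finditer(text)]


# Legitimate text that still triggers the patterns (false positives):
print(find_suspicious("Programming Language :: Python :: 3"))
print(find_suspicious("a[1:stop]"))
print(find_suspicious("2001:db8::1"))

# A real mistake: a directive written with a single colon.
print(find_suspicious(".. literalinclude: foo.py"))

# Explicitly excluded cases:
print(find_suspicious("production ::= rule"))
print(find_suspicious("if x: ... else: y"))
```

A PyPI classifier, a slice and an IPv6 address all match, which is consistent with the kinds of entries listed for susp-ignored.csv above.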

The script is slow: it takes 4 min 20 s on my laptop (with a Core i9), and it's probably way slower on the CI.

I searched for `suspicious is:pr in:comments` on GitHub to see if it's useful:

- 2 contributors had an issue with the script (gh-9748, gh-21940)
- 5 had to add false positives to susp-ignored.csv (gh-20556, gh-13772, gh-11481, gh-9317, gh-6915)
- 4 had to update susp-ignored.csv (gh-11769, gh-5552, gh-3694, gh-2719)
- 1 did not add anything to susp-ignored.csv but changed the text to avoid a false positive (gh-18939)

Cases where it actually helped:

- Finding an error (gh-12562: `.. literalinclude:` instead of `.. literalinclude::`)
- Finding refs in code block (gh-7413)
- Writing plaintext in Misc/NEWS (gh-1339)

I'd go for enhancing rstlint.py (which is fast, ~1s on my laptop) a bit so that it catches errors like `.. literalinclude:` missing a `:`, and for dropping suspicious.
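To make the proposal a bit more concrete, here is a minimal sketch of what such a line-based check could look like. Everything in it is an assumption for illustration: the function name `check_missing_colon`, the `(lineno, message)` generator shape, and the short `KNOWN_DIRECTIVES` set are not taken from rstlint.py. Restricting the check to known directive names is one way to avoid flagging comments such as `.. XXX: later`.

```python
import re

# Illustrative subset of directive names (an assumption); a real check
# would need the full list of directives used in the documentation.
KNOWN_DIRECTIVES = {
    "literalinclude", "code-block", "note", "warning", "seealso",
    "versionadded", "versionchanged", "deprecated", "toctree",
}

# ".. name:" where the colon is NOT followed by a second colon.
SINGLE_COLON = re.compile(r"^\s*\.\.\s+([a-zA-Z][a-zA-Z0-9_-]*):(?!:)")


def check_missing_colon(lines):
    """Yield (lineno, message) for known directives written with one colon.

    Hypothetical checker, for illustration only.
    """
    for lineno, line in enumerate(lines, start=1):
        match = SINGLE_COLON.match(line)
        if match and match.group(1) in KNOWN_DIRECTIVES:
            yield lineno, f"directive {match.group(1)!r} is missing a colon"


sample = [
    ".. literalinclude: ../includes/example.py",   # broken: one colon
    ".. literalinclude:: ../includes/example.py",  # correct
    ".. _some-label:",                             # hyperlink target, not a directive
    ".. XXX: later",                               # comment, name is not a directive
    "   .. note: indented and broken",             # broken: one colon
]
for lineno, message in check_missing_colon(sample):
    print(lineno, message)
```

Because it only scans raw lines with one regex, a check like this should stay close to the speed of the existing linter and does not require a full Sphinx build.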

So I'd appreciate feedback on this script: has it helped you recently?

[1]: https://github.com/python/cpython/commit/700cf28f410521066f40671f1da7db0302d753fd
