classification
Title: RFE: Run linkchecker on documentation on the CI
Type:
Stage:
Components: Documentation
Versions: Python 3.10
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: amaajemyfren, docs@python, hroncok, mdk, ned.deily, petdance, terry.reedy, vstinner
Priority: normal
Keywords:

Created on 2020-05-25 16:09 by hroncok, last changed 2020-05-30 02:14 by terry.reedy.

Messages (7)
msg369892 - Author: Miro Hrončok (hroncok) * Date: 2020-05-25 16:09
In Fedora, we run the following check when we build Python documentation:

# Verify that all of the local links work
#
# (we can't check network links, as we shouldn't be making network connections
# within a build.  Also, don't bother checking the .txt source files; some
# contain example URLs, which don't work)
linkchecker \
  --ignore-url=^mailto: --ignore-url=^http --ignore-url=^ftp \
  --ignore-url=.txt\$ --no-warnings \
  Doc/build/html/index.html

From time to time, it discovers broken links:

  https://github.com/python/cpython/pull/15700
  https://github.com/python/cpython/pull/20383
  https://github.com/python/cpython/pull/20388

It would be really nice if this check ran as part of the CI that builds the documentation.
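
For readers without linkchecker installed, the local-only check described above can be approximated with the standard library. This is a minimal sketch, not linkchecker's implementation: the names `LinkCollector` and `broken_local_links` are hypothetical, the ignore patterns are modelled on the `--ignore-url` options in the command (with the dot in `.txt` escaped), and it only tests that each relative link target exists on disk.

```python
import re
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag, urlparse

# Assumption: patterns modelled on the --ignore-url options shown above.
IGNORED = [re.compile(p) for p in (r"^mailto:", r"^http", r"^ftp", r"\.txt$")]


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def broken_local_links(html_root):
    """Return (page, link) pairs whose relative target is missing on disk."""
    broken = []
    for page in sorted(Path(html_root).rglob("*.html")):
        collector = LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for link in collector.links:
            if any(pattern.search(link) for pattern in IGNORED):
                continue
            target, _fragment = urldefrag(link)
            parsed = urlparse(target)
            # Skip same-page anchors and anything with a scheme or host.
            if not target or parsed.scheme or parsed.netloc:
                continue
            if not (page.parent / unquote(parsed.path)).exists():
                broken.append((page, link))
    return broken
```

Calling `broken_local_links("Doc/build/html")` after a documentation build would list candidate broken links; a CI step could fail when the list is non-empty. Fragment targets (the part after `#`) are not verified in this sketch.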
msg369893 - Author: Miro Hrončok (hroncok) * Date: 2020-05-25 16:17
Side note: linkchecker can be installed via pip, but the released version is not Python 3 compatible. In Fedora, we package it from git.
msg370196 - Author: Miro Hrončok (hroncok) * Date: 2020-05-28 12:12
Note: I would gladly contribute this check, but I have no idea where I should do that.
msg370202 - Author: Ama Aje My Fren (amaajemyfren) * Date: 2020-05-28 13:01
On Thu, May 28, 2020 at 3:13 PM Miro Hrončok <report@bugs.python.org> wrote:

>
> Note: I would gladly contribute this check, but I have no idea where I should do that.
>

I don't know either. I suspect it will have to be with one of the
CI/CD providers that CPython uses.

I _think_ it uses three:
a. Travis: cpython/.travis.yml
b. GitHub Actions: .github/workflows/doc.yml
c. Azure Pipelines: .azure-pipelines/docs-steps.yml

Beyond that, I have no idea. I fear I am also blind here. Still, Google is my friend.
msg370226 - Author: Andy Lester (petdance) * Date: 2020-05-28 15:20
Some high-level questions to consider:

* Is it run only when a build of the docs is started?  Or should it be done regularly (daily/weekly?) to keep an eye on links so that it's not a surprise when build time comes along?

* Does a broken link stop the build, or is it just advisory?

* Who sees the results?  Are they emailed to someone?  A mailing list?  Posted somewhere publicly?

* Is someone assigned responsibility for acting on the failures?

* What counts as a failure?  Is a 301 redirect OK?  It seems that a 301 might be OK to pass, but someone should know about it to update to the new URL.

I am not familiar with the current documentation build process, so forgive me if these are already answered somehow.  I'm not looking for answers myself, but providing suggestions.
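
To make the "what counts as a failure" question concrete, one possible policy can be sketched as follows. Everything here is an assumption for illustration: the thread does not settle on a policy, and `Verdict`, `classify`, and `should_fail_build` are hypothetical names. In this sketch permanent redirects (301, 308) are advisory, other redirects pass, and anything else outside 2xx counts as broken.

```python
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    ADVISORY = "advisory"  # link works, but someone should update the URL
    BROKEN = "broken"


def classify(status):
    """Map an HTTP status code to a verdict under the assumed policy."""
    if 200 <= status < 300:
        return Verdict.OK
    if status in (301, 308):
        return Verdict.ADVISORY  # permanent redirect: report, do not fail
    if 300 <= status < 400:
        return Verdict.OK  # temporary redirect: nothing to update
    return Verdict.BROKEN


def should_fail_build(statuses, strict=False):
    """Fail on any broken link; in strict mode, also fail on advisories."""
    verdicts = [classify(status) for status in statuses]
    if Verdict.BROKEN in verdicts:
        return True
    return strict and Verdict.ADVISORY in verdicts
```

With `strict=False` the check is blocking only for broken links and advisory for permanent redirects; a scheduled (for example weekly) run could use `strict=True` to surface URLs that need updating.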
msg370270 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-05-28 22:25
I think our CI checks already take too long to run and possibly use more than our fair share of global open source resources (provided by GitHub, Travis, MS Azure), especially considering how infrequently you would expect to find a problem and the low severity of not catching one immediately.  I think a more appropriate choice would be to set up a buildbot to do such a check; weekly is perhaps often enough, and no more often than daily.

Julien, what do you think?
msg370353 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-05-30 02:14
Something rebuilds the online docs once a day.  That same something might be appropriate for running a link checker (including external links) once a week, say.
History
Date                 User         Action  Args
2020-05-30 02:14:20  terry.reedy  set     nosy: + terry.reedy; messages: + msg370353
2020-05-28 22:25:23  ned.deily    set     nosy: + ned.deily; messages: + msg370270
2020-05-28 22:19:13  ned.deily    set     nosy: + mdk
2020-05-28 15:20:04  petdance     set     messages: + msg370226
2020-05-28 13:01:20  amaajemyfren set     nosy: + amaajemyfren; messages: + msg370202
2020-05-28 12:12:54  hroncok      set     messages: + msg370196
2020-05-25 16:56:30  petdance     set     nosy: + petdance
2020-05-25 16:17:39  hroncok      set     messages: + msg369893
2020-05-25 16:09:02  hroncok      create