This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Buildbot web page: connection lost after 1 minute, then display "Connection restored" popup
Type: Stage: resolved
Components: Tests Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: EWDurbin, db3l, pablogsal, petr.viktorin, pitrou, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2020-09-03 10:21 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
PR 24119 petr.viktorin, 2021-01-19 12:59
Messages (15)
msg376293 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-03 10:21
Since the buildbot server migrated to a new machine, the web page loses its connection and the whole web page is reloaded every minute. Try for example:

https://buildbot.python.org/all/#/release_status

The new buildbot.python.org machine is now behind a load balancer.

TCP connections being closed after 1 minute already affected clients: see bpo-41642. It seems that HTTPS connections (tcp/443) are also affected.
msg381966 - Author: David Bolen (db3l) * Date: 2020-11-27 20:52
I was wondering if there was any update on whether or not this new behavior can be corrected?

I was attempting to review a buildbot failure today and it's actually pretty tough to "race the refresh" when trying to review the build steps and logs.
msg381967 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-27 21:01
We need someone from the PSF Infra team to increase the load balancer timeout, or to fix the load balancer.
msg381972 - Author: Ee Durbin (EWDurbin) * (Python triager) Date: 2020-11-28 00:50
I am away from my computer at the moment, but there is a direct access hostname for the buildbot host that was announced to the build server owners. Configuring that bypasses the load balancer.
msg381973 - Author: Ee Durbin (EWDurbin) * (Python triager) Date: 2020-11-28 00:51
Apologies, wrong issue. I’ll have to take a closer look at this.
msg385256 - Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-01-19 12:57
Is there anything I can help with to move this forward?
Investigating buildbot failures continues to be very annoying.
msg385269 - Author: Ee Durbin (EWDurbin) * (Python triager) Date: 2021-01-19 15:27
OK, I've confirmed that HAProxy seems to be the issue: WebSockets opened to the nginx proxy on the server, or directly to the Twisted server, remain open indefinitely.

If anyone familiar with HAProxy would be interested in helping debug, the current relevant information is:

- Ubuntu 18.04.5 LTS
- HA-Proxy version 1.8.8-1ubuntu0.11 2020/06/22

Here is the current HAProxy configuration: https://gist.github.com/ewdurbin/d8a42c30a04d6cb5763431200acaecde which is generated from this salt state: https://github.com/python/psf-salt/tree/master/salt/haproxy and this pillar data: https://github.com/python/psf-salt/blob/master/pillar/base/haproxy.sls

I can provide any additional information that would be helpful to someone trying to sort out what's going on.
msg385271 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-19 16:29
I looked at these files.

In the HAProxy configuration, the "defaults" section contains:

    timeout connect 5000
    timeout client  50000
    timeout server  50000

These timeouts are not overridden in the "frontend main" or "backend buildbot-master" sections.
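HAProxy's configuration manual states that a time value without a unit suffix is read as milliseconds, and that the suffixes "us", "ms", "s", "m", "h" and "d" are accepted. If the defaults above were in effect, they would amount to 5 s for connect and 50 s for client and server, which is less than a minute. The helper below is a small sketch of that conversion; the name haproxy_time_to_seconds is hypothetical and is not part of HAProxy or psf-salt.

```python
import re

# Multipliers expressed in milliseconds, following HAProxy's documented
# time format. A bare number (no suffix) is interpreted as milliseconds.
_UNIT_MS = {
    "us": 0.001,
    "ms": 1,
    "s": 1000,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
}

# "ms" is listed before "m" and "s" so that "5ms" is not misread.
_TIME_RE = re.compile(r"^(\d+)(us|ms|s|m|h|d)?$")


def haproxy_time_to_seconds(value: str) -> float:
    """Convert an HAProxy time value such as '50000' or '30s' to seconds."""
    match = _TIME_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not an HAProxy time value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MS[unit or "ms"] / 1000


if __name__ == "__main__":
    # The three timeouts from the "defaults" section quoted above.
    for name, raw in {"connect": "5000", "client": "50000", "server": "50000"}.items():
        print(f"timeout {name} {raw} -> {haproxy_time_to_seconds(raw):g} s")
```

The same helper also accepts values such as "30s", "3600s" or "1d", which makes it easy to compare the rendered configuration with the pillar data.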

pillar/base/haproxy.sls seems to try to override these variables:

https://github.com/python/psf-salt/blob/d89e5ef2e86f45c1766c8b93d6e9621b0ab1bb09/pillar/base/haproxy.sls#L7-L11

I don't see these in the rendered HAProxy config, but I see "timeout tunnel 3600s" is rendered as "timeout tunnel 1d" in "backend buildbot-master".
msg385276 - Author: Ee Durbin (EWDurbin) * (Python triager) Date: 2021-01-19 16:53
Apologies, I gisted a version that was from attempts to debug the timeouts. It's been updated.
msg385278 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-19 17:01
Ee:
> Apologies, I gisted a version that was from attempts to debug the timeouts. It's been updated.

Aaaah :-D No problem. Can you please try:

* replace "timeout client 30s" with "timeout client 1d"
* replace "timeout server 30s" with "timeout server 1d"

in the buildbot-master server section of the pillar config:

https://github.com/python/psf-salt/blob/d89e5ef2e86f45c1766c8b93d6e9621b0ab1bb09/pillar/base/haproxy.sls#L7-L11
msg385279 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-19 17:02
> "timeout server 1d"

Hum. I'm not sure if "1d" syntax is accepted. Maybe use "3600s".
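As a rough illustration of why raising these timeouts helps: a proxy that closes connections after a period of inactivity drops an idle page once the timeout elapses, while any traffic arriving more often than the timeout keeps the connection alive. The sketch below models only that assumption. It is not HAProxy's actual algorithm, and idle_close_time is a hypothetical helper.

```python
from typing import Optional, Sequence


def idle_close_time(
    activity: Sequence[float], idle_timeout: float, horizon: float
) -> Optional[float]:
    """Return the time at which an idle timeout would close the connection.

    Simplified model (an assumption, not HAProxy's exact behavior): the
    connection opens at t=0, each timestamp in `activity` resets the idle
    timer, and the connection is closed once `idle_timeout` seconds pass
    with no traffic. Returns None if it is still open at `horizon`.
    """
    last = 0.0
    for t in sorted(activity):
        if t > horizon:
            break  # ignore traffic after the observation window
        if t - last >= idle_timeout:
            return last + idle_timeout  # timer expired before this message
        last = t
    if horizon - last >= idle_timeout:
        return last + idle_timeout  # timer expired after the last message
    return None


if __name__ == "__main__":
    hour = 3600.0
    pings = [30.0 * k for k in range(1, 121)]  # traffic every 30 s for an hour
    print(idle_close_time([], 50.0, hour))     # idle page, 50 s timeout
    print(idle_close_time(pings, 50.0, hour))  # periodic traffic, 50 s timeout
    print(idle_close_time([], 3600.0, hour))   # idle page, 3600 s timeout
```

Under this model, an idle page with a 50 s timeout is dropped at t = 50 s, traffic every 30 s keeps it open for the whole hour, and a 3600 s timeout only postpones the disconnect of an idle page to the one-hour mark.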
msg391483 - Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-04-20 22:49
Ee, apparently this issue in the buildbot repo may be related: https://github.com/buildbot/buildbot/issues/4078

Could you investigate if we can use this on our PSF server?
msg391487 - Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-04-21 00:44
I'm going to try it out on the server, but apparently there are some problems: https://github.com/buildbot/buildbot/issues/5991
msg391524 - Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-04-21 14:36
Can someone confirm if they still have this problem on the buildbot server?
msg391526 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-21 15:23
> Can someone confirm if they still have this problem on the buildbot server?

Oh wow, I don't have the issue anymore! Thank you very much :-) Previously, the website was barely usable :-(
History
Date                 User           Action  Args
2022-04-11 14:59:35  admin          set     github: 85867
2021-04-21 15:23:50  vstinner       set     status: open -> closed
                                            resolution: fixed
                                            messages: + msg391526
                                            stage: patch review -> resolved
2021-04-21 14:36:06  pablogsal      set     messages: + msg391524
2021-04-21 00:44:38  pablogsal      set     messages: + msg391487
2021-04-20 22:49:49  pablogsal      set     messages: + msg391483
2021-03-21 17:20:52  pitrou         set     nosy: + pitrou
2021-01-19 17:02:42  vstinner       set     messages: + msg385279
2021-01-19 17:01:35  vstinner       set     messages: + msg385278
2021-01-19 16:53:09  EWDurbin       set     messages: + msg385276
2021-01-19 16:29:17  vstinner       set     messages: + msg385271
2021-01-19 15:27:02  EWDurbin       set     messages: + msg385269
2021-01-19 12:59:10  petr.viktorin  set     keywords: + patch
                                            stage: patch review
                                            pull_requests: + pull_request23077
2021-01-19 12:57:55  petr.viktorin  set     nosy: + petr.viktorin
                                            messages: + msg385256
2020-11-28 00:51:25  EWDurbin       set     messages: + msg381973
2020-11-28 00:50:32  EWDurbin       set     messages: + msg381972
2020-11-27 21:01:25  vstinner       set     messages: + msg381967
2020-11-27 20:52:59  db3l           set     nosy: + db3l
                                            messages: + msg381966
2020-09-03 13:58:44  zach.ware      set     nosy: + EWDurbin
2020-09-03 10:21:32  vstinner       set     nosy: + zach.ware, pablogsal
2020-09-03 10:21:19  vstinner       create