Author Phil Frost
Recipients Phil Frost, facundobatista, mark.dickinson, rhettinger, skrah, tim.peters, vstinner
Date 2019-06-06.16:31:34
skrah: Yes, that's correct. Since I can only reproduce this bug in production, it will take me a few days to build and validate a source build. But absent any better ideas, I will try.

tim.peters: I've observed this bug across hundreds of EC2 hosts, in dozens of code paths, with all kinds of inputs. Moreover, the hosts aren't displaying any other symptoms of hardware failure such as random segfaults or mysteriously corrupted data.

I've also now investigated two core dumps in depth, and both show specifically that `exp` seems to get 2 added when it should have been 1. I have a hard time explaining how a hardware failure could cause precisely the same failure so reliably.

So I doubt hardware is to blame.
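To make the `exp` symptom concrete, here is a minimal Python sketch. It assumes `exp` is the decimal exponent of the result, and it uses made-up values and a hypothetical helper, `with_exponent_shift`, which is not part of the `decimal` module. It only illustrates that an exponent one step too large yields a value 10 times too large; it does not reproduce the bug.

```python
from decimal import Decimal


def with_exponent_shift(value: Decimal, shift: int) -> Decimal:
    """Hypothetical helper: same sign and digits, exponent increased by `shift`."""
    sign, digits, exponent = value.as_tuple()
    return Decimal((sign, digits, exponent + shift))


# Illustrative value only: digits (1, 2, 3) with exponent -2.
base = Decimal("1.23")

# What the result would be if the exponent were increased by 1, as intended.
expected = with_exponent_shift(base, 1)

# What the result would be if the exponent were increased by 2 instead.
observed = with_exponent_shift(base, 2)

# Each extra exponent step multiplies the value by 10.
ratio = observed / expected
print(expected, observed, ratio)
```

A check of this kind compares only the exponents, so it could flag a suspect result in production without needing to know the inputs that produced it.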

That said, the issue does seem to occur in "clumps" on individual hosts. We might go 10 hours without seeing it, then it may happen 5 times within 30 minutes on one host. We might observe 1 or 2 more such clumps on the same host until the next deploy of the application, at which point all the containers are replaced with fresh ones. This suggests there is some ephemeral state within a host that creates a propensity for the issue.

I've also been unable to reproduce the problem in a development environment, even when that environment uses the same kernel, instance class, and Docker container as production. So I suspect the bug is precipitated by some particular concurrency or interaction that I haven't been able to replicate.
Date | User | Action | Args
2019-06-06 16:31:34 | Phil Frost | set | recipients: + Phil Frost, tim.peters, rhettinger, facundobatista, mark.dickinson, vstinner, skrah
2019-06-06 16:31:34 | Phil Frost | set | messageid: <>
2019-06-06 16:31:34 | Phil Frost | link | issue37168 messages
2019-06-06 16:31:34 | Phil Frost | create |