
Author mmokrejs
Recipients amaury.forgeotdarc, mmokrejs, neologix, pitrou, sjt, skrah, tim.peters, vstinner
Date 2013-12-01.20:20:37
Message-id <1385929237.81.0.587169208411.issue18843@psf.upfronthosting.co.za>
In-reply-to
Content
Hi,
  I think I should report back what I found on the hardware side. While memory testing tools like memtest86+ and others did not find any errors, the built-in Dell ePSA test suite probably computes a checksum of the tested memory regions. It reported some addresses/regions as failed; sadly, nobody seems to know the details of the failing tests. On repeated testing, different memory regions were reported, so I never understood whether the cause was a bad CPU cache or something else randomizing the observed issue. At least only one of the two SO-DIMMs caused the problems, so let's conclude it was partly defective and failing randomly. At that time it seemed the cause was either a bad CPU producing too much heat or the fan. The fan was replaced, and max temperatures went down from 92 °C to 82 °C. Two months later an external HDMI-connected display failed to receive a signal more and more often, so the CPU was replaced as well. I got another drop in max temperatures; they are now about 70 °C. Cool!

Back to Python: the random crashes of my apps stopped after the memory module was replaced (actually, the whole pair was replaced). I started to dream about the Linux kernel mirroring data inside memory for failure resilience, but there is nothing like that available.

In summary, this lesson was hard and showed that there are no good tools for testing hardware. Checksums should always be used, and bits should be tested for fading over time. The mirroring trick could also have uncovered failing memory or a failing CPU. It seems there is still a way to go to a perfect computer.
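
To illustrate the checksum idea, here is a minimal sketch in Python. It is only an assumption of how such a check could look, not what ePSA or memtest86+ actually do: the function names (fill_and_digest, verify, soak_test) are made up for this example, and a user-space buffer like this covers only a small piece of memory that may well be served from the CPU cache, so it is no substitute for a real memory tester.

```python
import hashlib
import os
from typing import Tuple


def fill_and_digest(size: int) -> Tuple[bytearray, str]:
    """Allocate a buffer of random bytes and record its SHA-256 digest."""
    buf = bytearray(os.urandom(size))
    return buf, hashlib.sha256(buf).hexdigest()


def verify(buf: bytearray, expected: str) -> bool:
    """Return True if the buffer still hashes to the recorded digest."""
    return hashlib.sha256(buf).hexdigest() == expected


def soak_test(size: int = 1 << 20, rounds: int = 3) -> int:
    """Re-check one buffer several times; return the number of mismatches.

    Hypothetical helper: a real test would hold far more memory and wait
    between rounds so that slowly fading bits have time to show up.
    """
    buf, digest = fill_and_digest(size)
    return sum(0 if verify(buf, digest) else 1 for _ in range(rounds))
```

On healthy hardware soak_test() should report no mismatches; flipping a single bit in the buffer by hand makes verify() fail, which is the kind of silent corruption a plain pattern test can miss.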

Thanks to everybody for their efforts on this issue. Whether Python takes something from this lesson is up to you.