Heatsinks are overrated | Created: 24.09.2009 00:58
One of our cluster nodes at work crashed over the weekend. After a reset it came back up, but crashed again within a minute. After another reset, it didn't even get through the BIOS before crashing. For every crash, the management card logged either "CPU0 IERR asserted" or "CPU1 IERR asserted". Since it seemed unlikely that both CPUs had died at the same time, we assumed the mainboard was bad. We opened a support case and waited for a technician to show up and replace the board.

However, after we opened the case of the machine, we saw this:

![]()

Yup, those are the heatsinks on the CPUs, and they simply were not screwed in.

The amusing thing is that this machine is almost three years old, and it came factory integrated, i.e. fully assembled. We unpacked it, put it into the rack, and haven't moved or opened it in the almost three years since. The machine has been running without the heatsinks screwed on the whole time.

Apparently, the heat-conductive paste was enough to keep them in place in the beginning. With the paste getting drier and drier over time, the heatsinks lost contact at some point. Judging from the crashes, that happened last Sunday night...