Disable all power management options, C-states, and don’t forget to try those options that are not even present on the desktop boards? Sure thing.
Reboot every day when the display freezes? No problem.
Have the CPU perform all network functions? Why not.
These solutions might be okay for casual users, but for a cluster they make the hardware unusable.
These “new” systems are a few years old now… hardware support should be good for this kit, but every bug is marked as WONTFIX, answered with silly workarounds (as above) and met with the question “Why are you using old stuff?”.
Part of the blame lies with vendors like Stone and Viglen for producing bad hardware, but the driver support for these common-as-muck Intel chipsets is bad too.
Those with the knowledge to fix it have no desire to, and it’s not for lack of volunteers with hardware and a willingness to test. Maybe we must accept that this is the Linux way.