Using stress, lm-sensors, and hddtemp to sort out temperature- and reliability-related issues with a home-based NAS box.
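The three tools named above combine into a quick thermal soak test. The sketch below is a minimal illustration, not the article's exact procedure: it assumes the stress, lm-sensors, and hddtemp packages are installed (package names vary by distro), that `sensors-detect` has been run once, and it uses /dev/sda as a placeholder device.

```shell
#!/bin/sh
# Minimal thermal soak test sketch. Assumptions: stress, lm-sensors, and
# hddtemp are installed; sensors-detect has already been run; /dev/sda is
# a placeholder for one of the NAS data drives.

# Baseline readings at idle: motherboard/CPU sensors, then drive temperature.
command -v sensors >/dev/null && sensors || true
command -v hddtemp >/dev/null && sudo hddtemp /dev/sda || true

# Load every CPU core for ten minutes in the background.
command -v stress >/dev/null && stress --cpu "$(nproc)" --timeout 600 &

# While the load runs, sample temperatures periodically, e.g.:
#   while sleep 30; do sensors; done
# and watch whether core or drive temperatures climb out of range.
true
```

Comparing the idle baseline against readings taken under load is what separates "this box runs warm" from "this box overheats when it actually works."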
I recently found myself in a difficult situation with my home NAS brought about by some sketchy construction work done at my apartment. Long story short, the workers didn’t mention that they would be using my work area as their work area and set about cutting bathroom tiles right next to my main workstation, my home NAS server, and a toy i3-based rig that I occasionally use for testing and special projects. Consequently, my NAS suffered failure after failure – with ghosts that I am still chasing more than three months later.
My NAS is a home-built rig using a Supermicro X8STi with an X5680, 24GB of ECC DDR3, and six disks total, including the operating system (OS) drive itself. The OS is on an SSD, and the data is all on HDDs. There is no redundancy (see my article on MergerFS elsewhere in this issue). So, to be clear, this is a mess primarily of my own making, though others share the blame.
The first problem that I experienced after the aforementioned construction was that disks present at boot would randomly disappear. I cleaned the machine out as best I could and reseated the SATA cables. That worked temporarily, but two of the drives kept dropping off, requiring a reboot and, in some cases, another reseating of the SATA cable. I should note that, at the time, I was using old, inexpensive SATA cables collected over the years from motherboard purchases, so I decided it was time to switch to modern cables with locking latches. The problems remained.
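When a drive drops off like this, two quick checks help distinguish a flaky SATA link from failing media. The sketch below is my own addition, not from the article: smartctl comes from the smartmontools package, and /dev/sdb is a placeholder for whichever disk disappeared.

```shell
#!/bin/sh
# Hedged sketch: distinguishing a flaky SATA link from a dying drive.
# Assumptions: smartmontools is installed; /dev/sdb is a placeholder device.

# Kernel messages like "ata3.00: hard resetting link" or "SError" mean the
# link itself is resetting, which points at cabling, power, or the port
# rather than the drive.
dmesg 2>/dev/null | grep -iE 'ata[0-9]+.*(reset|serror|failed)' || true

# A rising UDMA_CRC_Error_Count in SMART is the classic signature of a bad
# cable or connector, since CRC errors are counted on the interface.
command -v smartctl >/dev/null && sudo smartctl -A /dev/sdb | grep -i crc || true
```

If the CRC error count keeps climbing after a cable swap, the next suspects are the backplane, the power connections, or the controller port itself.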