In short the two drives in the array were /dev/sdd and /dev/sde. The kernel sees they were unplugged and have gone down as you can see below.
mdadm caught the first one being unplugged /dev/sde and disabled the missing drive. However when the final drive that was part of the array is unplugged it didn't notice at all. Instead it complains about an IO error later for drives that the kernel knows do not exist anymore.
[45817.162728] ata4: exception........
Have you ever unplugged the wrong drive and then had to rebuild the entire array? It may not be a big deal in some ways but it does make your system vulnerable until the rebuild is done.
Many distros often enable the "bitmap" feature and this basically keeps track of what parts need to be resynced in the case of a temporary removal of a drive from the array, this way it only needs to sync what has changed.
To enable bitmap to speed up rebuilds and sync........
This happened during a RAID array check:
SMART says both drives pass the test, but I'm doing a long test on them and hopefully this is not a hardware error.
Apr 3 04:22:01 remote kernel: md: syncing RAID array md2
Apr 3 04:22:01 remote kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Apr 3 04:22:01 remote kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
I think this will be useful to others because I have a server that kept crashing mysteriously during intense disk usage/RAID checks. It would only crash during the weekly RAID integrity check.
ThenI noticed during a reboot that not all CPUs were being brought up, as a result this actually creates much higher temperatures with the output I got from sensors, just booting the system produced higher than normal temperatures.
You can imagine that a full blown RAID check........
Jan 16 04:02:03 centosbox syslogd 1.4.1: restart.
Jan 16 04:07:34 centosbox kernel: INFO: task updatedb:20771 blocked for more than 300 seconds.
Jan 16 04:07:34 centosbox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 16 04:07:34 centosbox kernel: updatedb D F78BE050 6476 20771 20766&n........