You might assume you have a bad drive or the SATA interface/cable is bad, or the power supply is bad/weak to the drive. These are all possible issues, but definitely check your SATA cable for "twisting". It is a big issue because until the error stops or times out, your system will not boot (in my case this was the case even though the drive with the issue was not part of the OS or booting process at all).
If you run an open rig that you move around often that ha........
In short the two drives in the array were /dev/sdd and /dev/sde. The kernel sees they were unplugged and have gone down as you can see below.
mdadm caught the first one being unplugged /dev/sde and disabled the missing drive. However when the final drive that was part of the array is unplugged it didn't notice at all. Instead it complains about an IO error later for drives that the kernel knows do not exist anymore.
[45817.162728] ata4: exception........
[ 2868.041375] ata1: EH in SWNCQ mode,QC:qc_active 0x40 sactive 0x40
[ 2868.041554] ata1: SWNCQ:qc_active 0x40 defer_bits 0x0 last_issue_tag 0x6
[ 2868.041556] dhfis 0x40 dmafis 0x40 sdbfis 0x20
[ 2868.041874] ata1: ATA_REG 0x41 ERR_REG 0x84
[ 2868.042013] ata1: tag : dhfis dmafis sdbfis sactive
[ 2868.042163] ata1: tag 0x6: 1 1 0 1
[ 2868.042301] ata1.00: exception Emask 0x1 SAct 0x40 SErr 0x400000 action 0x6 frozen
[........
[3805108.257042] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[3805108.257052] sd 0:0:0:0: [sda] Write Protect is off
[3805108.257054] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[3805108.257066] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[3805108.257083] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[3805108.257090] sd 0:0:0:0: [sda] Write Protect is off........
Another new drive bad from the start:
Jun 2 15:14:18 one-desktop kernel: [15895.386779] ata2.00: exception Emask 0x50 SAct 0x1 SErr 0x280900 action 0x6 frozen
Jun 2 15:14:18 one-desktop kernel: [15895.386782] ata2.00: irq_stat 0x08000000, interface fatal error
Jun 2 15:14:18 one-desktop kernel: [15895.386784] ata2: SError: { UnrecovData HostInt 10B8B BadCRC }
Jun 2 15:14:18 one-desktop kernel: [15895.386788] ata2.00: cmd 60/0........
This is the most I can get when plugging in a hard drive hot and only on some power connectors.
[71656.314271] ata5: exception Emask 0x50 SAct 0x0 SErr 0x90a02 action 0xe frozen
[71656.314277] ata5: irq_stat 0x00400000, PHY RDY changed
[71656.314285] ata5: SError: { RecovComm Persist HostInt PHYRdyChg 10B8B }
[71656.314294] ata5: hard resetting link
[71660.360686] ata5: softreset failed (device not ready)
[71660.360694] ata5: applying........
I like dd, although it only reads it, usually a read test of the entire disk will uncover if your hard drive is bad in some parts. This is a good thing to do at least once a month, a lot of times bizarre program behavior, laginess and crashing/unnmounting problems etc.. are due to a failing disc and SMART won't know it or indicate a problem:
We must also remember there's never a guarantee, I've found that ever since we moved to larger and more platters per drive with 1TB drives........
I had one of these shipped and it was not recognized when plugged in, here's what a dead drive looks like (I assume it's teh circuit board which is dead):
ata1: link is slow to respond, please be patient (ready=0)
ata1: softreset failed (device not ready)
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1: link online but device misclassified, retrying
ata1: link is slow to respond, please be patient (ready=0)
ata1: softreset f........
[137392.910057] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x80000 action 0x6 frozen
[137392.910077] ata4: SError: { 10B8B }
[137392.910095] ata4.00: cmd 60/20:00:00:00:00/00:00:00:00:00/40 tag 0 ncq 16384 in
[137392.910099] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[137392.910122] ata4.00: status: { DRDY }
[137392.910135] ata4: hard resetting link
[137393.440060] ata4: SATA link........
I separated the 2 drives in the RAID 1 array.
1 is the old one /dev/sda and is out of date, while the separated other one /dev/sdc was in another drive and mounted and used with more data (updated).
I wonder how mdadm will handle this:
usb-storage: device scan complete
md: md127 stopped.
md: bind
md: md127: raid array is not clean -- starting background reconstruction
raid1: raid set md127 active with 1 out of 2 m........
From the package "parted" you can use the command "partprobe" to re-read the partition table. I really hate rebooting, and that's what Iloved to hear about AHCI motherboards, that they allow hotswap so you don't have to reboot. But that's only as good as the OS, if the OS does not reload the partition table you won't be able to do anything with that new drive you attached without rebooting. Yes, even without re-reading the partiton table Linux will........