Before getting into the output, here is my typical experience with SMART: I have what I call a "bad disk" with pending and uncorrectable sectors that cannot be reallocated.
It has repeatedly caused a kernel panic and system crash, as we can see from the logs.
But SMART says it has "PASSED" its self-assessment. SMART is still useful to me, but mostly for watching Current_Pending_Sector.
Any time I have had anything but 0 for that attribute it........
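To make the "watch the raw value, not the PASSED verdict" habit concrete, here is a minimal Python sketch that scans attribute-table text of the kind printed by smartctl -A and reports watched attributes whose raw value is not 0. The function name, the choice of watched attributes beyond Current_Pending_Sector, and the sample table are all assumptions made for illustration; the sample is not output from the disk in this post.

```python
# Attributes to watch. Current_Pending_Sector is the one the post singles out;
# the other two are an assumption about what else is worth flagging.
WATCHED = ("Current_Pending_Sector", "Offline_Uncorrectable", "Reallocated_Sector_Ct")


def nonzero_attributes(smart_text, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    found = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in watched and raw.isdigit() and int(raw) != 0:
            found[name] = int(raw)
    return found


# Hypothetical sample in the usual attribute-table layout (made up for this example).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

print(nonzero_attributes(SAMPLE))
```

In practice the text would come from running smartctl -A against the device and passing the captured output to the function; an empty result means none of the watched raw values were non-zero.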
This is an 8TB Seagate external USB 3.0 device. Apparently newer kernels use a module called "UAS" instead of "USB Storage", which causes issues because a lot of devices are not properly supported in UAS mode by the kernel driver. The solution, some say, is to disable UAS specifically for your USB device, but I'd rather just disable UAS altogether.
Solution blacklist UAS: *do not do this it does not work and just causes your USB 3.0........
This was a surprising bug. I unplugged all the drives for an array, md127. At first I unplugged just one drive, and mdadm seemed to notice this. I then unplugged the second drive, taking the array offline, but mdadm did not realize it was offline and still showed a non-existent disk as being part of it. This created problems when trying to unmount it, or even to stop the array, with mdadm freezing.
As for how to fix it I can only think of making sure you are not in a mounted path of........
Tired of checking iotop and seeing that your DRBD partition is using 99.99% of IO all the time, and finding that your DRBD device performs slowly in general?
This is especially an issue in versions of DRBD in the 8.3 tree. One documented case in particular is on "8.3.13", but it likely applies to other versions.
The symptoms are that resyncing is fine and normal but any reasonable amount of activity is very slow and lagged and creates a high server load and con........
I have not found the source of this, but it seems that DRBD and ext4 may not play well together. I still have to confirm this.
In either case, an older DRBD setup with older hard drives seems to have little to no iowait; the main difference is that its DRBD partition is ext3 and not ext4. I will experiment and see if that fixes this, then we will know that DRBD and ext4 have issues.........
This booting error occurs because the Xen PV guest image uses the Xen kernel, which is not compatible with anything but a host running a Xen kernel.
I ran kpartx -av virtual.img, which created some partitions that showed up in fdisk.
I mounted it, chrooted into it, removed the Xen kernel and installed a normal kernel, but Xen still shows the same kernel in GRUB (only the Xen one).
This is strange but it seems like this Xen PV guest has some sort of hidden or........
These were caused by a bad stick of Corsair RAM:
[] free_hot_cold_page+0xfc/0x150
[] __pagevec_free+0x14/0x1a
[] release_pages+0x127/0x12f
[] __pagevec_release+0x15/0x1d
[] __invalid_mapping_pages+0x120/0x156
[........
I like dd. Although it only reads the disk, a read test of the entire disk will usually uncover whether your hard drive is bad in some parts. This is a good thing to do at least once a month. A lot of the time, bizarre program behavior, lagginess, and crashing/unmounting problems are due to a failing disk, and SMART won't know it or indicate a problem:
We must also remember there's never a guarantee. I've found that ever since we moved to larger and more platters per drive with 1TB drives........
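The whole-disk read test described above can also be sketched in Python. This is not the dd command from the post, just the same idea under stated assumptions: read the target from start to end in fixed-size chunks and note the offset of any chunk the kernel refuses to return. The function name read_test and the 1 MiB chunk size are made up for illustration.

```python
import os


def read_test(path, chunk_size=1024 * 1024):
    """Read `path` from start to end; return (bytes_read, offsets_of_failed_chunks)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and block devices alike.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset, bytes_read, bad_offsets = 0, 0, []
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # The kernel reported a read error (for example EIO on a bad sector).
                # Record where it happened and move on to the next chunk.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            bytes_read += len(data)
            offset += len(data)
        return bytes_read, bad_offsets
    finally:
        os.close(fd)
```

Pointed at a block device (which normally requires root), a non-empty list of offsets means some regions could not be read. Like a plain dd read, this goes through the page cache, so recently read data may be served from memory rather than from the platters.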
high IO wait
424 root 39 19 1900 848 552 D 0.0 0.0 0:00.91 updatedb
root 424 0.0 0.0 1900 848 ? DN Mar11 0:00 /usr/bin/updatedb -f sysfs?rootfs?bdev?proc?cpuset?binfmt_misc?debugfs?sockfs?usbfs?pipefs?anon_inodefs?futexfs?tmpfs?inotifyfs?eventp........
Jan 16 04:02:03 centosbox syslogd 1.4.1: restart.
Jan 16 04:07:34 centosbox kernel: INFO: task updatedb:20771 blocked for more than 300 seconds.
Jan 16 04:07:34 centosbox kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 16 04:07:34 centosbox kernel: updatedb D F78BE050 6476 20771 20766........
This is obviously a bug in the r8169 kernel module, and it seems to affect a lot of people. I upgraded to the latest kernel and hope this won't happen anymore, as it is a very serious error. It is especially serious for those running servers with this chipset: who can afford for the NIC to randomly go offline for no apparent reason?
[655548.189113] type=1505 audit(1277067560.902:5): operation="profile_load" name="/usr/bin/freshclam&q........
When trying to even cd or ls the mounted OCFS2 partition, it crashes. I think this is a combination of VMware Server's problem and the way I mounted and symlinked to it.
More than anything, this shows the problem and lack of foresight with VMware, but also that OCFS2 is easily crashed if you do strange things.
Output of /var/log/messages for OCFS2
Apr 10 15:57:45 localhost kernel: [84331.691258] Modules linked in: vmnet vmci vmmon ocfs2_stac........