Occasionally my whole screen locks up and I cannot even switch to the console, and I find this in my syslog:
description: VGA compatible controller
product: Mullins [Radeon R3 Graphics]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
The closest you can get to disabling NCQ without using the "libata.force=noncq" kernel boot option is to set the queue depth to 1, which doesn't actually disable it.
Change the sdc below to match the device you want to disable NCQ for.
[root@officebox ~]# echo "1" > /sys/block/sdc/device/queue_depth
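The same change can be wrapped in a small function that shows the old and new value. This is only a sketch: the function name set_queue_depth and its optional third argument (an alternate sysfs root, there purely so it can be tried on a scratch directory) are my own invention, not part of any tool. The value is a runtime setting, so it has to be reapplied after a reboot.

```shell
#!/bin/sh
# Hypothetical helper: set_queue_depth DEVICE DEPTH [SYSFS_ROOT]
# SYSFS_ROOT defaults to /sys; the override is an assumption added
# only so the function can be exercised without touching a real disk.
set_queue_depth() {
    dev=$1
    depth=$2
    root=${3:-/sys}
    path="$root/block/$dev/device/queue_depth"

    # Bail out if the file is missing or not writable (wrong device, not root).
    if [ ! -w "$path" ]; then
        echo "cannot write $path (wrong device name, or not root?)" >&2
        return 1
    fi

    echo "was: $(cat "$path")"
    echo "$depth" > "$path"
    echo "now: $(cat "$path")"
}

# Example (as root): set_queue_depth sdc 1
```

Reading the file back afterwards confirms whether the kernel accepted the new depth.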
Errors that indicate you are having a performance issue are these in messages or dmesg relating to N........
I tried to stop a qemu-img copy or clone and it broke everything. Stopping it from the GUI was fine, but a process still persisted, so I killed the relevant qemu-img and the kernel went crazy. It also may not have helped that I tried to lvremove a different volume (an unused disk). Either way it breaks LVM (you cannot even run lvdisplay), so a reboot is necessary.
Jan 17 06:45:21 testserver kernel: [ 5680.439337] systemd-udevd D 0........
Before getting into the output, here is my typical experience with SMART: there is what I call a "bad disk", with pending and uncorrectable sectors that cannot be reallocated.
It has caused a kernel panic and system crash repeatedly as we can see from the logs.
But SMART says it has "PASSED" its self assessment. SMART is still useful to me but it is more about looking at Current_Pending_Sector.
Any time I have had anything but 0 for that attribute it........
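To pull just that attribute out of the SMART data, something like the following can help. It is a sketch that assumes the usual smartctl -A attribute table, where the raw value is the tenth column; the function name pending_sectors is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and prints the
# raw value of Current_Pending_Sector. Assumes the standard attribute
# table layout, where RAW_VALUE is the 10th whitespace-separated field.
# Exits non-zero if the attribute is not present in the input.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10; found = 1 }
         END { if (!found) exit 1 }'
}

# Example (as root, replace sdX with your disk):
#   smartctl -A /dev/sdX | pending_sectors
```

Anything other than 0 printed here is the value worth paying attention to, whatever the overall self-assessment says.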
You can search for this bug and it seems like it may be related to ecryptfs and is many years old.
The symptoms are that you return to the computer and the screensaver was active or the screen was asleep/black, and it doesn't seem to come back. But when you check over SSH the computer is running fine, and you are frustrated that you'll lose your running programs and have to reboot.
There is a simple solution:
Ctrl + Alt + F1
Ctrl + Alt + F8
It looks like this has something to do with APIC but I am not sure. I have similar CPUs with a different motherboard and BIOS that work fine on the same type of kernel. A lot of the time the issue is caused by the C-step setting in the BIOS.
The same thing happened on the 2.6 kernel with CentOS 6, but this is a homebrew 4.4 kernel, so I am not sure why it is happening when even the CentOS 7 (3.10) kernel works OK.
Solution - It comes down to the BIOS set........
Essentially, a program I was running for mining did not terminate properly with Ctrl+C. It is listed as defunct and cannot be killed, the kernel is tainted, and the normal tricks to disable the port are impossible: the dev and sys entries for the device cannot be browsed or interacted with in any form without the request locking up. As far as I can find so far, the only solution is to reboot because of the kernel taint.
[1130246.811056] INFO: task minerd:21861 blocked for more th........
This is an 8TB Seagate external USB 3.0 device. Apparently newer kernels use a module called "UAS" instead of "USB Storage", which causes issues because a lot of devices are not properly supported in UAS mode by the kernel driver. Some say the solution is to disable UAS specifically for your USB device, but I'd rather just disable UAS altogether.
Solution blacklist UAS: *do not do this it does not work and just causes your USB 3.0........
When running cudaminer, the entire screen freezes once it tries to initialize the card. The computer itself is still running but Xorg is done for: you cannot even switch to another console window and must reboot (even an mdm or Xorg restart does not help).
At first cudaminer will give you these errors:
...retry after 15 seconds
GPU #0: Geforce 210 with compute ca........
This was a surprising bug: I unplugged all the drives for an array, md127. At first it was just 1 drive and mdadm seemed to notice this. I then unplugged the second drive, taking the array offline, but mdadm did not realize it was offline and still showed a non-existent disk as being part of it. This created problems when trying to unmount it or even to stop the array, with mdadm freezing.
As for how to fix it I can only think of making sure you are not in a mounted path of........
I created a new partition table on a newly plugged in device and it caused fdisk to hang (even a force kill does not work). It may also be a bad drive or some other issue, because fdisk -l hangs after the first 2 HDDs (a total of 8 HDDs on this system):
[1232879.903596] INFO: task fdisk:27176 blocked for more than 120 seconds.
[1232879.903607] Tainted: P........
Use netstat with the -anpe option. The e option shows the inodes. I do not know if it will always work or if it was a fluke, but I was dealing with dozens of SSH sessions and needed to know which session was related to which forward (the PIDs of the SSH and SSHD did not match, etc.).
Notice the "59560675" and "59560762": those are almost identical. If you find two sets that are nearly identical except for the last 3 digits, they may match (in my ca........
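With dozens of sessions it is easier to let awk do the comparing. The sketch below groups established TCP lines whose inode numbers share everything but the last 3 digits. It assumes the usual netstat -anpe column order (inode in the eighth column), the function name group_by_inode_prefix is made up, and the pairing itself is only the heuristic described above, not a guaranteed match.

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -anpe` output on stdin and prints
# groups of ESTABLISHED TCP connections whose inode numbers are identical
# apart from the last 3 digits. Assumes the column order:
#   Proto Recv-Q Send-Q Local Foreign State User Inode PID/Program
group_by_inode_prefix() {
    awk '$1 ~ /^tcp/ && $6 == "ESTABLISHED" {
        inode = $8
        if (length(inode) <= 3) next
        # Drop the last 3 digits and use the rest as the grouping key.
        prefix = substr(inode, 1, length(inode) - 3)
        lines[prefix] = lines[prefix] "\n  " inode "  " $4 " -> " $5 "  " $9
        count[prefix]++
    }
    END {
        # Only report prefixes shared by more than one connection.
        for (p in count)
            if (count[p] > 1)
                print "inode prefix " p ":" lines[p]
    }'
}

# Example (as root, so the PID column is filled in):
#   netstat -anpe | group_by_inode_prefix
```

Two inodes that straddle a thousands boundary will land in different groups, so treat a missing pair as "not found by this shortcut" rather than "not related".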
If the below is happening on KVM (a very weird and scary looking error) it's probably because of Windows. This has happened to me countless times: the boot sector on Windows 7/2008 becomes corrupted easily (even by a crash or shutdown).
KVM: unknown exit, hardware reason 0x80000021
kvm_run returned -22
rax 0000000000000010 rbx 0000000000000080 rcx 0000000000000000 rdx 0000000000000080
rsi 000000000025db2a rdi 000000000007db2a rsp 0000000000000200 rbp........
I used the matching 8.3.13 utilities and it didn't work, but strangely the newer 8.3.16, which makes DRBD complain, works just fine.
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
0: cs:SyncSource ro:Secondary/Primary ds:UpToDate/Inconsistent A r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:5236960
Sep 26 16:56:21 box kernel: 00 00 00 00 00 00 00 00
Sep 26 16:56:21 box kernel: [37007.155690] d_alias libdl-2.12.so d_count=9 d_flags=8
Sep 26 16:56:21 box kernel: [37007.155697] 09 00 00 00 08 00 00 00 9f 05 9f 05 00 00 00 00 c0 71 1d 18 04 88 ff ff 00 00 00 00 00 00 00 00 a0 7e 48 00 00 c9 ff ff 78 a9 21 18 04 88 ff ff 3a 7b fa 4e 0d 00 00 00 98 5c 2d 18 04 88 ff ff 18 5c 2d 18 04 88 ff ff 18 5c 2d 18 04 88 ff ff 00 01 10 00 00 00 ad de 00 02 20 00 00 00 ad de f8........
I've got one of these at home for testing projects from work, and got more than I bargained for with the time I've spent on it due to the storage handling/Perc 6/i cards.
My particular model came with the following:
2U Rack Mount Server with Rails
2 x Opteron 2373 EE (Quad Core, there is a 6-core version that can be found at times)
2 x 250GB Seagate SATA
2 x Dell Perc 6/i (horrible and a nightmare to work........
On occasion, and from a variety of networks and clients, Sent messages don't get saved.
I'm wondering if these log messages could be why:
May 3 14:16:39 mail.box postfix/smtpd: connect from 192.168.1.58
May 3 14:16:39 mail.box postfix/smtpd: SSL_accept error from 192.168.1.58: -1
May 3 14:16:39 mail.box postfix/smtpd: lost connection after CONNECT from 192.168.1.58
May 3 14:16:39 mail.box postfix/smtpd:........
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Tainted: G I------------- 2.6.32-358.el6.x86_64 #1
 ? panic+0xa0/0x16f
 ? do_exit+0x862/0x870
 ? fput+0x25/0x30
 ? do_group_exit+0x58/0xd0
 ? sys_exit_........
I like dd. Although it only reads the disk, a read test of the entire disk will usually uncover whether your hard drive is bad in some parts. This is a good thing to do at least once a month; a lot of the time bizarre program behavior, lagginess, and crashing/unmounting problems are due to a failing disk, and SMART won't know it or indicate a problem:
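A minimal sketch of such a read test follows. The wrapper name read_test and the 1M block size are my own choices for illustration, and /dev/sdX is a placeholder for your disk. It only reads and discards the data, so nothing on the disk is changed.

```shell
#!/bin/sh
# Hypothetical wrapper: read_test SOURCE
# Reads SOURCE from start to end and throws the data away.
# conv=noerror tells dd to keep going past unreadable blocks instead of
# stopping at the first one, so the whole disk gets exercised.
read_test() {
    src=$1
    dd if="$src" of=/dev/null bs=1M conv=noerror
}

# Example (as root, replace sdX with your disk):
#   read_test /dev/sdX
# Keep an eye on dmesg or the messages log for I/O errors while it runs.
```

dd prints its transfer totals to stderr when it finishes, and any read errors it hits show up there as well as in the kernel log.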
We must also remember there's never a guarantee, I've found that ever since we moved to larger and more platters per drive with 1TB drives........
This happened during a RAID array check:
SMART says both drives pass the test, but I'm doing a long test on them and hopefully this is not a hardware error.
Apr 3 04:22:01 remote kernel: md: syncing RAID array md2
Apr 3 04:22:01 remote kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Apr 3 04:22:01 remote kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Jan 5 12:45:05 testbox kernel: [653298.890004] BUG: soft lockup - CPU#0 stuck for 61s! [hal-acl-tool:4168]
Jan 5 12:45:05 testbox kernel: [653298.890005] Modules linked in: vmnet vmci vmmon binfmt_misc drbd video output input_polldev ocfs2_stackglue ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs k8temp hwmon_vid lp snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi........
This drive is clearly on the way out. The kernel knows it, but I'm surprised that SMART is not concerned. I didn't blame Seagate for their past issues until now. This hard drive has hardly been used and has not even been powered on for a year according to SMART.
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11
Centos 4.3 x64 & VMWare Server Beta
The correct version of one or more libraries needed to run VMware Server may be
missing. This is the output of ldd /usr/bin/vmware:
linux-gate.so.1 => (0xffffe000)
libm.so.6 => /lib/tls/libm.so.6 (0xf7fbd000)
libdl.so.2 => /lib/libdl.so.2 (0xf7fb9000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0xf7fa7000)
libX11.so.6 => not f........
When trying to even cd or ls the mounted OCFS2 partition it crashes. I think this is a combination of VMWare Server's problem and the way I mounted and symlinked to it.
More than anything this shows the problem and lack of foresight with VMWare, but also that OCFS2 is easily crashed if you do strange things.
Output of /var/log/messages for OCFS2
Apr 10 15:57:45 localhost kernel: [84331.691258] Modules linked in: vmnet vmci vmmon ocfs2_stac........