Say you have 2 NICs with the exact same chipset: they will generally show up under the same name in lspci, possibly with a different revision. Normally this is not an issue on a server with 4 NICs, because eth0 to eth3 are generally ordered left to right (or right to left on some vendors), so it doesn't take much figuring out.
If your NICs use different chipsets, it should generally be easy to tell which one is eth0, the first NIC in the OS.........
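If you want to pin down which identical NIC is which without relying on port order, one option is sysfs: on Linux, /sys/class/net/<iface>/device is a symlink to the underlying device, so the last component of its target is the PCI address. The helper below is a hypothetical sketch of that idea (not from the original setup); it assumes a Linux sysfs layout and skips virtual interfaces such as lo, which have no device link.

```python
import os


def nic_pci_addresses(sysfs_root="/sys"):
    """Map each network interface to the bus address of its device.

    /sys/class/net/<iface>/device is a symlink into the device tree; for a
    PCI NIC its resolved basename is the PCI address, e.g. "0000:00:08.0".
    Interfaces without a 'device' link (lo, bridges, bonds) are skipped.
    """
    net_dir = os.path.join(sysfs_root, "class", "net")
    mapping = {}
    for iface in sorted(os.listdir(net_dir)):
        dev_link = os.path.join(net_dir, iface, "device")
        if os.path.exists(dev_link):
            mapping[iface] = os.path.basename(os.path.realpath(dev_link))
    return mapping
```

On a live box, print(nic_pci_addresses()) gives interface-to-address pairs you can match against lspci output (lspci leaves off the leading 0000: domain unless you pass -D).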
You might assume you have a bad drive, a bad SATA interface or cable, or a bad or weak power supply feeding the drive. These are all possible, but definitely check your SATA cable for "twisting". It is a big issue because until the error stops or times out, your system will not boot (in my case this happened even though the affected drive was not part of the OS or boot process at all).
If you run an open rig that you move around often that ha........
Occasionally my whole screen locks up, I cannot even switch to the console, and I find this in my syslog:
*-display
description: VGA compatible controller
product: Mullins [Radeon R3 Graphics]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
 ........
sudo apt-get install hwloc-nox
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
hwloc-nox
0 upgraded, 1 newly installed, 0 to remove and 530 not upgraded.
Need to get 151 kB of archives.
After this operation, 453 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubunt........
[root@localhost:~]
BootModuleConfig.sh echo host-ind nfcd........
It looks like this has something to do with APIC, but I am not sure. I have similar CPUs with a different MB and BIOS that work fine on the same type of kernel. A lot of the time the issue is caused by the C-step setting in the BIOS.
The same thing happened with the 2.6 kernel on CentOS 6, but this is a homebrew 4.4 kernel, so I am not sure why it is happening when even the CentOS 7 (3.10) kernel works OK.
Solution - It comes down to the BIOS set........
Normally lspci will show something like this, which suggests they are exactly the same card:
1a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/580] (rev e7)
1c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/580] (rev e7)
lspci -vnn is the answer:
As we can see, one is a Gigabyte card and the other is an MSI card. Wha........
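Since the board maker shows up on the indented "Subsystem:" line that lspci -vnn prints under each device, it is easy to pull out per slot. The parser below is a sketch that assumes that layout (header lines start in column 0 with the slot address, detail lines are indented); subsystems_by_slot is a hypothetical name.

```python
import re


def subsystems_by_slot(lspci_output):
    """Map each PCI slot (e.g. "1a:00.0") to the text of its "Subsystem:" line.

    Two cards with an identical chipset line usually differ here, because the
    subsystem vendor/device pair identifies the board maker.
    """
    result = {}
    slot = None
    for line in lspci_output.splitlines():
        if line and not line[0].isspace():
            # Device header line: the first token is the slot address.
            slot = line.split()[0]
            continue
        match = re.match(r"\s+Subsystem:\s*(.+)", line)
        if match and slot is not None:
            result[slot] = match.group(1).strip()
    return result
```

Feed it the text from subprocess.run(["lspci", "-vnn"], capture_output=True, text=True).stdout. With -nn, the bracketed [vvvv:dddd] at the end of each subsystem string is the subsystem vendor and device ID, which is what tells two otherwise identical cards apart.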
forcedeth 0000:00:08.0: irq 25 for MSI/MSI-X
forcedeth 0000:00:08.0: eth0: MSI enabled
forcedeth 0000:00:08.0: eth0: no link during initialization
ADDRCONF(NETDEV_UP): eth0: link is not ready
forcedeth 0000:00:08.0: eth0: link up
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Dec 1 18:21:32 box15 kernel: forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
Dec 1 18:21:32 box15 kernel........
At first my BIOS warned that the card might not work properly because there was no option ROM space left.
I disabled the option ROMs for both the LSI 1068 and 2008 chipsets, the network boot ROM, most other PCI slots, the serial port, etc. The message went away, but the card still does not work properly.
The NVIDIA driver still cannot initialize it:
[ 33.943272] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:........
I used the slightly older "304.117" version and it worked.
With the newest version I couldn't get X to start and kept getting these errors in messages/dmesg.
[ 2346.083660] nvidia 0000:01:00.0: irq 44 for MSI/MSI-X
[ 2350.608342] NVRM: RmInitAdapter failed! (0x12:0x2b:1831)
[ 2350.608354] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 2350.608369] NVRM: nvidia_frontend_open: minor 0, module->open() failed, err........
[3805108.257042] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[3805108.257052] sd 0:0:0:0: [sda] Write Protect is off
[3805108.257054] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[3805108.257066] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[3805108.257083] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[3805108.257090] sd 0:0:0:0: [sda] Write Protect is off........
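The size in those sd lines is plain arithmetic: sector count times 512 bytes, shown in decimal TB next to binary GiB. For the drive above, 1953525168 × 512 = 1,000,204,886,016 bytes, which is about 1.00 TB but only about 931.5 GiB, so a "1TB" drive showing 931 GiB is normal and not lost space. A small sketch of the conversion (sector_capacity is a hypothetical helper):

```python
def sector_capacity(sectors, sector_size=512):
    """Convert a sector count from the sd driver into common units.

    Returns (bytes, decimal TB, binary GiB, decimal MB). Drive makers use
    decimal units (1 TB = 10**12 bytes); GiB is binary (2**30 bytes).
    """
    total = sectors * sector_size
    return total, total / 10**12, total / 2**30, total / 10**6


size_bytes, tb, gib, mb = sector_capacity(1953525168)
print(f"{size_bytes} bytes = {tb:.2f} TB = {gib:.1f} GiB")
```

The same formula explains older-style lines such as "(2000399 MB)" for a 3907029168-sector drive: 3907029168 × 512 bytes is about 2,000,399 decimal MB.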
PXE-E32: TFTP open timeout
The solution was to enable tftp in xinetd with "chkconfig tftp on".
See the troubleshooting below:
chkconfig --list
NetworkManager 0:off 1:off 2:off 3:off 4:off 5:off 6:off
acpid 0:off........
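To confirm the change took, look for tftp in the "xinetd based services:" section that chkconfig --list prints after the runlevel table. This parser is a sketch that assumes that RHEL/CentOS-style output; xinetd_service_state is a hypothetical name.

```python
def xinetd_service_state(chkconfig_output, service):
    """Return "on", "off", or None for an xinetd-based service.

    `chkconfig --list` prints SysV services with runlevel columns first, then
    a "xinetd based services:" section of indented "name: on/off" lines.
    """
    in_xinetd = False
    for line in chkconfig_output.splitlines():
        if line.startswith("xinetd based services"):
            in_xinetd = True
            continue
        if in_xinetd:
            name, sep, state = line.strip().partition(":")
            if sep and name == service:
                return state.strip()
    return None
```

It returns None when the service is not listed at all, which is worth distinguishing from a plain "off".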
Another new drive bad from the start:
Jun 2 15:14:18 one-desktop kernel: [15895.386779] ata2.00: exception Emask 0x50 SAct 0x1 SErr 0x280900 action 0x6 frozen
Jun 2 15:14:18 one-desktop kernel: [15895.386782] ata2.00: irq_stat 0x08000000, interface fatal error
Jun 2 15:14:18 one-desktop kernel: [15895.386784] ata2: SError: { UnrecovData HostInt 10B8B BadCRC }
Jun 2 15:14:18 one-desktop kernel: [15895.386788] ata2.00: cmd 60/0........
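The SErr value in these ata lines is a bitmask, and the names inside "SError: { ... }" are just the bits that are set. The bit table below is my transcription of the libata flag names, so treat it as an assumption to double-check against your kernel source; it does reproduce both SError lines on this page (0x280900 and 0x90a02). decode_serror and serror_from_line are hypothetical helpers.

```python
import re

# SATA SError bit positions and the short names libata prints for them.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the flag names set in an SError value (int or hex string)."""
    if isinstance(value, str):
        value = int(value, 16)  # accepts "0x280900" as well as "280900"
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


def serror_from_line(line):
    """Decode the "SErr 0x..." field of a kernel ata exception line, if present."""
    match = re.search(r"SErr (0x[0-9a-fA-F]+)", line)
    return decode_serror(match.group(1)) if match else None
```

For the first line of this log, serror_from_line gives ['UnrecovData', 'HostInt', '10B8B', 'BadCRC']. BadCRC and 10B8B are errors on the SATA link itself, which fits checking the cable and connector before blaming the platters.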
This is just from trying to read 5GB off the drive with dd. The drive initially tested OK, but shortly afterwards I wondered why I was only seeing 2MB/s read speeds. Notice the "current_pending_sector" attribute: any time I've seen it above 0, even with no other bad attributes, it has meant the drive is bad.
ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/00:00:........
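The rule of thumb above (a pending sector count above 0 means a bad drive) is easy to script if the SMART data comes from smartctl -A in smartmontools, which is an assumption on my part. A minimal sketch; smart_raw_value and looks_bad are hypothetical names, and the parser assumes the usual ten columns ending in RAW_VALUE.

```python
import re


def smart_raw_value(smartctl_output, attribute="Current_Pending_Sector"):
    """Return the leading integer of an attribute's RAW_VALUE, or None.

    Attribute rows in `smartctl -A` have ten whitespace-separated columns
    (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED,
    WHEN_FAILED, RAW_VALUE); some drives append text after the raw number,
    so only its leading digits are parsed.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == attribute:
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def looks_bad(smartctl_output):
    """Apply the rule of thumb: any pending sector means trouble."""
    pending = smart_raw_value(smartctl_output)
    return pending is not None and pending > 0
```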
This is all I get when hot-plugging a hard drive, and only on some power connectors.
[71656.314271] ata5: exception Emask 0x50 SAct 0x0 SErr 0x90a02 action 0xe frozen
[71656.314277] ata5: irq_stat 0x00400000, PHY RDY changed
[71656.314285] ata5: SError: { RecovComm Persist HostInt PHYRdyChg 10B8B }
[71656.314294] ata5: hard resetting link
[71660.360686] ata5: softreset failed (device not ready)
[71660.360694] ata5: applying........
I like dd: although it only reads the disk, a read test of the entire disk will usually uncover whether parts of your hard drive are bad. This is a good thing to do at least once a month. A lot of the time, bizarre program behavior, lagginess, and crashing/unmounting problems are due to a failing disk, and SMART won't know it or indicate a problem:
We must also remember there's never a guarantee. I've found that ever since we moved to larger and more platters per drive with 1TB drives........
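As a sketch of the same full-disk read test in Python, the function below reads a file or block device (for example /dev/sdX, a placeholder, which needs root) front to back, reports throughput, and records where reads fail instead of stopping at the first error. read_test is a hypothetical name; it assumes Linux, and because reads go through the page cache, recently read data can inflate the speed.

```python
import os
import time


def read_test(path, chunk_size=1 << 20):
    """Read `path` end to end; return (bytes_read, MB_per_s, bad_offsets).

    On a read error the chunk's offset is recorded and the test skips ahead,
    so one bad area does not hide problems further along the disk.
    """
    bad_offsets = []
    total = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # works for files and block devices
        start = time.monotonic()
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break
            total += len(data)
            offset += len(data)
        elapsed = max(time.monotonic() - start, 1e-9)
    finally:
        os.close(fd)
    return total, total / elapsed / 10**6, bad_offsets
```

A surprisingly low overall MB/s or any entries in bad_offsets are a reason to look at the drive's SMART data and the kernel log.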
This happened during a RAID array check:
SMART says both drives pass, but I'm running a long test on them; hopefully this is not a hardware error.
Apr 3 04:22:01 remote kernel: md: syncing RAID array md2
Apr 3 04:22:01 remote kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Apr 3 04:22:01 remote kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Apr........
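While a check like this runs, md exposes its state in sysfs: /sys/block/<array>/md/sync_action shows what it is doing (idle, check, resync, ...) and mismatch_cnt holds the number of sectors the last check found inconsistent. A small sketch for reading both; md_check_status is a hypothetical name, and the sysfs_root parameter only exists so it can be pointed at a test directory.

```python
import os


def md_check_status(md_name, sysfs_root="/sys"):
    """Return (sync_action, mismatch_cnt) for an md array such as "md2"."""
    md_dir = os.path.join(sysfs_root, "block", md_name, "md")
    with open(os.path.join(md_dir, "sync_action")) as f:
        action = f.read().strip()
    with open(os.path.join(md_dir, "mismatch_cnt")) as f:
        mismatches = int(f.read().strip())
    return action, mismatches
```

Writing check to the same file (echo check > /sys/block/md2/md/sync_action, as root) starts a scrub by hand, which lets you reproduce this kind of error on your own schedule instead of waiting for the scheduled one.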
kernel 2.6.27.54
Fusion MPT base driver 3.04.07
Copyright (c) 1999-2008 LSI Corporation
Fusion MPT SPI Host driver 3.04.07
mptbase: ioc0: Initiating bringup
mptbase: ioc0: WARNING - Unexpected doorbell active!
mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000001!
mptbase: ioc0: ERROR - Diagnostic reset FAILED! (102h)
mptbase: ioc0: WARNING - NOT READY!
mptbase: ioc0: ERROR - didn't initialize proper........
I think this will be useful to others because I have a server that kept crashing mysteriously during intense disk usage/RAID checks. It would only crash during the weekly RAID integrity check.
Then I noticed during a reboot that not all CPUs were being brought up. This actually results in much higher temperatures: according to the output I got from sensors, just booting the system produced higher-than-normal temperatures.
You can imagine that a full blown RAID check........
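A quick way to notice CPUs that never came up, without reading the whole boot log, is to compare /sys/devices/system/cpu/present with /sys/devices/system/cpu/online; both hold range lists like 0-3,6. The sketch below (hypothetical helper names) only catches CPUs the kernel knows are present but did not bring online; cores the firmware never reported won't appear in either file.

```python
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,6" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def offline_cpus(sysfs_root="/sys"):
    """Return CPUs listed as present but not online, in ascending order."""
    base = os.path.join(sysfs_root, "devices", "system", "cpu")
    with open(os.path.join(base, "present")) as f:
        present = parse_cpu_list(f.read())
    with open(os.path.join(base, "online")) as f:
        online = parse_cpu_list(f.read())
    return sorted(present - online)
```

An empty list from offline_cpus() is the normal case; anything else matches the symptom described here.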
CPU/Kernel/MB/RAID problem?
Jan 5 12:45:05 testbox kernel: [653298.890004] BUG: soft lockup - CPU#0 stuck for 61s! [hal-acl-tool:4168]
Jan 5 12:45:05 testbox kernel: [653298.890005] Modules linked in: vmnet vmci vmmon binfmt_misc drbd video output input_polldev ocfs2_stackglue ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs k8temp hwmon_vid lp snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi........
I separated the 2 drives in the RAID 1 array.
One of them, /dev/sda, is the old one and is out of date, while the other, /dev/sdc, was used on its own (mounted and written to), so it has newer data.
I wonder how mdadm will handle this:
usb-storage: device scan complete
md: md127 stopped.
md: bind
md: md127: raid array is not clean -- starting background reconstruction
raid1: raid set md127 active with 1 out of 2 m........
You'll see the following and the boot process will freeze:
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
I have struggled with this issue on vari........
NET: Registered protocol family 2
The above is the last thing I ever saw. I tried pci=routeirq etc., and it didn't help.
The solution is to enable IOAPIC in the VirtualBox settings.
Just enable "IOAPIC" in the settings for your CentOS guest and you'll find the kernel boots just fine. I wonder if a physical system might stall in the same way if the BIOS has IOAPIC disabled, which many people do as a troubleshooting step.
........
[27969.398749] sd 5:0:0:0: [sdb] 3907029168 512-byte hardware sectors (2000399 MB)
[27969.398749] sd 5:0:0:0: [sdb] Write Protect is off
[27969.398749] sd 5:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[27969.398749] sd 5:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[27972.117543] ata6.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[27972.117543] ata6.00: irq_stat 0x48000000
[27972.117543] ata6.00: cmd 60/08:00:ff:7........
This is obviously a bug in the r8169 kernel module, and it seems to affect a lot of people. I upgraded to the latest kernel and hope this won't happen anymore, as it is a very serious error, especially for anyone running servers with this chipset: who can afford to have the NIC randomly go offline for no apparent reason?
[655548.189113] type=1505 audit(1277067560.902:5): operation="profile_load" name="/usr/bin/freshclam"........
From the "parted" package, you can use the "partprobe" command to re-read the partition table. I really hate rebooting, and that's what I loved hearing about AHCI motherboards: they allow hot-swapping, so you don't have to reboot. But that's only as good as the OS. If the OS does not reload the partition table, you won't be able to do anything with the new drive you attached without rebooting. Yes, even without re-reading the partition table Linux will........
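After running partprobe (or before deciding you need to reboot), you can check which partitions the kernel actually knows about in /proc/partitions. A minimal sketch; kernel_partitions is a hypothetical name, and the name match assumes the usual sdb1 or nvme0n1p1 style.

```python
import re


def kernel_partitions(disk, proc_partitions="/proc/partitions"):
    """Return partition names of `disk` (e.g. "sdb") listed in /proc/partitions.

    Rows are "major minor #blocks name" after a header and a blank line. A
    partition you just created that is missing here has not been picked up
    by the kernel yet.
    """
    # "sdb" matches sdb1, sdb2, ...; "nvme0n1" matches nvme0n1p1, ...
    pattern = re.compile(re.escape(disk) + r"p?\d+")
    found = []
    with open(proc_partitions) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 4 and fields[0].isdigit() and pattern.fullmatch(fields[3]):
                found.append(fields[3])
    return found
```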
When I even try to cd into or ls the mounted OCFS2 partition, it crashes. I think this is a combination of a VMware Server problem and the way I mounted and symlinked to it.
More than anything this shows the problem and lack of foresight in VMware, but also that OCFS2 is easily crashed if you do strange things.
Output of /var/log/messages for OCFS2
Apr 10 15:57:45 localhost kernel: [84331.691258] Modules linked in: vmnet vmci vmmon ocfs2_stac........