I'm posting this because it may be useful to others. I have a server that kept crashing mysteriously under intense disk usage; it would only crash during the weekly RAID integrity check.
Then I noticed during a reboot that not all CPUs were being brought up. Judging by the output I got from sensors, this results in much higher temperatures: just booting the system produced higher than normal readings.
You can imagine that a full-blown RAID check creates a lot more strain, which probably caused the crashes I had been seeing. I have since upgraded the kernel, rebooted, and run 3 manual RAID checks, and the system has not crashed.
The real question is this: was this a hardware issue or a kernel issue? I guess time will tell. The kernel log output follows.
CPU0: AMD Athlon(tm) II X4 620 Processor stepping 02
Booting processor 1/1 eip 10000
spurious 8259A interrupt: IRQ7.
Not responding.
Inquiring remote APIC #1...
... APIC #1 ID: 01000000
... APIC #1 VERSION: 80050010
... APIC #1 SPIV: 000000ff
CPU #1 not responding - cannot use it.
Booting processor 1/2 eip 10000
Not responding.
Inquiring remote APIC #2...
... APIC #2 ID: 02000000
... APIC #2 VERSION: 80050010
... APIC #2 SPIV: 000000ff
CPU #2 not responding - cannot use it.
Booting processor 1/3 eip 10000
Not responding.
Inquiring remote APIC #3...
... APIC #3 ID: 03000000
... APIC #3 VERSION: 80050010
... APIC #3 SPIV: 000000ff
CPU #3 not responding - cannot use it.
Total of 1 processors activated (5223.99 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
Using local APIC timer interrupts.
Brought up 1 CPUs
zapping low mappings.
sizeof(vma)=88 bytes
sizeof(page)=36 bytes
sizeof(inode)=364 bytes
sizeof(dentry)=148 bytes
sizeof(ext3inode)=516 bytes
sizeof(buffer_head)=52 bytes
sizeof(skbuff)=192 bytes
checking if image is initramfs... it is
Freeing initrd memory: (37d6c000-37fef7fd) 2573k freed
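The boot messages above can be checked mechanically instead of by eye. Here is a minimal sketch in Python that pulls out which CPUs failed to start and how many were brought up. The function name `summarize_cpu_bringup` and the regular expressions are assumptions for illustration, matched only against the message wording in this log; other kernel versions phrase these lines differently.

```python
import re

# Lines copied from the boot log above, used as sample input.
SAMPLE = """\
CPU #1 not responding - cannot use it.
CPU #2 not responding - cannot use it.
CPU #3 not responding - cannot use it.
Total of 1 processors activated (5223.99 BogoMIPS).
Brought up 1 CPUs
"""


def summarize_cpu_bringup(log_text):
    """Return (failed_cpu_ids, brought_up_count) from kernel boot messages.

    brought_up_count is None if no "Brought up N CPUs" line is present.
    """
    # Hypothetical patterns: they mirror the wording seen in this log only.
    failed = [
        int(cpu_id)
        for cpu_id in re.findall(
            r"^CPU #(\d+) not responding", log_text, flags=re.MULTILINE
        )
    ]
    match = re.search(r"^Brought up (\d+) CPUs", log_text, flags=re.MULTILINE)
    brought_up = int(match.group(1)) if match else None
    return failed, brought_up


failed, brought_up = summarize_cpu_bringup(SAMPLE)
print(failed, brought_up)  # prints: [1, 2, 3] 1
```

On a live system the same function could be fed the saved output of `dmesg`, so a quad-core machine that reports only one CPU brought up is noticed right after boot rather than during the next RAID check.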
md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
INFO: task md1_resync:9004 blocked for more than 300 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
md1_resync D F561F910 7396 9004 6 6410 (L-TLB)
f236dec0 00000046 6eaa8e40 000003e7 00000000 00000005 f561f910 6eaa8e40
000003e7 00000000 978f1a6c 00000a6d f561fa34 c07ba980 978f2241 00000a6d
00000000 f7040d80 f7d30e00 c0423845 c0671719 c07ba980 f783958c c04238f5
Call Trace:
[<c0423845>] vprintk+0x26/0x3a
[<c04238f5>] printk+0x18/0x8e
[<c05ab14e>] md_do_sync+0x1fe/0x966
[<c041923e>] enqueue_task+0x2f/0x3f
[<c04195cf>] __activate_task+0x83/0x147
[<c04191d5>] dequeue_task+0x13/0x26
[<c0623776>] schedule+0xd1a/0xe03
[<c0435f23>] autoremove_wake_function+0x0/0x2d
[<c05abba7>] md_thread+0xe6/0xfc
[<c0418fd1>] complete+0x2b/0x3d
[<c05abac1>] md_thread+0x0/0xfc
[<c0435e61>] kthread+0xc0/0xeb
[<c0435da1>] kthread+0x0/0xeb
[<c0627f2f>] kernel_thread_helper+0x7/0x10
=======================
[The identical md1_resync hung-task report repeats 9 more times.]
md: md0: sync done.
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 30716160 blocks.
RAID1 conf printout:
--- wd:2 rd:2
disk 0, wo:0, o:1, dev:sda1
disk 1, wo:0, o:1, dev:sdb1
md: md1: sync done.
RAID1 conf printout:
--- wd:2 rd:2
disk 0, wo:0, o:1, dev:sda2
disk 1, wo:0, o:1, dev:sdb2
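For anyone who wants to reproduce the manual RAID checks mentioned at the top, one way to start a check on a Linux md array is to write `check` to the array's `sync_action` file in sysfs. The sketch below does that from Python. The helper names and the `sysfs_root` parameter are assumptions for illustration (the parameter only exists so the code can be tried against a scratch directory), and this is not necessarily how the checks on this server were started. Writing to the real sysfs file requires root.

```python
from pathlib import Path


def sync_action_path(array="md0", sysfs_root="/sys/block"):
    """Build the path to an md array's sync_action control file."""
    return Path(sysfs_root) / array / "md" / "sync_action"


def start_md_check(array="md0", sysfs_root="/sys/block"):
    """Request a read-only consistency check on the given md array.

    Writing "check" asks the md driver to start a check pass;
    progress then shows up in /proc/mdstat and in the kernel log.
    """
    path = sync_action_path(array, sysfs_root)
    path.write_text("check\n")
    return path


def read_md_sync_action(array="md0", sysfs_root="/sys/block"):
    """Return the current action of the array, e.g. "idle" or "check"."""
    return sync_action_path(array, sysfs_root).read_text().strip()
```

With the arrays from the log above, `start_md_check("md0")` followed by `start_md_check("md1")` would queue both checks; because the two arrays share physical disks, md delays the second one until the first has finished, as the "delaying resync of md1" message shows.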