Why SMART is not smart at all and doesn't properly predict disk errors that cause a kernel panic or crash

Before getting into the output, here is my typical experience with SMART. Here is what I call a "bad disk": it has pending and uncorrectable sectors that cannot be reallocated.
It has repeatedly caused a kernel panic and system crash, as we can see from the logs.
Yet SMART says it has "PASSED" its self-assessment. SMART is still useful to me, but mostly for the Current_Pending_Sector attribute.
Any time I have seen anything other than 0 for that attribute, the disk has turned out to be bad and unusable (e.g. it causes kernel panics).
In this case even RAID (DRBD here) doesn't help when the bad disk taints the kernel.
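For context on "taints": the kernel keeps a bitmask of conditions that make its state less trustworthy (readable from /proc/sys/kernel/tainted), and each bit has a one-letter code. The log below ends with "Tainting kernel with flag 0x9". Here is a minimal Python sketch that turns a taint mask into those letters. The letter table follows the mainline kernel's assignments for bits 0 to 17 (vendor kernels may define extra bits), and reading the logged 0x9 as a bit number rather than a mask is my assumption, based on the stack trace just before it going through warn_slowpath_common. Under that assumption it is bit 9, "W", meaning a kernel WARNING fired.

```python
# One-letter taint codes indexed by bit number (bit 0 = "P").
# Mainline kernel assignments for bits 0-17; unknown bits are shown as "?".
TAINT_LETTERS = "PFSRMBUDAWCIOELKXT"


def decode_taint(mask: int) -> str:
    """Return the taint letters set in a /proc/sys/kernel/tainted style mask."""
    if mask < 0:
        raise ValueError("taint mask must be non-negative")
    letters = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            letters.append(TAINT_LETTERS[bit] if bit < len(TAINT_LETTERS) else "?")
        bit += 1
    return "".join(letters)


if __name__ == "__main__":
    # ASSUMPTION: the "0x9" in "Tainting kernel with flag 0x9" is a bit index.
    logged_flag = 0x9
    print(decode_taint(1 << logged_flag))  # W (kernel WARNING)
```

On a live box you can pass int(open("/proc/sys/kernel/tainted").read()) instead to see every taint the running kernel has picked up.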

First, let's check this disk and see what SMART thinks:

smartctl -a /dev/sda

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES
Device Model:     ST3750640NS
Serial Number:    ABCAEAAA
LU WWN Device Id: 5 000c50 0083422e5
Firmware Version: 3BKH
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Dec 13 12:43:37 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   093   086   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   090   060   030    Pre-fail  Always       -       951683243
  9 Power_On_Hours          0x0032   052   052   000    Old_age   Always       -       42128
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   054   045    Old_age   Always       -       34 (Min/Max 28/36)
194 Temperature_Celsius     0x0022   034   046   000    Old_age   Always       -       34 (0 17 0 0 0)
195 Hardware_ECC_Recovered  0x001a   081   055   000    Old_age   Always       -       220199
197 Current_Pending_Sector  0x0012   096   096   000    Old_age   Always       -       93
198 Offline_Uncorrectable   0x0010   096   096   000    Old_age   Offline      -       93
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       971
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0
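So why does a disk with 93 pending sectors pass? My understanding is that the overall verdict is tied to the normalized VALUE column, not RAW_VALUE: an attribute only counts as failing once VALUE drops to or below a non-zero THRESH (smartctl would then show FAILING_NOW under WHEN_FAILED, and here it shows "-" for every row). Current_Pending_Sector and Offline_Uncorrectable have THRESH 000, so they can never fail that way, however high the raw count climbs. The Python sketch below parses rows in the layout shown above and contrasts that threshold check with my own rule of flagging any non-zero raw Current_Pending_Sector. The function names are made up for illustration, and the sample rows are copied from this disk's output.

```python
import re
from typing import Dict, List, NamedTuple


class Attr(NamedTuple):
    attr_id: int
    name: str
    value: int      # normalized current value
    worst: int      # worst normalized value seen
    thresh: int     # failure threshold for the normalized value
    attr_type: str  # "Pre-fail" or "Old_age"
    raw: int        # leading integer of RAW_VALUE


def parse_attributes(text: str) -> Dict[int, Attr]:
    """Parse attribute rows in the layout printed by smartctl -A / -a."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # RAW_VALUE may carry suffixes such as "42128h+05m"; keep the number.
        raw_match = re.match(r"\d+", fields[9])
        attrs[int(fields[0])] = Attr(
            attr_id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            attr_type=fields[6],
            raw=int(raw_match.group()) if raw_match else 0,
        )
    return attrs


def threshold_failures(attrs: Dict[int, Attr]) -> List[str]:
    """Attributes whose normalized VALUE is at or below a non-zero THRESH."""
    return [a.name for a in attrs.values() if a.thresh > 0 and a.value <= a.thresh]


def pending_sector_rule(attrs: Dict[int, Attr]) -> bool:
    """This post's rule of thumb: non-zero raw Current_Pending_Sector (ID 197)."""
    pending = attrs.get(197)
    return pending is not None and pending.raw > 0


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   090   060   030    Pre-fail  Always       -       951683243
197 Current_Pending_Sector  0x0012   096   096   000    Old_age   Always       -       93
198 Offline_Uncorrectable   0x0010   096   096   000    Old_age   Offline      -       93
"""

if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    print(threshold_failures(attrs))   # [] : nothing trips a threshold, hence "PASSED"
    print(pending_sector_rule(attrs))  # True : 93 pending sectors, a bad disk by my rule
```

In practice you would feed it the text of smartctl -A /dev/sda rather than the hard-coded sample.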

Now let's look at /var/log/messages:

Dec 12 05:29:46 somepoorbox kernel: [30883839.026190] sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 12 05:29:46 somepoorbox kernel: [30883839.026196] sd 0:0:0:0: [sda]  Sense Key : Medium Error [current] [descriptor]
Dec 12 05:29:46 somepoorbox kernel: [30883839.026203] Descriptor sense data with sense descriptors (in hex):
Dec 12 05:29:46 somepoorbox kernel: [30883839.026206]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 12 05:29:46 somepoorbox kernel: [30883839.026215]         57 4f 86 7b
Dec 12 05:29:46 somepoorbox kernel: [30883839.026219] sd 0:0:0:0: [sda]  Add. Sense: Unrecovered read error - auto reallocate failed
Dec 12 05:29:46 somepoorbox kernel: [30883839.026225] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 57 4f 8a 43 00 03 38 00
Dec 12 05:29:46 somepoorbox kernel: [30883839.026236] end_request: I/O error, dev sda, sector 1464830531
Dec 12 05:29:46 somepoorbox kernel: [30883839.026331] block drbd0: disk( UpToDate -> Failed )
Dec 12 05:29:46 somepoorbox kernel: [30883839.026345] block drbd0: Local IO failed in __req_mod. Detaching...
Dec 12 05:29:46 somepoorbox kernel: [30883839.026365] block drbd0: helper command: /sbin/drbdadm pri-on-incon-degr minor-0
Dec 12 05:29:46 somepoorbox kernel: [30883839.026476] sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 12 05:29:46 somepoorbox kernel: [30883839.026480] sd 0:0:0:0: [sda]  Sense Key : Medium Error [current] [descriptor]
Dec 12 05:29:46 somepoorbox kernel: [30883839.026485] Descriptor sense data with sense descriptors (in hex):
Dec 12 05:29:46 somepoorbox kernel: [30883839.026488]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 12 05:29:46 somepoorbox kernel: [30883839.026497]         57 4f 86 7b
Dec 12 05:29:46 somepoorbox kernel: [30883839.026501] sd 0:0:0:0: [sda]  Add. Sense: Unrecovered read error - auto reallocate failed
Dec 12 05:29:46 somepoorbox kernel: [30883839.026506] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 57 4f 86 7b 00 03 c8 00
Dec 12 05:29:46 somepoorbox kernel: [30883839.026514] end_request: I/O error, dev sda, sector 1464829563
Dec 12 05:29:46 somepoorbox kernel: [30883839.026632] block drbd0: IO ERROR: neither local nor remote disk
Dec 12 05:29:46 somepoorbox kernel: [30883839.026636] ata1: EH complete
Dec 12 05:29:46 somepoorbox kernel: [30883839.026728] block drbd0: IO ERROR: neither local nor remote disk
Dec 12 05:29:46 somepoorbox kernel: [30883839.026811] block drbd0: IO ERROR: neither local nor remote disk
Dec 12 05:29:46 somepoorbox kernel: [30883839.162977] Buffer I/O error on device drbd0, logical block 53203520
Dec 12 05:29:46 somepoorbox kernel: [30883839.163110] lost page write due to I/O error on drbd0
Dec 12 05:29:46 somepoorbox kernel: [30883839.163117] Buffer I/O error on device drbd0, logical block 59744311
Dec 12 05:29:46 somepoorbox kernel: [30883839.163200] lost page write due to I/O error on drbd0
Dec 12 05:29:46 somepoorbox kernel: [30883839.163208] Buffer I/O error on device drbd0, logical block 59744312
Dec 12 05:29:46 somepoorbox kernel: [30883839.163289] lost page write due to I/O error on drbd0
Dec 12 05:29:46 somepoorbox kernel: [30883839.163299] Buffer I/O error on device drbd0, logical block 59746338
Dec 12 05:29:46 somepoorbox kernel: [30883839.163316] Buffer I/O error on device drbd0, logical block 59744312
Dec 12 05:29:46 somepoorbox kernel: [30883839.163320] lost page write due to I/O error on drbd0
Dec 12 05:29:46 somepoorbox kernel: [30883839.163328] EXT3-fs: ext3_journal_dirty_data: aborting transaction: IO failure in ext3_journal_dirty_data
Dec 12 05:29:46 somepoorbox kernel: [30883839.163336] EXT3-fs (drbd0): error in ext3_orphan_add: Readonly filesystem
Dec 12 05:29:46 somepoorbox kernel: [30883839.165257]  [] ? warn_slowpath_common+0x91/0xe0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165260] EXT3-fs (drbd0): I/O error while writing superblock
Dec 12 05:29:46 somepoorbox kernel: [30883839.165280]  [] ? ext3_get_group_desc+0x51/0xa0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165285] JBD: Spotted dirty metadata buffer (dev = drbd0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Dec 12 05:29:46 somepoorbox kernel: [30883839.165292]  [] ? warn_slowpath_null+0x1a/0x20
Dec 12 05:29:46 somepoorbox kernel: [30883839.165297]  [] ? mark_buffer_dirty+0x82/0xa0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165316]  [] ? ext3_commit_super.clone.0+0x69/0x100 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165329]  [] ? ext3_handle_error+0x7f/0xe0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165343]  [] ? __ext3_std_error+0x5e/0xb0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165356]  [] ? ext3_orphan_add+0xbf/0x1a0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165360] EXT3-fs: ext3_journal_dirty_data: aborting transaction: IO failure in ext3_journal_dirty_data
Dec 12 05:29:46 somepoorbox kernel: [30883839.165374]  [] ? journal_dirty_data_fn+0x0/0x30 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165378] EXT3-fs (drbd0): error in ext3_orphan_add: Readonly filesystem [] ? ext3_ordered_write_end+0x158/0x1c0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165395]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165400]  [] ? generic_file_buffered_write_iter+0x184/0x2b0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165407]  [] ? __generic_file_write_iter+0x225/0x420
Dec 12 05:29:46 somepoorbox kernel: [30883839.165412]  [] ? __generic_file_aio_write+0x85/0xa0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165417]  [] ? generic_file_aio_write+0x88/0x100
Dec 12 05:29:46 somepoorbox kernel: [30883839.165423]  [] ? do_sync_write+0xf2/0x140
Dec 12 05:29:46 somepoorbox kernel: [30883839.165432]  [] ? sys_getpeername+0xd4/0xf0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165436]  [] ? vfs_write+0xb8/0x1a0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165441]  [] ? fget_light_pos+0x16/0x50
Dec 12 05:29:46 somepoorbox kernel: [30883839.165445]  [] ? sys_write+0x51/0xb0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165450]  [] ? __audit_syscall_exit+0x25e/0x290
Dec 12 05:29:46 somepoorbox kernel: [30883839.165455]  [] ? system_call_fastpath+0x16/0x1b
Dec 12 05:29:46 somepoorbox kernel: [30883839.165459] ---[ end trace 32aa3e2dc89d4c30 ]---
Dec 12 05:29:46 somepoorbox kernel: [30883839.165462] Tainting kernel with flag 0x9
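The hex in these messages is worth decoding, because it pins the failure to exact sectors. In a SCSI READ(10) CDB, bytes 2 to 5 are the starting LBA and bytes 7 to 8 the transfer length in blocks, both big-endian. In descriptor-format sense data (response code 0x72), byte 1 holds the sense key (3 = Medium Error), bytes 2 and 3 the ASC/ASCQ (11h/04h = "Unrecovered read error - auto reallocate failed"), and an Information descriptor (type 00h) carries, for a read error, the failing LBA in its last eight bytes. A minimal Python sketch, assuming the hex is copied exactly as logged with the two sense-data lines joined together (the function names are mine, not from any tool):

```python
from typing import Tuple


def decode_read10(cdb_hex: str) -> Tuple[int, int]:
    """Return (starting LBA, block count) from a READ(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


def decode_descriptor_sense(sense_hex: str) -> dict:
    """Decode sense key, ASC/ASCQ and the Information descriptor (if any)
    from descriptor-format sense data (response code 0x72 or 0x73)."""
    s = bytes.fromhex(sense_hex)
    if len(s) < 8 or (s[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    result = {"sense_key": s[1] & 0x0F, "asc": s[2], "ascq": s[3], "info": None}
    end = min(8 + s[7], len(s))  # byte 7 = additional sense length
    pos = 8
    while pos + 1 < end:
        dtype, dlen = s[pos], s[pos + 1]
        if dtype == 0x00 and dlen == 0x0A:  # Information descriptor
            result["info"] = int.from_bytes(s[pos + 4:pos + 12], "big")
        pos += 2 + dlen
    return result


if __name__ == "__main__":
    for cdb in ("28 00 57 4f 8a 43 00 03 38 00", "28 00 57 4f 86 7b 00 03 c8 00"):
        print(decode_read10(cdb))  # (1464830531, 824) then (1464829563, 968)
    sense = "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 57 4f 86 7b"
    print(decode_descriptor_sense(sense))
    # {'sense_key': 3, 'asc': 17, 'ascq': 4, 'info': 1464829563}
```

The decoded start LBAs match the sectors in the end_request lines, the two reads are back to back (1464829563 + 968 = 1464830531), and both sense blocks point at LBA 1464829563.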
