Why SMART is not smart at all and doesn't properly predict disk errors that cause a kernel panic or crash

Before getting into the output, here is my typical experience with SMART. This is what I call a "bad disk": it has pending and uncorrectable sectors that cannot be reallocated.
It has repeatedly caused a kernel panic and system crash, as we can see from the logs.
Yet SMART says it has "PASSED" its self-assessment. SMART is still useful to me, but mostly as a way to check Current_Pending_Sector.
Any time I have seen anything but 0 for that attribute, the disk has turned out to be bad and unusable (e.g. it will cause kernel panics).
In this case even RAID doesn't help once the bad disk taints the kernel.

First let's check this disk and see what SMART thinks:

smartctl -a /dev/sda

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES
Device Model:     ST3750640NS
Serial Number:    ABCAEAAA
LU WWN Device Id: 5 000c50 0083422e5
Firmware Version: 3BKH
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Dec 13 12:43:37 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   093   086   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   090   060   030    Pre-fail  Always       -       951683243
  9 Power_On_Hours          0x0032   052   052   000    Old_age   Always       -       42128
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   054   045    Old_age   Always       -       34 (Min/Max 28/36)
194 Temperature_Celsius     0x0022   034   046   000    Old_age   Always       -       34 (0 17 0 0 0)
195 Hardware_ECC_Recovered  0x001a   081   055   000    Old_age   Always       -       220199
197 Current_Pending_Sector  0x0012   096   096   000    Old_age   Always       -       93
198 Offline_Uncorrectable   0x0010   096   096   000    Old_age   Offline      -       93
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       971
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0
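
Since the overall "PASSED" verdict ignores the raw counts, the check I actually care about can be scripted. The following is only a minimal sketch in Python, not part of smartctl: the names `parse_raw_values`, `suspicious` and `WATCHED` are made up for illustration, and watching Offline_Uncorrectable alongside Current_Pending_Sector is my assumption based on the pending and uncorrectable sectors described above. It assumes the attribute table layout shown in the output above.

```python
import re

# Attributes treated as the real warning signs. Including
# Offline_Uncorrectable is an assumption; the article itself
# singles out Current_Pending_Sector.
WATCHED = ("Current_Pending_Sector", "Offline_Uncorrectable")

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, then RAW_VALUE (only its leading integer is captured).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def parse_raw_values(smartctl_output):
    """Map attribute name -> leading integer of its RAW_VALUE column."""
    raw = {}
    for line in smartctl_output.splitlines():
        match = ROW.match(line)
        if match:
            raw[match.group(2)] = int(match.group(3))
    return raw

def suspicious(raw):
    """Return the watched attributes whose raw value is not zero."""
    return {name: raw[name] for name in WATCHED if raw.get(name, 0) != 0}

# Three rows copied from the smartctl output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   096   096   000    Old_age   Always       -       93
198 Offline_Uncorrectable   0x0010   096   096   000    Old_age   Offline      -       93
"""

if __name__ == "__main__":
    print(suspicious(parse_raw_values(SAMPLE)))
```

On a live system the text would come from the output of smartctl -a /dev/sda instead of SAMPLE. Anything returned by `suspicious` is a reason to distrust the disk regardless of the "PASSED" line.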

Now let's see /var/log/messages:

Dec 12 05:29:46 somepoorbox kernel: [30883839.026190] sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 12 05:29:46 somepoorbox kernel: [30883839.026196] sd 0:0:0:0: [sda]  Sense Key : Medium Error [current] [descriptor]
Dec 12 05:29:46 somepoorbox kernel: [30883839.026203] Descriptor sense data with sense descriptors (in hex):
Dec 12 05:29:46 somepoorbox kernel: [30883839.026206]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 12 05:29:46 somepoorbox kernel: [30883839.026215]         57 4f 86 7b
Dec 12 05:29:46 somepoorbox kernel: [30883839.026219] sd 0:0:0:0: [sda]  Add. Sense: Unrecovered read error - auto reallocate failed
Dec 12 05:29:46 somepoorbox kernel: [30883839.026225] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 57 4f 8a 43 00 03 38 00
Dec 12 05:29:46 somepoorbox kernel: [30883839.026236] end_request: I/O error, dev sda, sector 1464830531
Dec 12 05:29:46 somepoorbox kernel: [30883839.026331] block drbd0: disk( UpToDate -> Failed )
Dec 12 05:29:46 somepoorbox kernel: [30883839.026345] block drbd0: Local IO failed in __req_mod. Detaching...
Dec 12 05:29:46 somepoorbox kernel: [30883839.026365] block drbd0: helper command: /sbin/drbdadm pri-on-incon-degr minor-0
Dec 12 05:29:46 somepoorbox kernel: [30883839.026476] sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 12 05:29:46 somepoorbox kernel: [30883839.026480] sd 0:0:0:0: [sda]  Sense Key : Medium Error [current] [descriptor]
Dec 12 05:29:46 somepoorbox kernel: [30883839.026485] Descriptor sense data with sense descriptors (in hex):
Dec 12 05:29:46 somepoorbox kernel: [30883839.026488]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 12 05:29:46 somepoorbox kernel: [30883839.026497]         57 4f 86 7b
Dec 12 05:29:46 somepoorbox kernel: [30883839.026501] sd 0:0:0:0: [sda]  Add. Sense: Unrecovered read error - auto reallocate failed
Dec 12 05:29:46 somepoorbox kernel: [30883839.026506] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 57 4f 86 7b 00 03 c8 00
Dec 12 05:29:46 somepoorbox kernel: [30883839.026514] end_request: I/O error, dev sda, sector 1464829563
Dec 12 05:29:46 somepoorbox kernel: [30883839.026632] block drbd0: IO ERROR: neither local nor remote disk
Dec 12 05:29:46 somepoorbox kernel: [30883839.026636] ata1: EH complete
Dec 12 05:29:46 somepoorbox kernel: [30883839.026728] block drbd0: IO ERROR: neither local nor remote disk
Dec 12 05:29:46 somepoorbox kernel: [30883839.026811] block drbd0: IO ERROR: neither local nor remote disk
Dec 12 05:29:46 somepoorbox kernel: [30883839.162977] Buffer I/O error on device drbd0, logical block 53203520
Dec 12 05:29:46 somepoorbox kernel: [30883839.163110] lost page write due to I/O error on drbd0
Dec 12 05:29:46 somepoorbox kernel: [30883839.163117] Buffer I/O error on device drbd0, logical block 59744311
Dec 12 05:29:46 somepoorbox kernel: [30883839.163200] lost page write due to I/O error on drbd0
Dec 12 05:29:46 somepoorbox kernel: [30883839.163208] Buffer I/O error on device drbd0, logical block 59744312
Dec 12 05:29:46 somepoorbox kernel: [30883839.163289] lost page write due to I/O error on drbd0
Dec 12 05:29:46 somepoorbox kernel: [30883839.163299] Buffer I/O error on device drbd0, logical block 59746338
Dec 12 05:29:46 somepoorbox kernel: [30883839.163316] Buffer I/O error on device drbd0, logical block 59744312
Dec 12 05:29:46 somepoorbox kernel: [30883839.163320] lost page write due to I/O error on drbd0
Dec 12 05:29:46 somepoorbox kernel: [30883839.163328] EXT3-fs: ext3_journal_dirty_data: aborting transaction: IO failure in ext3_journal_dirty_data
Dec 12 05:29:46 somepoorbox kernel: [30883839.163336] EXT3-fs (drbd0): error in ext3_orphan_add: Readonly filesystem
Dec 12 05:29:46 somepoorbox kernel: [30883839.165257]  [] ? warn_slowpath_common+0x91/0xe0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165260] EXT3-fs (drbd0): I/O error while writing superblock
Dec 12 05:29:46 somepoorbox kernel: [30883839.165280]  [] ? ext3_get_group_desc+0x51/0xa0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165285] JBD: Spotted dirty metadata buffer (dev = drbd0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Dec 12 05:29:46 somepoorbox kernel: [30883839.165292]  [] ? warn_slowpath_null+0x1a/0x20
Dec 12 05:29:46 somepoorbox kernel: [30883839.165297]  [] ? mark_buffer_dirty+0x82/0xa0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165316]  [] ? ext3_commit_super.clone.0+0x69/0x100 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165329]  [] ? ext3_handle_error+0x7f/0xe0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165343]  [] ? __ext3_std_error+0x5e/0xb0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165356]  [] ? ext3_orphan_add+0xbf/0x1a0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165360] EXT3-fs: ext3_journal_dirty_data: aborting transaction: IO failure in ext3_journal_dirty_data
Dec 12 05:29:46 somepoorbox kernel: [30883839.165374]  [] ? journal_dirty_data_fn+0x0/0x30 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165378] EXT3-fs (drbd0): error in ext3_orphan_add: Readonly filesystem [] ? ext3_ordered_write_end+0x158/0x1c0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165395]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165400]  [] ? generic_file_buffered_write_iter+0x184/0x2b0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165407]  [] ? __generic_file_write_iter+0x225/0x420
Dec 12 05:29:46 somepoorbox kernel: [30883839.165412]  [] ? __generic_file_aio_write+0x85/0xa0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165417]  [] ? generic_file_aio_write+0x88/0x100
Dec 12 05:29:46 somepoorbox kernel: [30883839.165423]  [] ? do_sync_write+0xf2/0x140
Dec 12 05:29:46 somepoorbox kernel: [30883839.165432]  [] ? sys_getpeername+0xd4/0xf0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165436]  [] ? vfs_write+0xb8/0x1a0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165441]  [] ? fget_light_pos+0x16/0x50
Dec 12 05:29:46 somepoorbox kernel: [30883839.165445]  [] ? sys_write+0x51/0xb0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165450]  [] ? __audit_syscall_exit+0x25e/0x290
Dec 12 05:29:46 somepoorbox kernel: [30883839.165455]  [] ? system_call_fastpath+0x16/0x1b
Dec 12 05:29:46 somepoorbox kernel: [30883839.165459] ---[ end trace 32aa3e2dc89d4c30 ]---
Dec 12 05:29:46 somepoorbox kernel: [30883839.165462] Tainting kernel with flag 0x9
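
The "CDB: Read(10)" entries in the log carry the failing read request itself, and decoding them shows that they line up with the sector numbers reported by end_request. This is a small sketch in Python with a hypothetical helper name (`decode_read10`). It relies only on the standard READ(10) layout: opcode 0x28 in byte 0, a big-endian starting LBA in bytes 2-5 and a big-endian transfer length in bytes 7-8.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Returns (starting LBA, number of blocks requested).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: starting LBA
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks

if __name__ == "__main__":
    # The two CDBs copied from the log above.
    for cdb in ("28 00 57 4f 8a 43 00 03 38 00",
                "28 00 57 4f 86 7b 00 03 c8 00"):
        print(decode_read10(cdb))
```

The decoded starting LBAs are 1464830531 and 1464829563, the same sectors that the end_request lines report as I/O errors on sda.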
