Why SMART is not smart at all and doesn't properly predict disk errors that cause a kernel panic or crash

Before getting into the output, here is my typical experience with SMART: I have what I call a "bad disk", with pending and uncorrectable sectors that cannot be reallocated.
It has repeatedly caused a kernel panic and system crash, as we can see from the logs.
But SMART says it has "PASSED" its self-assessment. SMART is still useful to me, but mostly for looking at Current_Pending_Sector.
Any time I have seen anything but 0 for that attribute, the disk has turned out to be bad and unusable (e.g. it will cause kernel panics).
In this case even RAID doesn't help, because the bad disk taints the kernel.

First, let's check this disk and see what SMART thinks:

smartctl -a /dev/sda

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES
Device Model:     ST3750640NS
Serial Number:    ABCAEAAA
LU WWN Device Id: 5 000c50 0083422e5
Firmware Version: 3BKH
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Dec 13 12:43:37 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   093   086   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   090   060   030    Pre-fail  Always       -       951683243
  9 Power_On_Hours          0x0032   052   052   000    Old_age   Always       -       42128
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   054   045    Old_age   Always       -       34 (Min/Max 28/36)
194 Temperature_Celsius     0x0022   034   046   000    Old_age   Always       -       34 (0 17 0 0 0)
195 Hardware_ECC_Recovered  0x001a   081   055   000    Old_age   Always       -       220199
197 Current_Pending_Sector  0x0012   096   096   000    Old_age   Always       -       93
198 Offline_Uncorrectable   0x0010   096   096   000    Old_age   Offline      -       93
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       971
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0
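
To make the mismatch concrete, here is a minimal Python sketch that parses attribute rows like the ones above and compares two views of the same data: whether any normalized VALUE has dropped to its THRESH (as far as I understand it, that is what drives the overall "PASSED"/"FAILED" result), and whether the raw pending/uncorrectable counts are non-zero. The function names, the `WATCH_IDS` choice (197 plus 198) and the regular expression are my own assumptions for illustration, not part of smartmontools.

```python
import re

# A few rows copied from the smartctl output above (subset, for brevity).
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   096   096   000    Old_age   Always       -       93
198 Offline_Uncorrectable   0x0010   096   096   000    Old_age   Offline      -       93
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       971
"""

# Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(?P<raw>\d+)"
)

# Hypothetical watch list: 197 is the attribute discussed in this post,
# 198 is added here as an assumption because it tracks the same sectors.
WATCH_IDS = (197, 198)


def parse_attributes(text):
    """Return one dict per attribute row found in smartctl-style text."""
    attrs = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs.append({
                "id": int(m.group("id")),
                "name": m.group("name"),
                "value": int(m.group("value")),
                "thresh": int(m.group("thresh")),
                "raw": int(m.group("raw")),  # leading integer of RAW_VALUE only
            })
    return attrs


def at_or_below_threshold(attrs):
    """Attributes whose normalized VALUE has reached THRESH."""
    return [a for a in attrs if a["value"] <= a["thresh"]]


def suspicious(attrs, watch_ids=WATCH_IDS):
    """Watched attributes with a non-zero raw count."""
    return [a for a in attrs if a["id"] in watch_ids and a["raw"] != 0]


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    print("at/below threshold:", [a["name"] for a in at_or_below_threshold(attrs)])
    print("non-zero watched  :", [(a["name"], a["raw"]) for a in suspicious(attrs)])
```

On this disk nothing is at or below its threshold (Current_Pending_Sector has a threshold of 000, which the normalized value 096 does not reach), yet the raw count is 93. That is why I look at the raw value rather than the overall result.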

Now let's see /var/log/messages:

Dec 12 05:29:46 somepoorbox kernel: [30883839.026190] sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 12 05:29:46 somepoorbox kernel: [30883839.026196] sd 0:0:0:0: [sda]  Sense Key : Medium Error [current] [descriptor]
Dec 12 05:29:46 somepoorbox kernel: [30883839.026203] Descriptor sense data with sense descriptors (in hex):
Dec 12 05:29:46 somepoorbox kernel: [30883839.026206]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 12 05:29:46 somepoorbox kernel: [30883839.026215]         57 4f 86 7b
Dec 12 05:29:46 somepoorbox kernel: [30883839.026219] sd 0:0:0:0: [sda]  Add. Sense: Unrecovered read error - auto reallocate failed
Dec 12 05:29:46 somepoorbox kernel: [30883839.026225] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 57 4f 8a 43 00 03 38 00
Dec 12 05:29:46 somepoorbox kernel: [30883839.026236] end_request: I/O error, dev sda, sector 1464830531
Dec 12 05:29:46 somepoorbox kernel: [30883839.026331] block drbd0: disk( UpToDate -> Failed )
Dec 12 05:29:46 somepoorbox kernel: [30883839.026345] block drbd0: Local IO failed in __req_mod. Detaching...
Dec 12 05:29:46 somepoorbox kernel: [30883839.026365] block drbd0: helper command: /sbin/drbdadm pri-on-incon-degr minor-0
Dec 12 05:29:46 somepoorbox kernel: [30883839.026476] sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 12 05:29:46 somepoorbox kernel: [30883839.026480] sd 0:0:0:0: [sda]  Sense Key : Medium Error [current] [descriptor]
Dec 12 05:29:46 somepoorbox kernel: [30883839.026485] Descriptor sense data with sense descriptors (in hex):
Dec 12 05:29:46 somepoorbox kernel: [30883839.026488]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 12 05:29:46 somepoorbox kernel: [30883839.026497]         57 4f 86 7b
Dec 12 05:29:46 somepoorbox kernel: [30883839.026501] sd 0:0:0:0: [sda]  Add. Sense: Unrecovered read error - auto reallocate failed
Dec 12 05:29:46 somepoorbox kernel: [30883839.026506] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 57 4f 86 7b 00 03 c8 00
Dec 12 05:29:46 somepoorbox kernel: [30883839.026514] end_request: I/O error, dev sda, sector 1464829563
Dec 12 05:29:46 somepoorbox kernel: [30883839.026632] block drbd0: IO ERROR: neither local nor remote disk
Dec 12 05:29:46 somepoorbox kernel: [30883839.026636] ata1: EH complete
Dec 12 05:29:46 somepoorbox kernel: [30883839.026728] block drbd0: IO ERROR: neither local nor remote disk
Dec 12 05:29:46 somepoorbox kernel: [30883839.026811] block drbd0: IO ERROR: neither local nor remote disk
Dec 12 05:29:46 somepoorbox kernel: [30883839.162977] Buffer I/O error on device drbd0, logical block 53203520
Dec 12 05:29:46 somepoorbox kernel: [30883839.163110] lost page write due to I/O error on drbd0
Dec 12 05:29:46 somepoorbox kernel: [30883839.163117] Buffer I/O error on device drbd0, logical block 59744311
Dec 12 05:29:46 somepoorbox kernel: [30883839.163200] lost page write due to I/O error on drbd0
Dec 12 05:29:46 somepoorbox kernel: [30883839.163208] Buffer I/O error on device drbd0, logical block 59744312
Dec 12 05:29:46 somepoorbox kernel: [30883839.163289] lost page write due to I/O error on drbd0
Dec 12 05:29:46 somepoorbox kernel: [30883839.163299] Buffer I/O error on device drbd0, logical block 59746338
Dec 12 05:29:46 somepoorbox kernel: [30883839.163316] Buffer I/O error on device drbd0, logical block 59744312
Dec 12 05:29:46 somepoorbox kernel: [30883839.163320] lost page write due to I/O error on drbd0
Dec 12 05:29:46 somepoorbox kernel: [30883839.163328] EXT3-fs: ext3_journal_dirty_data: aborting transaction: IO failure in ext3_journal_dirty_data
Dec 12 05:29:46 somepoorbox kernel: [30883839.163336] EXT3-fs (drbd0): error in ext3_orphan_add: Readonly filesystem
Dec 12 05:29:46 somepoorbox kernel: [30883839.165257]  [] ? warn_slowpath_common+0x91/0xe0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165260] EXT3-fs (drbd0): I/O error while writing superblock
Dec 12 05:29:46 somepoorbox kernel: [30883839.165280]  [] ? ext3_get_group_desc+0x51/0xa0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165285] JBD: Spotted dirty metadata buffer (dev = drbd0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Dec 12 05:29:46 somepoorbox kernel: [30883839.165292]  [] ? warn_slowpath_null+0x1a/0x20
Dec 12 05:29:46 somepoorbox kernel: [30883839.165297]  [] ? mark_buffer_dirty+0x82/0xa0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165316]  [] ? ext3_commit_super.clone.0+0x69/0x100 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165329]  [] ? ext3_handle_error+0x7f/0xe0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165343]  [] ? __ext3_std_error+0x5e/0xb0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165356]  [] ? ext3_orphan_add+0xbf/0x1a0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165360] EXT3-fs: ext3_journal_dirty_data: aborting transaction: IO failure in ext3_journal_dirty_data
Dec 12 05:29:46 somepoorbox kernel: [30883839.165374]  [] ? journal_dirty_data_fn+0x0/0x30 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165378] EXT3-fs (drbd0): error in ext3_orphan_add: Readonly filesystem [] ? ext3_ordered_write_end+0x158/0x1c0 [ext3]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165395]
Dec 12 05:29:46 somepoorbox kernel: [30883839.165400]  [] ? generic_file_buffered_write_iter+0x184/0x2b0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165407]  [] ? __generic_file_write_iter+0x225/0x420
Dec 12 05:29:46 somepoorbox kernel: [30883839.165412]  [] ? __generic_file_aio_write+0x85/0xa0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165417]  [] ? generic_file_aio_write+0x88/0x100
Dec 12 05:29:46 somepoorbox kernel: [30883839.165423]  [] ? do_sync_write+0xf2/0x140
Dec 12 05:29:46 somepoorbox kernel: [30883839.165432]  [] ? sys_getpeername+0xd4/0xf0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165436]  [] ? vfs_write+0xb8/0x1a0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165441]  [] ? fget_light_pos+0x16/0x50
Dec 12 05:29:46 somepoorbox kernel: [30883839.165445]  [] ? sys_write+0x51/0xb0
Dec 12 05:29:46 somepoorbox kernel: [30883839.165450]  [] ? __audit_syscall_exit+0x25e/0x290
Dec 12 05:29:46 somepoorbox kernel: [30883839.165455]  [] ? system_call_fastpath+0x16/0x1b
Dec 12 05:29:46 somepoorbox kernel: [30883839.165459] ---[ end trace 32aa3e2dc89d4c30 ]---
Dec 12 05:29:46 somepoorbox kernel: [30883839.165462] Tainting kernel with flag 0x9
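
The hex bytes in the log can be checked by hand. As a rough sketch (the function names are mine, and the byte layouts are the standard SCSI READ(10) command and descriptor-format sense header as I understand them), this Python snippet decodes the two CDB lines and the first four sense bytes so they can be compared with the "end_request ... sector" and "Medium Error" lines:

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (starting LBA, number of blocks requested).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:  # 0x28 is the READ(10) opcode
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


def decode_sense_header(sense_hex):
    """Decode the first four bytes of descriptor-format sense data.

    Returns (response code, sense key, ASC, ASCQ).
    """
    sense = bytes.fromhex(sense_hex)
    if len(sense) < 4:
        raise ValueError("need at least 4 sense bytes")
    return sense[0] & 0x7F, sense[1] & 0x0F, sense[2], sense[3]


if __name__ == "__main__":
    # CDB bytes copied from the two "CDB: Read(10)" log lines.
    for cdb in ("28 00 57 4f 8a 43 00 03 38 00",
                "28 00 57 4f 86 7b 00 03 c8 00"):
        lba, blocks = decode_read10(cdb)
        print(f"read of {blocks} blocks starting at LBA {lba}")

    code, key, asc, ascq = decode_sense_header(
        "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00")
    print(f"response code 0x{code:02x}, sense key 0x{key:x}, "
          f"ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}")
```

The starting LBAs decode to 1464830531 and 1464829563, the same sectors reported on the two end_request lines, and sense key 0x3 with ASC/ASCQ 0x11/0x04 corresponds to the "Medium Error" and "Unrecovered read error - auto reallocate failed" text the kernel printed.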

