Sometimes users unplug and replug their removable drives to test what happens when a disk fails. With LUKS this breaks things quite badly: the /dev/mapper device does not come back online because the old mapping was never closed.
With non-encrypted drives the process is generally smooth, but with encrypted drives you may want to follow a strategy like the one described here.
We can see below that both disks are unavailable as they were physically removed from the server.
zpool status
pool: rttpool
state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rttpool UNAVAIL 0 0 0 insufficient replicas
mirror-0 UNAVAIL 0 0 0 insufficient replicas
zpool-sdj1 FAULTED 0 0 0 corrupted data
zpool-sdk1 FAULTED 0 0 0 corrupted data
errors: List of errors unavailable: pool I/O is currently suspended
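To pick the faulted members out of a status listing like the one above, a small filter can help. This is only a sketch: the function name faulted_devices is hypothetical, and it assumes the config rows keep the NAME STATE READ WRITE CKSUM column order shown in this article.

```shell
# List vdevs reported as FAULTED in `zpool status` output read from stdin.
# Assumes config rows look like: NAME STATE READ WRITE CKSUM [message...]
faulted_devices() {
    awk '$2 == "FAULTED" { print $1 }'
}

# Usage on a live system (needs the ZFS tools installed):
#   zpool status rttpool | faulted_devices
```

For the listing above this would print zpool-sdj1 and zpool-sdk1, which are the LUKS mapping names that need attention later.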
Conventional wisdom says to clear the errors after replugging the disks, but does this work with LUKS?
root@rttbox:/home/rtt# zpool clear rttpool zpool-sdj1
cannot clear errors for zpool-sdj1: I/O error
root@rttbox:/home/rtt# zpool clear rttpool zpool-sdj
cannot clear errors for zpool-sdj: no such device in pool
root@rttbox:/home/rtt# zpool clear rttpool zpool-sdj1
cannot clear errors for zpool-sdj1: I/O error
root@rttbox:/home/rtt# zpool clear rttpool zpool-sdk1
cannot clear errors for zpool-sdk1: I/O error
As we can see, it does not work.
Sometimes we may need to remove zpool.cache.
#At your own risk: do not try this in production, and do not use it as a first resort.
rm /etc/zfs/zpool.cache
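Given the warning above, a more cautious variant is to move the cache file aside rather than delete it, so that it can be put back. The helper below is a hypothetical sketch, not part of the original procedure; the function name backup_zpool_cache and the .bak.<timestamp> suffix are assumptions made for illustration.

```shell
# Move a ZFS cache file aside instead of deleting it, so it can be restored.
# Argument: path to the cache file (defaults to /etc/zfs/zpool.cache).
# Prints the backup path on success, returns non-zero if the file is missing.
backup_zpool_cache() {
    cache="${1:-/etc/zfs/zpool.cache}"
    [ -f "$cache" ] || { echo "no cache file at $cache" >&2; return 1; }
    backup="$cache.bak.$(date +%Y%m%d%H%M%S)"
    mv -- "$cache" "$backup" && echo "$backup"
}

# Usage (as root): backup_zpool_cache
```

Restoring is then a matter of moving the printed backup path back to /etc/zfs/zpool.cache.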
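Now let's try a recovery-mode clear of the whole pool. The -F flag attempts recovery, and -n only checks whether that recovery would succeed without discarding any transactions.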
zpool clear -nF rttpool
root@rttbox:/home/rtt# zpool status
pool: rttpool
state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rttpool UNAVAIL 0 0 0 insufficient replicas
mirror-0 UNAVAIL 0 0 0 insufficient replicas
zpool-sdj1 FAULTED 0 0 0 too many errors
zpool-sdk1 FAULTED 0 0 0 too many errors
errors: List of errors unavailable: pool I/O is currently suspended
As we can see, it still doesn't work.
How about clearing the device in the pool itself?
root@rttbox:/home/rtt# zpool clear -nF rttpool zpool-sdj1
root@rttbox:/home/rtt# zpool clear -nF rttpool zpool-sdk1
root@rttbox:/home/rtt# zpool online rttpool zpool-sdk1
cannot online zpool-sdk1: pool I/O is currently suspended
root@rttbox:/home/rtt# zpool online rttpool zpool-sdj1
cannot online zpool-sdj1: pool I/O is currently suspended
That still doesn't fix it.
The proper fix is to use cryptsetup to close the stale zpool mappings:
cryptsetup close zpool-sdj1
cryptsetup close zpool-sdk1
Now reopen the devices:
cryptsetup open /dev/sdj1 zpool-sdj1
cryptsetup open /dev/sdk1 zpool-sdk1
#after cryptsetup open, clear the pool
zpool clear -nFX rttpool
Now it works!
zpool status
pool: rttpool
state: ONLINE
scan: resilvered 160K in 0h0m with 0 errors on Thu Feb 1 23:01:59 2024
config:
NAME STATE READ WRITE CKSUM
rttpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
zpool-sdj1 ONLINE 0 0 0
zpool-sdk1 ONLINE 0 0 0
errors: No known data errors
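The close, reopen and clear sequence that worked above can be wrapped in a small script. This is a sketch under assumptions: the function names run and reopen_luks_pool and the DRY_RUN variable are hypothetical, and it assumes each mapping is named zpool-<partition name> as in this article. By default it only prints the commands so they can be reviewed first.

```shell
# Print a command when DRY_RUN is 1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Close and reopen the LUKS mappings of a pool's members, then clear the pool.
# Arguments: pool name, then one partition per member (e.g. /dev/sdj1).
# Assumes mappings are named zpool-<partition basename>.
reopen_luks_pool() {
    pool="$1"; shift
    for part in "$@"; do
        run cryptsetup close "zpool-${part##*/}" || return 1
    done
    for part in "$@"; do
        # Prompts for the passphrase unless a key file is configured.
        run cryptsetup open "$part" "zpool-${part##*/}" || return 1
    done
    # Same clear command as used in this article.
    run zpool clear -nFX "$pool"
}

# Dry run:  reopen_luks_pool rttpool /dev/sdj1 /dev/sdk1
# For real: DRY_RUN=0 reopen_luks_pool rttpool /dev/sdj1 /dev/sdk1
```

All mappings are closed before any is reopened, mirroring the order used above. Note that a replugged disk may come back under a different device name, so check the partition paths before running it for real.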
If the pool is not currently imported but its disks are available, running zpool import with no arguments scans all disks for ZFS pools and lists the ones that can be imported.
zpool import
pool: rttpool
id: 125324434212034535323
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
rttpool ONLINE
mirror-0 ONLINE
zpool-sdc1 ONLINE
zpool-sdd1 ONLINE
Now just import it, either by the numeric ID 125324434212034535323 or by the pool name rttpool. Here we use the numeric ID:
zpool import 125324434212034535323
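When several pools are listed, it can be handy to pull out just the name and numeric ID of each one. The filter below is a minimal sketch; the function name importable_pools is hypothetical, and it assumes the "pool:" and "id:" labels appear as in the listing above.

```shell
# Print "name id" for each pool found in `zpool import` output on stdin.
importable_pools() {
    awk '$1 == "pool:" { name = $2 }
         $1 == "id:"   { print name, $2 }'
}

# Usage on a live system: zpool import | importable_pools
```

Importing by numeric ID is mainly useful when two importable pools share the same name.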