Terminate hung scrub and/or export without rebooting?

FrankWard · Feb 26, 2024

I am looking for any and all solutions that can TERMINATE an existing scrub and/or export process in TrueNAS SCALE WITHOUT rebooting. The TrueNAS UI is hanging on two tasks for one disk/pool. If I try to initiate an export for another pool, it just hangs there and does nothing. Clearly something is hung in the OS, but TN has no clue how to fix it.

I've tried the following, but the tasks are still visible.

- zpool clear POOL (hangs the shell)
- zpool export -f POOL (hangs the shell)
- zpool scrub -s POOL (hangs the shell)
- remove the drive via UI (hangs at 20%, TN still thinks the tasks are running)

Surely, with the almighty powerful Linux, there's a way to terminate whatever processes are hanging TN.

Davvo · Feb 26, 2024

zpool scrub -s poolname stops the scrub.

zpool-scrub.8 — OpenZFS documentation

openzfs.github.io

FrankWard · Feb 26, 2024

I tried this also, but forgot to list it. Updated.

- zpool scrub -s POOL (hangs the shell)

joeschmuck · Feb 26, 2024

I guess I'm a bit confused as to why you have so many operations going on at once; it's amazing something is happening at all.

I'm not sure why you do not want to reboot; maybe you meant a hard reboot. At the CLI, enter 'reboot' and it will perform a soft reboot (which may work in your case). If the system hangs or will not reboot, then you may be forced to perform a hard reset.

What version of TrueNAS are you running? Saying SCALE does not help much, since there are several versions of SCALE.

Look into the 'htop' and 'top' commands. You can manually terminate running processes with them, but do not expect your system to work properly afterwards, since you may stop an important piece of code.
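If you do go the manual-termination route, a gentler pattern than killing outright is SIGTERM first and SIGKILL only if the process ignores it. The Python below is a minimal sketch assuming a Linux host, root access, and a PID you have already identified (for example in htop); the helper names terminate, _alive and _wait_gone are hypothetical, not part of TrueNAS. One important caveat: a process in uninterruptible sleep (shown as state D in top/htop), which a zpool command blocked on an unresponsive disk can be, will not exit even on SIGKILL until its pending I/O returns, so this may still report that the process is alive.

```python
import os
import signal
import time


def _alive(pid: int) -> bool:
    """Return True if `pid` still exists (and reap it if it is our own child)."""
    try:
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False  # our own child exited and has now been reaped
    except ChildProcessError:
        pass  # not our child; its own parent is responsible for reaping it
    try:
        os.kill(pid, 0)  # signal 0 only checks that the PID exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def _wait_gone(pid: int, seconds: float) -> bool:
    """Poll until the process disappears or `seconds` elapse."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        if not _alive(pid):
            return True
        time.sleep(0.1)
    return not _alive(pid)


def terminate(pid: int, grace_seconds: float = 5.0) -> bool:
    """SIGTERM first, SIGKILL if still running; True if the process is gone.

    A process in uninterruptible sleep (state 'D') ignores even SIGKILL
    until its pending I/O completes, so this can still return False.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # already gone
    if _wait_gone(pid, grace_seconds):
        return True
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return True
    return _wait_gone(pid, 2.0)
```

As joe says, only do this for processes you understand; killing part of the TrueNAS middleware may leave the UI in a worse state than before.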

FrankWard · Feb 26, 2024

Hi Joe. All of the systems I am running are listed in detail in my signature. Does the sig not show up on your device?

I had a disk that produced ZFS errors. I possibly did things in the wrong order with this drive. The first thing I tried was scrub, but that sat there at 0%. After about 10 minutes, I tried to clear any issues using zpool clear, which hung in the shell; however, when I refreshed the UI the drive was no longer suspended, so I guess the clear worked. I then tried to export the drive's pool, which also just sat there in the UI doing nothing. Since that didn't work, I tried to export a different pool, and that also sat there with the export dialog stuck in the UI. This is how I arrived here.

TrueNAS has issues either terminating processes that hang or properly reporting them in the UI, which is why there are so many hung processes. Either that, or TrueNAS doesn't error out properly when it encounters an issue and leaves the UI in limbo. I'm guessing this is because it hasn't fully matured, so let's hope the UI becomes friendlier with regard to ZFS and other errors (such as not being able to edit VM properties, which was recently fixed) in future builds.

I have yet to find a way to identify the exact processes responsible, so that I can clear up the UI's stale progress reporting without a reboot. I do not want to hose the system by terminating random processes via top, which is why I am asking whether anyone else has a better method of recovering the TrueNAS UI once it hoses itself.

I have since removed the drive, rebooted, then removed the pool successfully, but the question still remains. If an export/scrub process hangs like this, are there specific actions that can fix it aside from a reboot?
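One way to narrow down the exact processes without guessing in top is to read /proc directly. The Python below is a minimal, Linux-only sketch (the helper names parse_stat and scan are hypothetical, and it assumes read access to /proc): it lists processes whose name starts with zpool or zfs, plus anything in uninterruptible sleep (state D), together with the kernel function each one is blocked in (wchan). A zpool command sitting in state D is waiting inside the kernel, usually on I/O, which points at the disk rather than the UI, and it is also why kill has no effect on it.

```python
import os
from typing import Iterator, NamedTuple, Tuple


class ProcInfo(NamedTuple):
    pid: int
    state: str  # one-letter state from /proc/<pid>/stat: R, S, D, Z, ...
    comm: str   # executable name as the kernel reports it (max 15 chars)
    wchan: str  # kernel function the task is blocked in, if readable


def parse_stat(stat_line: str) -> Tuple[str, str]:
    """Return (comm, state) from the text of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split around the *last* ')' instead of on whitespace.
    """
    close = stat_line.rindex(")")
    comm = stat_line[stat_line.index("(") + 1:close]
    state = stat_line[close + 1:].split()[0]
    return comm, state


def _read(path: str) -> str:
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""  # process already exited, or permission denied


def scan(names: Tuple[str, ...] = ("zpool", "zfs")) -> Iterator[ProcInfo]:
    """Yield processes whose name starts with `names`, or that are in state D."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        stat = _read(f"/proc/{entry}/stat")
        if not stat:
            continue
        comm, state = parse_stat(stat)
        if state == "D" or comm.startswith(names):
            wchan = _read(f"/proc/{entry}/wchan").strip()
            yield ProcInfo(int(entry), state, comm, wchan)
```

Running "for p in scan(): print(p)" as root would show, for example, whether the zpool scrub -s or zpool export you launched is still blocked in the kernel.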

HoneyBadger · Feb 26, 2024

Hey @FrankWard

Thanks for the detail. It's likely that the ZFS processes were the ones hanging, because there was a drive in a "Schrodinger's Failure" state: alive enough to be present on the device bus, but absent enough to hang when issued ZFS commands. Some upstream improvements that should help with this are expected to land in OpenZFS 2.3:

Add slow disk diagnosis to ZED by don-brady · Pull Request #15469 · openzfs/zfs

Motivation and Context Slow disk response times can be indicative of a failing drive. ZFS currently tracks slow I/Os (slower than zio_slow_io_ms) and generates events (ereport.fs.zfs.delay). But ...

github.com

Assuming there's enough redundancy, pulling (or possibly setting offline in ZFS?) the offending drive should cause the tasks to complete (or abort) successfully, but there's always the challenge of how to know you're pulling the right drive.
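For the "which drive do I pull" problem, if 'zpool status -P POOL' still responds, its config table is a reasonable first place to look. The Python below is a minimal sketch that parses pasted zpool status text and returns rows that are not ONLINE or have non-zero READ/WRITE/CKSUM counters; the function name suspicious_vdevs and the sample output are illustrative assumptions, not taken from this thread. It cannot catch a disk that is merely slow without logging errors, which is exactly the gap the linked slow-disk PR aims to address.

```python
from typing import List, NamedTuple


class VdevRow(NamedTuple):
    name: str
    state: str
    read: str
    write: str
    cksum: str


def suspicious_vdevs(status_text: str) -> List[VdevRow]:
    """Return config rows that are not ONLINE or show non-zero error counters.

    Assumes the usual 'NAME STATE READ WRITE CKSUM' table printed by
    'zpool status'. Counters may be abbreviated (e.g. '1.2K'), so anything
    other than the literal '0' is treated as non-zero.
    """
    rows: List[VdevRow] = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields or fields[0].endswith(":"):
            in_table = False  # a blank line or 'errors:' ends the table
            continue
        if len(fields) < 5:
            continue  # section headers such as 'logs' or 'spares'
        row = VdevRow(*fields[:5])
        if row.state != "ONLINE" or any(c != "0" for c in row[2:]):
            rows.append(row)
    return rows


# Illustrative output only; real device names will differ.
SAMPLE = """  pool: tank
 state: DEGRADED
config:

\tNAME        STATE     READ WRITE CKSUM
\ttank        DEGRADED     0     0     0
\t  mirror-0  DEGRADED     0     0     0
\t    sda     ONLINE       0     0     0
\t    sdb     FAULTED     12     3     0  too many errors

errors: No known data errors
"""

print([r.name for r in suspicious_vdevs(SAMPLE)])  # ['tank', 'mirror-0', 'sdb']
```

The innermost flagged row (sdb here) is the candidate leaf device; with -P the name is a full device path, which can then be matched against the physical disk before offlining or pulling it.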

joeschmuck · Feb 26, 2024

FrankWard said:
I am running are listed in detail in my signature. Does the sig not show up on your device?

It shows up, but System 1 or System 2?

FrankWard said:
I possibly did things in the wrong order with this drive.

You are not alone. Been there myself.

FrankWard said:
The first thing I tried was scrub, but that sat there at 0%. After about 10 minutes

Scrubs take a long time if you have a lot of data; some scrubs take almost a week, and you have some high-capacity drives. How long does a SMART Long test take? Now roughly double it. Okay, that is not really accurate, but during a scrub the drive is working for the computer rather than just running an internal test, so it takes much longer than a Long test.
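A back-of-the-envelope estimate shows why 0% after 10 minutes is not, by itself, proof of a hang. Since a scrub reads the allocated data, an optimistic duration is allocated bytes divided by sustained read rate. The figures below (20 TB allocated, 150 MB/s) are hypothetical, not FrankWard's actual pool, and real scrubs are usually slower because of fragmentation and concurrent load.

```python
def scrub_hours(allocated_tb: float, throughput_mb_s: float) -> float:
    """Optimistic scrub duration: allocated data / sustained read rate, in hours."""
    seconds = (allocated_tb * 1e12) / (throughput_mb_s * 1e6)
    return seconds / 3600


def percent_after(elapsed_minutes: float, total_hours: float) -> float:
    """Progress you would expect to see after `elapsed_minutes` of scrubbing."""
    return 100.0 * elapsed_minutes / (total_hours * 60)


# Hypothetical numbers: 20 TB allocated, 150 MB/s sustained.
total = scrub_hours(20, 150)
print(f"{total:.1f} h, {percent_after(10, total):.2f}% after 10 minutes")
# prints: 37.0 h, 0.45% after 10 minutes
```

Under these assumptions, a progress display rounded to whole percent would still read 0% after ten minutes.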

FrankWard said:
I have since removed the drive, rebooted, then removed the pool successfully, but the question still remains. If an export/scrub process hangs like this, are there specific actions that can fix it aside from a reboot?

Not that I'm aware of. I suspect one of the iXsystems developers might be able to help, but they are rarely on the forums.

ZFS is robust; a reboot should not cause any harm that couldn't be recovered from, thanks to redundancy.

Now you have @HoneyBadger so you are in good hands.

FrankWard · Feb 26, 2024

Thanks for the info. I've rarely run into this before, so I'll be better equipped for the next battle.

FrankWard · Feb 26, 2024

joeschmuck said:
It shows up, but System 1 or System 2?

Thanks. SCALE 23.10.1. And yes, @HoneyBadger is a pillar of the community.

joeschmuck · Feb 26, 2024

FrankWard said:
And yes, @HoneyBadger is a pillar of the community.

Hey, I didn't say that, now his head will swell.
