Very Slow directory / file deletions from shell

cmorgan

Cadet
Joined
Oct 29, 2019
Messages
9
FreeNAS 11.3-U4.1
8TB pool (2x8TB hdd)
8GB ram
zfs dedupe disabled
lz4 enabled

I've had a stuck timemachine backup and decided to wipe out the sparse bundle. I tried from the Mac and it hung. I aborted that, and now I'm using the shell in the web interface, and it's very, very slow to delete 1.1TB of data in only 390k files.

zpool scrub didn't report anything

Is it normal to take tens of minutes to remove 1.1TB of data? I figured ZFS would be able to delete almost any amount of data in almost no time at all...

What could be going on?
 

cmorgan

Cadet
Joined
Oct 29, 2019
Messages
9
Also, it does look like the zpool scrub is in progress now but it wasn't the first time I tried 'rm -rf ./directory'

It's about 4.8% complete, 4.5 hours to go.
 

cmorgan

Cadet
Joined
Oct 29, 2019
Messages
9
Alright, so the scrub was certainly slowing it down. I paused it via 'zpool scrub -p main_pool', but it's still terribly slow to delete:

root@freenas[/mnt/main_pool/cmorgan]# du -sch timemachine
1.0T timemachine
1.0T total
root@freenas[/mnt/main_pool/cmorgan]# du -sch timemachine
996G timemachine
996G total
root@freenas[/mnt/main_pool/cmorgan]# du -sch timemachine
995G timemachine
995G total
root@freenas[/mnt/main_pool/cmorgan]#
root@freenas[/mnt/main_pool/cmorgan]# du -sch timemachine
985G timemachine
985G total
root@freenas[/mnt/main_pool/cmorgan]# du -sch timemachine
985G timemachine
985G total
root@freenas[/mnt/main_pool/cmorgan]# du

Each one of these is maybe a few minutes apart.
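
A rough way to turn spot checks like these into a rate is to log a timestamp with each reading. This is only a sketch: the helper name sample_usage, the interval and the sample count are made up for illustration, and du itself adds load to an already busy pool.

```shell
#!/bin/sh
# sample_usage DIR INTERVAL COUNT
# Print "epoch-seconds<TAB>KiB-used" for DIR, COUNT times, INTERVAL seconds apart.
# Hypothetical helper for illustration; not part of FreeNAS.
sample_usage() {
    dir=$1; interval=$2; count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # du -sk reports the directory total in 1024-byte units
        kib=$(du -sk "$dir" | awk '{print $1}')
        printf '%s\t%s\n' "$(date +%s)" "$kib"
        i=$((i + 1))
        # Do not sleep after the final sample
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
}
```

Something like `sample_usage timemachine 60 10` run from a second shell would give ten readings a minute apart. The difference between two readings divided by the difference in timestamps is the deletion rate in KiB per second.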
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I've had some slow deletions before and it sucks when it happens. So a few things... Reboot FreeNAS (power off the system), then try to delete the files again. Do you have any VMs/jails/Docker running? Stop them. You have 8GB of RAM, which is not much, and I'm thinking you are possibly running a SMART long test on your drives, or running VMs and running out of RAM so there is a lot of caching going on. When you run top, can you see anything else using the hard drives? Are you using laptop hard drives, which are known to be slow? A better description of your system is also in order. But your hardware may be substandard and you might just need to wait for the files to delete.

Regardless of the situation, I hope you figure out the cause but the main goal is to get timemachine back running normally.
 

ornias

Wizard
Joined
Mar 6, 2020
Messages
1,458
It's basically a known ZFS issue by now.
It does the actual deletion in the background, although a reboot almost always fixes the issue...
 

cmorgan

Cadet
Joined
Oct 29, 2019
Messages
9
Hi Joe.

No VMs or jails running. Upgraded to TrueNAS 12.0 and it's deleting slightly quicker, but it still took a couple of hours to delete the 1TB of data (just finished). Top wasn't showing any high-CPU processes. They are WD NAS drives, 5200 rpm or something.
 

ornias

Wizard
Joined
Mar 6, 2020
Messages
1,458
How full was your pool at that time?
 

ornias

Wizard
Joined
Mar 6, 2020
Messages
1,458
To be clear:
rm -Rf takes a long time, or the space takes a long time to free?

In the latter case, are there any issues this is causing you? (Considering you need to keep 10-20% free space for ZFS anyhow.)
Not saying your issue is minor, it isn't, and it's freaking weird at times... Just interested in whether there is a real issue being caused by this.
 

cmorgan

Cadet
Joined
Oct 29, 2019
Messages
9
'rm -rf ./directory' took 2+ hours to complete. I'd be fine if the space was freeing up in the background though. I'm just wondering too if it's related to why timemachine is taking forever to back up. If it takes 2+ hours to delete, I can only imagine how long it might take to compare and back up the new files.

I also did a zpool upgrade and the scrub just finished without any errors detected.

It's completed the deletion now so I'll re-add and kick off a time machine backup and see how it goes. I'm hoping TrueNAS 12.0 will help.
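
For scale, the figures quoted in this thread (390k files, 1.1TB, 2+ hours) can be turned into per-second rates. This is a back-of-the-envelope sketch only: it assumes exactly 2 hours, so the real rates were somewhat lower, and it uses decimal units and integer division.

```shell
#!/bin/sh
# Figures taken from the thread; 2 hours is a lower bound ("2+ hours").
files=390000
size_mb=1100000          # 1.1TB expressed in decimal MB
seconds=$((2 * 3600))

files_per_sec=$((files / seconds))        # integer division, rounds down
mb_per_sec=$((size_mb / seconds))
ms_per_file=$((seconds * 1000 / files))

echo "files/s: $files_per_sec"            # 54
echo "MB/s freed: $mb_per_sec"            # 152
echo "ms per file: $ms_per_file"          # 18
```

So the upper bound is roughly 54 files per second, or about 18 ms spent on each file.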
 

ornias

Wizard
Joined
Mar 6, 2020
Messages
1,458
Ohh okay, I think both @joeschmuck and I assumed you meant files disappearing. Taking 2+ hours for rm -rf to finish is NOT normal.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
cmorgan said: "I'm just wondering too if it's related to why timemachine is taking forever to back up."
You still have not provided your complete hardware setup and how your drives are configured, which makes a big difference when running this software, nor how your system is physically connected to the computer you are backing up. For example, are you using WiFi? As surprising as it is, many people do and then complain about throughput issues. You also have a very small amount of RAM; while FreeNAS will run, it's not doing so efficiently. During write operations the RAM caches the data to speed up writes, so even a substandard system will benefit from more RAM.

ornias said: "Ohh okay, I think both @joeschmuck and I assumed you meant files disappearing. Taking 2+ hours for rm -rf to finish is NOT normal."
What I understood was that the OP would execute the command rm -rf ./directory at a terminal window/shell and that the command was taking hours to complete (get back to a prompt), not that the command prompt returned and the space was not freeing up. I understood it was taking a very long time for the files to delete, but I was inferring some of that myself. What we cannot see is whether the hard drives are very busy doing other things; for example, a scrub had started, and it was good that the OP saw that and paused it. A scrub will certainly hurt the speed of file removal. And maybe something else was going on that we do not know about. It's easy for seasoned users to look at a system and know what to look for, but new users are more apt to not know what they are looking at or looking for to help themselves out.

@cmorgan The issue you had is behind you for now. It sounds like you still have a concern with timemachine creating backups but I have a few small questions you should ask yourself:
1) Has timemachine ever given you great performance?
2) If it did, when and can you recreate it?
3) If not then I suspect your hardware/configuration is the cause of a slow NAS.

I do not use timemachine so I have no personal experience with it and hopefully you will be able to do an internet search for something like "freenas slow timemachine" and get some good return results to read.

If you decide to continue troubleshooting this, please give great detail on your system hardware and its configuration and how it connects to your other computers. Try to put yourself in our shoes: we cannot access your computer to see what is going on, so you are the eyes and ears. Provide as much detail as you can. The better we communicate, the faster we can figure out a good and proper solution.

I sincerely wish you the best of luck.

-Joe
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
Deletions can be surprisingly slow, because ZFS doesn't just have to update a single table to reflect blocks not in use. It also has to update its checksum and related tree, and metadata. In other words, there are a lot of small writes involved. It handles them efficiently, batching them, but small writes are always basically inefficient and slow on HDD. So a long delay to delete isn't unexpected.

The solution is almost my go-to recommendation these days. When 12 comes out, switch to it. Rebuild your pool, including a special vdev for metadata. Make the special vdev a reasonably decent SSD (you aren't using dedup, so that's easy). Samsung EVO or PRO will be fine; I wouldn't use any except Samsung or Intel anyhow. Then replicate your pool to the new pool (or do it manually: zfs send -R ... | zfs recv ...).

The replication is needed because existing metadata isn't moved to the special vdevs automatically; you need to rewrite the data, and it's easiest to replicate the pool if you are able. If you can't, I'm not sure how to move it. Maybe it'll just move over time anyway. But with metadata on a special vdev (SSD vdev), and otherwise appropriate hardware and config, file deletes and other file manipulation that is slow due to HDD random 4k metadata reads and writes will fly.
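
As a hedged illustration of the manual route, the sketch below only prints the commands for a recursive snapshot and send/receive; it does not run them. The helper name print_replication_plan, the destination pool new_pool and the snapshot name migrate are all hypothetical. Only main_pool comes from this thread. Note that zfs recv -F rolls back and overwrites the destination, so check the target before running anything like this for real.

```shell
#!/bin/sh
# print_replication_plan SRC DST SNAP
# Dry run: echo the commands for a recursive replication instead of executing them.
# SRC/DST are pool names, SNAP is a snapshot name (all illustrative).
print_replication_plan() {
    src=$1; dst=$2; snap=$3
    # Recursive snapshot of the source pool and all child datasets
    echo "zfs snapshot -r ${src}@${snap}"
    # -R sends the whole dataset tree; -F forces the receive onto the destination
    echo "zfs send -R ${src}@${snap} | zfs recv -F ${dst}"
}

print_replication_plan main_pool new_pool migrate
```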
 

JaimieV

Guru
Joined
Oct 12, 2012
Messages
742
Time Machine sparsebundles have the content of the image split into bands (you can see those inside the tm.sparsebundle/bands directory). These are different sizes depending on which macOS version created them. Very old ones can be 8MB, so deleting a 1TB sparsebundle means 125,000 files minimum, and likely more, since partial 8MB bands will exist. It still shouldn't take two hours, but it's not quite as insane as it sounds at first glance.

Latest .sparsebundles are built from 256meg bands and a lot easier to handle.
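
The band arithmetic can be checked with a couple of lines of shell. This is a sketch using decimal units (1TB = 1,000,000MB); the helper name bands_for is made up, and real bundles will hold more files than this minimum when bands are only partly filled.

```shell
#!/bin/sh
# bands_for SIZE_MB BAND_MB
# Minimum number of band files needed to hold SIZE_MB, rounded up.
bands_for() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

bands_for 1000000 8      # 1TB in 8MB bands   -> 125000
bands_for 1000000 256    # 1TB in 256MB bands -> 3907
```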
 

cmorgan

Cadet
Joined
Oct 29, 2019
Messages
9
joeschmuck said:
1) Has timemachine ever given you great performance?
2) If it did, when and can you recreate it?
3) If not then I suspect your hardware/configuration is the cause of a slow NAS.

1. I thought it did. I tried another timemachine backup after deleting the folder and performance hasn't improved.
2. I wish... Maybe timemachine is just super slow?
3. It could be; it's a several-year-old i3 with 8GB of RAM. I ordered 16GB of RAM to see if it helps.

I'm not sure I'll continue debugging, but if I do I'll post here. Thank you and the others so much for trying to help, I appreciate it.
 

JaimieV

Guru
Joined
Oct 12, 2012
Messages
742
Time Machine is not very quick doing initial backups, that's for sure. If you go into the new .sparsebundle you can see how many files it's created and what size. A single directory with a quarter million files in it is never going to be quick, even entirely cached.
 