Suddenly very slow write speeds

Shoop83

Dabbler
Joined
May 2, 2018
Messages
35
Hi Folks

Haven't been able to find another thread discussing this exact problem.

Came back from vacation and moved some files from my Windows PC to my TrueNAS storage, and the best it would give me was 355 kb/s.

This machine has been running for almost 5 years with transfer speeds I'm OK with. I updated to TrueNAS-13.0 a few months ago with no issues.

Some experimenting and I've found that there seem to be no issues copying or moving files from TrueNAS to my Windows PC. There also seem to be no issues moving files around inside TrueNAS. But if I try to copy a file from one place on TrueNAS to another, it nearly grinds to a halt.

Initially my storage utilization was at 85%, so I moved enough data off the TrueNAS to get it down to 74%. Rebooted it and waited overnight. That did not seem to help anything.

RAM and CPU do not seem to be taxed during the file copy. Disk utilization doesn't seem maxed out either.

I'm trying to figure out the right commands to run a dd test and how to run an iperf test, but I'm a slow learner and really don't want to mess anything up. I call this entire thing my house of cards: if I break it, it will take me a long time to figure out how to fix it.
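For what it's worth, the sort of commands I've seen suggested look roughly like this (I haven't run them yet; the test path under /mnt/tank and the IP address are just placeholders, and I understand /dev/zero results can be skewed by compression):

Code:
# Rough pool write-speed test (writes a ~10 GB throwaway file)
dd if=/dev/zero of=/mnt/tank/ddtest.bin bs=1M count=10000

# Rough read test of the same file, then clean up
dd if=/mnt/tank/ddtest.bin of=/dev/null bs=1M
rm /mnt/tank/ddtest.bin

# Network-only test: server side on the NAS...
iperf3 -s

# ...client side on the Windows PC, pointed at the NAS (placeholder address)
iperf3 -c 192.168.1.100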

That's a long way of asking: does anyone have a theory as to why my server would suddenly have horrendous write speeds? Are there any "do this first" tests I should run that might illuminate the problem?

Thank you all.

root@freenas[~]# zpool status
  pool: freenas-boot
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:00:14 with 0 errors on Tue Mar 21 03:45:14 2023
config:

        NAME            STATE     READ WRITE CKSUM
        freenas-boot    ONLINE       0     0     0
          ada4p2        ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 12:01:37 with 0 errors on Wed Mar 22 12:01:37 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/4f64359d-1206-11e9-a206-ac1f6b60c826  ONLINE       0     0     0
            gptid/50f5c733-1206-11e9-a206-ac1f6b60c826  ONLINE       0     0     0
            gptid/52bd6456-1206-11e9-a206-ac1f6b60c826  ONLINE       0     0     0
            gptid/544a2e08-1206-11e9-a206-ac1f6b60c826  ONLINE       0     0     0

errors: No known data errors
root@freenas[~]#
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399

Your pool is too fragmented.
 

Shoop83

Dabbler
Joined
May 2, 2018
Messages
35

Your pool is too fragmented.
Thanks for responding!

root@freenas[~]# zpool get fragmentation
NAME          PROPERTY       VALUE  SOURCE
freenas-boot  fragmentation  -      -
tank          fragmentation  16%    -

It looks like the primary remedies for an overly fragmented pool are:
- add more drives in a second vdev to expand the pool and increase total storage, thus increasing free space for writing
- move everything out of the pool and back into the pool, thus reducing overall fragmentation
- something else?

Thank you.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
tank fragmentation 16% -

I don't believe 16% fragmentation should grind the pool to virtually a standstill.

What could behave similarly is when one drive is significantly slower than the others.
I.e., it might be on its way out...
What's the SMART status on the drives? smartctl -a /dev/adaX

The solnet-array-test script is something posted here on the forums that helps speed-test drives on a live pool.
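The SMART check, for example, could look something like this (assuming the four pool members show up as ada0 through ada3 on your box; camcontrol devlist will confirm the names):

Code:
# SMART health, attributes, and self-test log for each pool member
for d in ada0 ada1 ada2 ada3; do
    echo "=== /dev/$d ==="
    smartctl -a /dev/$d
done

# Optionally start a long self-test on a suspect drive (runs in the background)
smartctl -t long /dev/ada0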
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
I don't believe 16% fragmentation should grind the pool to virtually a standstill.
Well, OP also said his pool was at 85% full, but he got it to 74% full. His pool's free space is too fragmented.
It looks like the primary remedies for an overly fragmented pool are:
- add more drives in a second vdev to expand the pool and increase total storage, thus increasing free space for writing
- move everything out of the pool and back into the pool, thus reducing overall fragmentation
There's also backing up the pool, destroying it, recreating it, and then restoring the contents from backup.
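Roughly speaking, that route looks something like the snippet below; "backup" is a placeholder for wherever your copy lives, and the -R/-F flags deserve a careful read of the man page before relying on this sketch:

Code:
# Snapshot everything recursively and replicate it off the pool
zfs snapshot -r tank@migrate
zfs send -R tank@migrate | zfs receive -F backup/tank

# ...destroy and recreate tank with the new layout, then restore...
zfs send -R backup/tank@migrate | zfs receive -F tank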
 


Shoop83

Dabbler
Joined
May 2, 2018
Messages
35
I don't believe 16% fragmentation should grind the pool to virtually a standstill.

What could behave similarly is when one drive is significantly slower than the others.
I.e., it might be on its way out...
What's the SMART status on the drives? smartctl -a /dev/adaX

The solnet-array-test script is something posted here on the forums that helps speed-test drives on a live pool.
SMART status on the drives:

Thank you for looking at this.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
SMART status on the drives:

Thank you for looking at this.
Nothing screams broken drives from that.
Old, but not necessarily broken.
67k power-on hours!

Also, it's peculiar how the age timer somehow got reset here?
Code:
SMART Self-test log structure revision number 1
Num  Test_Description     Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline        Completed without error       00%             1666  -
# 2  Extended offline     Completed without error       00%             1512  -
# 3  Short offline        Completed without error       00%             1331  -
# 4  Short offline        Completed without error       00%              995  -
# 5  Extended offline     Completed without error       00%              841  -
# 6  Short offline        Completed without error       00%              659  -
# 7  Short offline        Completed without error       00%              250  -
# 8  Extended offline     Completed without error       00%               98  -
# 9  Short offline        Completed without error       00%            65450  -
#10  Short offline        Completed without error       00%            65043  -
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I don't believe 16% fragmentation should grind the pool to virtually a standstill.

The number doesn't represent fragmentation, it represents the difficulty the system is having finding free space -- it's almost a worthless number. And yes, high levels of fragmentation especially on a pool that has previously been nearly filled may cause huge amounts of I/O that bring the pool to a crawl.
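(Both numbers are easy to pull up side by side, e.g. something along these lines:)

Code:
# Shows fill level and the free-space "fragmentation" metric for the pool
zpool list -o name,size,capacity,fragmentation tank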
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
This might be useful in reducing your fragmentation if you don't want to restore from backup.
arc_summary output could be interesting.
 

Shoop83

Dabbler
Joined
May 2, 2018
Messages
35
This might be useful in reducing your fragmentation if you don't want to restore from backup.
arc_summary output could be interesting.
arc_summary output here:

Thanks for looking at this!

I'll read through that other thread when I've got some time.
 

Shoop83

Dabbler
Joined
May 2, 2018
Messages
35
Nothing screams broken drives from that.
Old, but not necessarily broken.
67k power-on hours!

Also, it's peculiar how the age timer somehow got reset here?
Yeah, old. Been basically constantly online since 2018.

Couldn't even begin to tell you why the age timer might have reset. Zero clue.
 

Shoop83

Dabbler
Joined
May 2, 2018
Messages
35
The number doesn't represent fragmentation, it represents the difficulty the system is having finding free space -- it's almost a worthless number. And yes, high levels of fragmentation especially on a pool that has previously been nearly filled may cause huge amounts of I/O that bring the pool to a crawl.
It's looking like excessive fragmentation is the likely culprit.

I have a SAS expansion card, and 4 new hard drives (identical to the existing ones) and a plan to expand my pool with a second vdev.

When I find the time to do that, it seems like it might be best to just destroy the pool, create a new double-size pool, and then restore from backup. Is there a significantly better way to do this?
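In case the expansion route wins out instead, the rough shape of adding a second raidz1 vdev from the CLI looks like the following (the daX names are placeholders for whatever the new drives enumerate as; I'd probably still use the GUI's pool expansion flow, which handles partitioning and gptid labels for you):

Code:
# Dry run first: -n shows the resulting layout without changing anything
zpool add -n tank raidz1 da4 da5 da6 da7

# If the proposed layout looks right, do it for real
zpool add tank raidz1 da4 da5 da6 da7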

Thanks for looking at this.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Yeah, old. Been basically constantly online since 2018.

Couldn't even begin to tell you why the age timer might have reset. Zero clue.

Could be simple overflow. Looks like the max value is 65535 (the self-test log stores the lifetime hours in a 16-bit field), so 65536 + 1666 ≈ 67,200 hours, which lines up with the ~67k power-on hours.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
arc_summary output here:

Thanks for looking at this!

I'll read through that other thread when I've got some time.
Your ARC values look normal at first glance.
 
Joined
Jun 15, 2022
Messages
674
I have a SAS expansion card, and 4 new hard drives (identical to the existing ones) and a plan to expand my pool with a second vdev.

When I find the time to do that, seems like it might be best to just destroy the pool and create a new double size pool and then restore from backup. Is there a significantly better way to do this?
See @Davvo's post on:
Simple bash script to rebalance pool data between all mirrors when adding vdevs to a pool:
This might be useful in reducing your fragmentation if you don't want to restore from backup.
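The general idea behind that kind of script is the copy-and-replace approach below (only a rough sketch, not the actual script from that thread; the dataset path is a placeholder, and you'd want snapshots/backups squared away first):

Code:
#!/bin/sh
# Rewriting each file forces ZFS to allocate fresh blocks, which the
# allocator spreads across all vdevs, including newly added ones.
# Sketch only: doesn't handle hardlinks, snapshots, or exotic filenames.
DATASET=/mnt/tank/mydata   # placeholder: point at a real dataset path

find "$DATASET" -type f | while read -r f; do
    tmp="$f.rebalance.tmp"
    # Copy preserving timestamps/permissions, then replace the original.
    cp -p "$f" "$tmp" && mv "$tmp" "$f"
done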
 

Shoop83

Dabbler
Joined
May 2, 2018
Messages
35
See @Davvo's post on:
Simple bash script to rebalance pool data between all mirrors when adding vdevs to a pool:
Very cool. Hadn't read that post yet. That looks a lot nicer than taxing my backup system.

Thank you for looking at this.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
The number doesn't represent fragmentation, it represents the difficulty the system is having finding free space -- it's almost a worthless number. And yes, high levels of fragmentation especially on a pool that has previously been nearly filled may cause huge amounts of I/O that bring the pool to a crawl.
In retrospect I can see how it is far more important to know the 'history of utilization' rather than the current fragmentation number.
I've got some hands-on experience in that department too.

I probably should've paid better attention to the OP; had I read those parts more carefully, other clues would've emerged.

Interesting note on the fragmentation number. I can see what it does, and I can also see how it would be less ambiguous if it were labeled differently.
 
Joined
Jun 15, 2022
Messages
674
The number doesn't represent fragmentation, it represents the difficulty the system is having finding free space -- it's almost a worthless number. And yes, high levels of fragmentation especially on a pool that has previously been nearly filled may cause huge amounts of I/O that bring the pool to a crawl.
What is a good definitive indicator of fragmentation being an issue?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
What is a good definitive indicator of fragmentation being an issue?
I would think slow write speeds, and possibly slower reads too (though those are harder to notice thanks to the ARC) if we're talking about extreme situations.
Anyway, IMHO keeping that number low on spinners can't be a bad thing.

I wouldn't guarantee fragmentation being the issue here, though.
If defrag doesn't solve the issue, please post the output of jgreco's solnet array test.
 