FreeNAS extremely slow

Status
Not open for further replies.

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
TL;DR: FreeNAS is only transferring at a few kbps, making replacing disks almost impossible.

Hi All,

I got FreeNAS running on a dedicated PC (4-core Avoton, 32 GB RAM). After struggling with a bunch of disks that prevented the OS from booting (no idea why, I just removed them), I got it to work and was extremely happy with it.

Then disaster struck.

I have an 11-disk volume (ZFS, 2 parity disks) that I built out of whatever disks I had lying around, with the intent of replacing them progressively until all of them were the same size, brand and model.

I was replacing two of them when 3 disks simply disappeared.
Long story short, I recovered from that, but resilvering has taken about 2 months to reach 20% (on a volume of only 4 TB).

I have decided to move the data off that volume onto a consumer-grade NAS I had lying around, but I am getting speeds of only a few kbps (when the transfer doesn't simply fail).

I have another 4-disk volume, and its performance also seems impacted even though all of its disks seem fine.

The only thing I see is a large number of SMART errors on the console (I expected the disks to mark the bad sectors and move on, but that doesn't seem to be the case).
The SMART errors are on the disks I was trying to replace in the first place.

The CPU is hardly used, hovering around 4% usage on a single core.
The RAM is used to the maximum for the ARC cache, and that seems to be fine.

This is not a "production" NAS... it is a home NAS, and I chose FreeNAS as it seemed fairly simple to use (and I wasn't wrong in that assessment until now).

One thing I have noticed, which might be nothing or might be relevant:
I create volumes from full disks or collections of disks, with a single dataset per volume.
The volume size and the dataset size don't match. The volume is always several GB larger than the space available to the dataset, and I couldn't find out why (it wastes a huge amount of space).

Any ideas on what I can do?
I am getting fairly desperate, as I have tried everything I could think of and spent the last 2 months googling to find anything that might help.

Thanks in advance for your time and effort.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Which version of FreeNAS?
How are your drives attached (what HBA)?
Did you burn in the drives prior to putting data on them?
What is the exact output of the smart error messages?


My quick guess is that you have 1 or more failing drives (or cables or power) in your pool, and that is degrading performance.
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
Thanks for your response.

That makes sense. Is there a way to check which specific disk is slow, so I can check the cables?

I'm on FreeNAS 9.10.
The disks are attached either directly to the motherboard (a Supermicro with 12 SATA ports) or through a 16-port HBA.
I have 22 disks connected at the moment, if that is relevant.

the SMART errors are:
FAILED SMART self-check. BACK UP DATA NOW!
Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.
21 Currently unreadable (pending) sectors

That (pending) count never goes away... I'm not sure why.

I'm ashamed to admit it, but I didn't even know that burning in disks was a thing... I had to google it when you mentioned it.
How important is it to do? What's the best way?

I was replacing 2 bad disks (the ones churning out those SMART errors) when one of the disks in the volume decided to drop out. It comes back but doesn't reconnect to the volume... I forced it back in, but about 5 minutes later it dropped out again. I retried a bunch of times and the same thing kept happening, so I quit trying to get it to work.

I expected FreeNAS to get the missing data from parity when it hits a wonky disk.

I would easily accept a slow volume with a bad disk (even though I expected it to start rebuilding the data from parity and be done with it), but I expected only the data stored on that disk to be slow, not all of it.

I was going to create a RAIDZ1 volume, but I have been reading a lot about it, and cyberjock has been adamant in the past that it's dangerous, so I created a RAIDZ2 instead... and I am amazingly happy that I did. I would have lost all the data otherwise.

Another thing I have noticed is that the worst-offending disk (da3) doesn't show the option to take it offline, even though all the other ones do.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Not sure I would "offline" it anyways in your current situation. Instead might want to use the "Replace" option when you have a known good drive?
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
I'm not 100% sure, but I believe that is one of the drives being replaced.

I have an 11-disk volume (9 data, 2 parity), but FreeNAS shows 14 disks due to all the replacing going on.
To be more exact, it shows 12 disks online, 1 disk unavailable and 1 disk faulted.

It seems that FreeNAS keeps hold of both disks in a volume while doing a replace. Not a bad thing, it's just a thing.

One thing I find missing in FreeNAS is a "test disk" button. Perhaps it could keep a set amount of disk space (adjustable through a setting of some sort) for running read/write speed tests from the GUI.
I would find that extremely useful, as I would be able to diagnose which disk is bad and go fiddle with the cables, or simply yank the disk out.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
  • Do you have regularly scheduled SMART Tests (Short and Long)?
  • Do you have Scrubs Scheduled?
  • Do you have E-Mail Notifications Configured?
    • In [System] - [E-Mail] AND in [Services] - [S.M.A.R.T]
    • Do you have Temp Thresholds set in [S.M.A.R.T]?
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Appended (since the forums jacked my formatting all up)...

This is just a brief listing of the things within FreeNAS that will assist in preventing problems and notifying you regarding the health of the Drive(s) and Pool(s)...

I get the feeling that you have not really read too much on FreeNAS and could use a bit more information. Please look at the items under "Recommended Reading" in my signature. Take the time to educate yourself a bit more and you will see that it is more involved than just tossing in whatever drives you have around. Knowledge is power... ;)
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Sounds like a disk is erroring and doesn't have TLER.

*if* you still have redundancy, and if you are sure which disk is the issue, then by removing it, you might speed things up a lot.

Can you post zpool status -v?
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
I saw your signature after the first post and started reading through those.
Extremely useful.

I do not have scheduled SMART tests. I did not find an option for that.
I have scrubs scheduled.
I've disabled emails while I am trying to work this one out, but yes, I do have emails configured.
I do not have temperature notifications. Is it that important? I live in a miserable chunk of land that never has high temperatures, and right now it's 12 degrees at most.

Unfortunately none of these can tell me which disk is slow, nor how fast each one is performing.
Right now I am transferring at 600kbps, and looking at the network section of the task manager I can see that it bursts a bit of data, goes quiet for a few seconds, bursts a bit more, goes quiet again, and so on.

Unfortunately, right now I have no information at all about which disk is doing this. I can guess that it is either da1 or da3, as both are spamming SMART alerts, but I have no way to be sure.
If I could figure it out for sure, I could yank that disk out and copy all the data off the volume (it's only 4 TB, so it should be fast if it's going at full speed), and then destroy the volume and rebuild it without the bad disks (or simply wait for the resilvering to end).

It took 2 months for the resilvering to reach 25%, and then I had to shut down the NAS due to a scheduled power blackout, so I am back at 0%.
I cannot wait an entire year to get the NAS working again.

Any help figuring out what is causing the slowdown is immensely appreciated.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
The solnet array test might be able to point out which disk is slowing you down.

And post the zpool status -v
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
I got home and fell asleep on the sofa... I am horribly tired.
I'll look into the solnet array test tomorrow.

The zpool status is pasted below.

Thank you all for your time and effort so far.


Code:


[root@freenas] ~# zpool status -v
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h2m with 0 errors on Sat Sep  3 03:47:39 2016
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da16p2      ONLINE       0     0     0

errors: No known data errors

  pool: tmp1
 state: ONLINE
  scan: scrub repaired 0 in 1h46m with 0 errors on Sun Sep 11 01:48:56 2016
config:

        NAME                                          STATE     READ WRITE CKSUM
        tmp1                                          ONLINE       0     0     0
          gptid/16d80cbd-5b43-11e6-98be-d05099c0e76d  ONLINE       0     0     0

errors: No known data errors

  pool: tmp2
 state: ONLINE
  scan: scrub repaired 0 in 1h46m with 0 errors on Sun Sep 11 01:48:33 2016
config:

        NAME                                          STATE     READ WRITE CKSUM
        tmp2                                          ONLINE       0     0     0
          gptid/8d5cd284-5bfb-11e6-99f3-d05099c0e76d  ONLINE       0     0     0

errors: No known data errors

  pool: tmp3
 state: ONLINE
  scan: scrub repaired 0 in 1h51m with 0 errors on Sun Sep 18 01:52:02 2016
config:

        NAME                                          STATE     READ WRITE CKSUM
        tmp3                                          ONLINE       0     0     0
          gptid/fd0c49ca-5c82-11e6-91b3-d05099c0e76d  ONLINE       0     0     0

errors: No known data errors

  pool: tmp4
 state: ONLINE
  scan: scrub repaired 0 in 1h46m with 0 errors on Sun Sep 18 01:46:48 2016
config:

        NAME                                          STATE     READ WRITE CKSUM
        tmp4                                          ONLINE       0     0     0
          gptid/6b445980-5cf7-11e6-91b3-d05099c0e76d  ONLINE       0     0     0

errors: No known data errors

  pool: tmp5
 state: ONLINE
  scan: scrub repaired 0 in 1h26m with 0 errors on Sun Sep 18 01:26:55 2016
config:

        NAME                                          STATE     READ WRITE CKSUM
        tmp5                                          ONLINE       0     0     0
          gptid/39369f8d-5d8a-11e6-851e-d05099c0e76d  ONLINE       0     0     0

errors: No known data errors

  pool: volume1
 state: ONLINE
  scan: scrub repaired 0 in 4h53m with 0 errors on Sun Sep 11 04:56:41 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        volume1                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/02689a71-52b5-11e6-a9e6-d05099c0e76d  ONLINE       0     0     0
            gptid/ee4f2c9f-29b2-11e6-bd5e-d05099c0c41d  ONLINE       0     0     0
            gptid/094255ec-1d39-11e6-b54d-d05099c0c41d  ONLINE       0     0     0
            gptid/0a0a775f-1d39-11e6-b54d-d05099c0c41d  ONLINE       0     0     0

errors: No known data errors

  pool: volume2
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Oct  5 11:10:23 2016
        223G scanned out of 3.65T at 2.14M/s, 467h49m to go
        58.0G resilvered, 5.97% done
config:

        NAME                                              STATE     READ WRITE CKSUM
        volume2                                           DEGRADED     0     0     0
          raidz2-0                                        DEGRADED     0     0     0
            gptid/4fbee96c-5769-11e6-b9dd-d05099c0e76d    ONLINE       0     0     0
            gptid/56cad053-5b2c-11e6-98be-d05099c0e76d    ONLINE       0     0     0
            gptid/807186d6-5c82-11e6-91b3-d05099c0e76d    ONLINE       0     0     0
            replacing-3                                   ONLINE       0     0     0
              ada4p1                                      ONLINE       0     0     0  (resilvering)
              gptid/391ec302-739a-11e6-a3d7-d05099c0e76d  ONLINE       0     0     0  (resilvering)
            da3p1                                         FAULTED      0     0     0  too many errors
            gptid/eabc2b0b-5c2e-11e6-99f3-d05099c0e76d    ONLINE       0     0     0
            replacing-6                                   ONLINE       0     0     0
              gptid/685627f8-5769-11e6-b9dd-d05099c0e76d  ONLINE       0     0     0
              gptid/19b747eb-71de-11e6-a3d7-d05099c0e76d  ONLINE       0     0     0  (resilvering)
            replacing-7                                   ONLINE       0     0     0
              gptid/6e4550eb-5769-11e6-b9dd-d05099c0e76d  ONLINE       0     0     0
              gptid/757de200-71e7-11e6-a3d7-d05099c0e76d  ONLINE       0     0     0  (resilvering)
            2602335440911976919                           UNAVAIL      0     0     0  was /dev/gptid/8399de19-5a6e-11e6-98be-d05099c0e76d
            gptid/36502ab3-71d8-11e6-a3d7-d05099c0e76d    ONLINE       0     0     0  (resilvering)
            gptid/7db7917b-5769-11e6-b9dd-d05099c0e76d    ONLINE       0     0     0  (resilvering)

errors: No known data errors
[root@freenas] ~#
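
With this many rows, the unhealthy devices are easy to miss. Below is a minimal Python sketch (not part of FreeNAS; the function name `unhealthy_devices` and the set of state names it recognises are assumptions made for illustration) that scans saved `zpool status` text and lists every row whose state is not ONLINE. The sample is a shortened excerpt of the volume2 output above.

```python
# Sketch only: parses text copied from `zpool status`, it does not run zpool.
# The state names below are an assumed, possibly incomplete, list.
DEVICE_STATES = {"ONLINE", "DEGRADED", "FAULTED", "UNAVAIL", "OFFLINE", "REMOVED"}

SAMPLE = """\
  pool: volume2
 state: DEGRADED
config:

        NAME                                            STATE     READ WRITE CKSUM
        volume2                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/4fbee96c-5769-11e6-b9dd-d05099c0e76d  ONLINE       0     0     0
            da3p1                                       FAULTED      0     0     0  too many errors
            2602335440911976919                         UNAVAIL      0     0     0  was /dev/gptid/8399de19-5a6e-11e6-98be-d05099c0e76d

errors: No known data errors
"""


def unhealthy_devices(status_text):
    """Return (pool, name, state, note) for each config row that is not ONLINE."""
    results = []
    pool = None
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            pool = stripped.split(":", 1)[1].strip()
            in_config = False
            continue
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            in_config = False
            continue
        if not in_config or not stripped:
            continue
        fields = stripped.split()
        # Rows look like: NAME STATE READ WRITE CKSUM [free-text note]
        if len(fields) >= 5 and fields[1] in DEVICE_STATES and fields[1] != "ONLINE":
            results.append((pool, fields[0], fields[1], " ".join(fields[5:])))
    return results


for pool, name, state, note in unhealthy_devices(SAMPLE):
    print(f"{pool}: {name} is {state} {note}".rstrip())
```

On the excerpt this reports the pool and the raidz2-0 vdev as DEGRADED, da3p1 as FAULTED and the numeric placeholder as UNAVAIL. It only filters text; it says nothing about which disk is slow.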

 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I see a missing disk, and a faulted disk.

I think you have no redundancy left.

Might as well replace the missing disk.

The good news is it might speed up.
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
The missing disk is actually in there; it dropped off the volume, but FreeNAS recognises it fine.
I have no idea what's going on there.

If I add it back, it just drops off a bit later.
I noticed that when it drops off, all of its details disappear (description, power management, etc.).

I can replace the missing disk tomorrow. Why would this have the potential to speed things up?

Is the faulted disk "off", or does it keep on working?
Basically, if it's faulted, have I lost all redundancy? Or do I have redundancy except on the faulted sectors?
If I have lost all redundancy, shouldn't I just remove that disk and be done with it?
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
I might be barking up the wrong tree...

I was assuming the issue was one of these disks, as that volume is degraded and was the only one I was attempting to use.
I was going to empty out another volume to free up a disk to replace the faulted one, and the transfer speed there is equally bad.
So... the ENTIRE NAS is painfully slow.

I don't know what to think about this... Tomorrow I'll check whether there are any network issues between the NAS and the server (there is only one switch between them).
Network issues wouldn't explain why the resilvering is so painfully slow, but I might as well make sure that is not the issue.

I am completely at a loss here.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Check the memory graphs etc. maybe it's swapping.

I'm not sure if the faulted disk is still contributing to redundancy or not. Safest to assume it isn't ;)

But the missing disk definitely is not.

The recovery might speed up if there is a section of disk which is causing issues, but assuming it takes a week or so to finish you might as well replace the missing disk, since it's missing already.
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
Thanks,

I have checked the graphs constantly, and there are no memory or CPU issues. The CPU hardly ever hits 10% on a single core (there are 4 of them), and the RAM is fine, being used almost completely for the ARC cache.

I did do one small thing that could have impacted performance, but I didn't think it would be a big deal... I disabled swap. I have 32 GB of RAM; it should be more than enough.
If swapping were the issue, I would expect decent speeds after a reboot that then degrade over time, but it is always rubbish.

Unfortunately, nothing here takes a week... we are talking about at least 1 YEAR to resilver... that's too much, especially for a 4 TB volume.
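
As a rough cross-check on resilver estimates, the figures in the `zpool status` output earlier in the thread (223G scanned out of 3.65T at 2.14M/s) can be turned into a time estimate. This is a minimal sketch that assumes binary units and a constant rate; the function name `resilver_hours_left` is made up for illustration.

```python
def resilver_hours_left(total_tib, scanned_gib, rate_mib_s):
    """Hours remaining if the scan keeps running at rate_mib_s (assumed constant)."""
    remaining_mib = (total_tib * 1024 - scanned_gib) * 1024
    return remaining_mib / rate_mib_s / 3600


# Values copied from the volume2 "scan:" lines posted above.
hours = resilver_hours_left(total_tib=3.65, scanned_gib=223, rate_mib_s=2.14)
print(f"{hours:.0f} hours, about {hours / 24:.1f} days")
```

This gives roughly 467 hours, in line with the "467h49m to go" that zpool printed (the small gap comes from the rounded inputs). The estimate holds only if the rate stays constant and the resilver is not restarted; the 2 months for 20 to 25% reported earlier in the thread suggests the rate has not been steady on this pool.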
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
Quick update: it's not a swapping issue. :(

I can't think of anything that changed between when it was transferring at 60MBps and now, when it transfers at 500kbps, other than the attempt to swap the disks.
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
OK... I just went through all the hardware, and the only thing I see that might be a problem is the PSU.
It is a 550W Corsair.

The motherboard and CPU are low-power, as is the RAM, but I have over 20 disks connected.
All disks have a very aggressive power management profile to spin them down, so power shouldn't be a problem, but it's the only thing I can think of.
Is there a way to check for this (other than pulling out a bunch of disks and checking)?
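
One paper check is a rough power budget. The sketch below is only that: the per-disk spin-up current, the idle wattage and the base system draw are assumed placeholder values, not measurements from this machine, and should be replaced with figures from the drive datasheets and the PSU's 12 V rail rating. The function name is hypothetical.

```python
def power_budget(n_disks, spinup_amps_12v, idle_watts_per_disk, base_system_watts):
    """Return (peak_watts, steady_watts) if every disk spins up at the same time."""
    peak = base_system_watts + n_disks * spinup_amps_12v * 12.0
    steady = base_system_watts + n_disks * idle_watts_per_disk
    return peak, steady


# ASSUMED figures for illustration only: 2 A at 12 V per disk during spin-up,
# 6 W per spinning idle disk, 60 W for motherboard, CPU, RAM and HBA.
peak, steady = power_budget(n_disks=22, spinup_amps_12v=2.0,
                            idle_watts_per_disk=6.0, base_system_watts=60.0)
print(f"peak ~{peak:.0f} W, steady ~{steady:.0f} W, PSU rated 550 W")
```

With these assumed numbers the steady load (about 192 W) is comfortable, but a simultaneous spin-up of all 22 disks (about 588 W) would exceed a 550 W rating. Aggressive spin-down makes spin-ups more frequent, so the peak case may matter more than the idle one.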
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
Fairly full... about 75%, but I still wouldn't expect that to grind the entire box to a halt.
 