Hi! ...help! [Slow IO || Idle process threads]

AenAllAin · Oct 19, 2018

Hi e1, I have been using FreeNAS a while and finally joined the forums so I could complain. I find this free software does not live up to expectations, and as a millennial I expect, Nay!, demand better free stuff ...I am entitled to free stuff! I deserve everything for free!

Ok, ok, just kidding. Thank you for your hard work, and for making your product freely available; it has been a big help even though it isn't perfect, and I am grateful to all who have contributed. I have an odd problem with processes idling that I cannot seem to find an answer for. Actually, I have not even been able to identify what exactly the cause of the problem is, just a set of symptoms; which include: slow IO, crashing server, disconnects, data corruption, and some other misc. stuff.

First, my specs:
OS: FreeNAS-11.1-U5
Server: HP ProLiant DL380 G5
CPU: Intel(R) Xeon(R) CPU X5450 @ 3.00GHz
RAM: 32728MB
Peripherals:
(1) NetApp 111-00341+B0 X2065A-R6 4-Port Copper 3/6GB PCI-E Host Bus Adapter
(2) LSI Logic MegaRAID SAS 9285CV-8e Storage Controller LSI00298 (not in use)
DAS: DS2246 NETAPP 24-BAY STORAGE ARRAY
Drives:
1TB HGST Travelstar (0S03563)[7200,32MB,6Gb/s,2.5"] x 5
1TB HGST Travelstar (0J22423)[7200,32MB,6Gb/s,2.5"] x 5
2TB WD Blue (WD20SPZX)[5400,6Gb/s 128MB,2.5"] x 5

1) Turns out part of the cause was the lagg interface. The server has two 1Gbps ports, so I thought it would be nice to utilize both. I was able to successfully combine both 1Gbps links into a single 10Mbps aggregate. After dismantling that some of the slowness cleared up.

2) Some of the instability cleared up with the 11.1 upgrade, and another part when I replaced the HBA card that was struck by lightning.

3) The data corruption was probably caused by the instability which turned my SSD L2ARC into an unidentified root level device on the zpool; it wouldn't have been as critical an issue if I had not foolishly overestimated the ZFS filesystem and added all of the drives to the same zpool (in separate raided vdevs). I was able to prevent any significant data loss by immediately blocking writes to it, rebooting (which caused it to roll back to the last stable point), and migrating the data.

4) There is not really anything I can do to fix the fragile nature of the ZFS filesystem caused by its projecting raid level limitations up to the filesystem layer, and compounding it with the inability to remove from the zpool. However, I can at least mitigate it by keeping the VDEV-to-zpool at a 1-to-1 ratio (never adding more then one raid set to a zpool).

5) The IO slowness really became obvious when I connected an external USB drive to the server in order to backup the data. I cannot bring the corrupting zpool online as shared which is why I decided to try using an external USB drive. I successfully mounted the USB drive as ntfs-3g (eventually), but when I tried to transfer the data it performed astonishingly slow. After the 6th day of the attempted 700GB transfer, I decided to abort it and try it from a different angle.

6) I reformatted the USB drive as ext4, then ext3, and finally ext2 (because ext4 and ext3 would not mount), mounted it and attempted the transfer again. I had previously thought my NetApp HBA+DAS was responsible for the IO slowness, but after monitoring the ntfs-3g and then the ext2 transfer using "iostat -x", I noticed that the DAS and HBA were working fine. The transfer in both cases was occurring halting bursts: (i) huge parallel reads from zpool drives, (ii) sudden huge write to the USB HDD, (iii) followed by the USB HDD write tapering off to 0Bps, (iv) then a prolonged period of nothing other than sporadic writes to the System Dataset; these periods of IO-lessness lengthened as the data transfer neared completion. Even so the 700GB transfer took about one day for the ext2.

7) After getting the backed-up data transferred to a new un-corrupted zpool, and bringing that online, I began a data verification and another file-sync over the network. This is when the last form of the IO issue presented clearly. Every atomic data transfer request that took less than 5-10 minutes always completes without any issue. Every transfer that took some extended period of time ran into issues. Those extended data transfers that I had done before dealing with the stability issues always froze and eventually failed. Those extended data transfers that I now perform after fixing the stability issue do not fail, but they all go into a server initiated sleep/hibernate; they only recover when I make a new data request to the server.

8) I thought I had found the issue when I checked the drive idle stats, and found that two of my 1TB drives were boned (way over the spin-up estimate). But after monitoring, I realized it was an old issue and none of the drives seemed to currently be effected by it (although most of them did show a recently high level of spin-up). None of them were set up for standby idle, or power-saving, but just in case I went through and explicitly turned off standby and power-save for all of them again from the command shell.

9) The slowness behavior is still happening, but I think it has nothing to do with the IO. I think something in the OS is throttling or sleeping the thread(s) that perform the transfer whenever they have to wait for IO. I think this is what then allows the drives to reach a point where they would spin down.

At this point, I'm not sure what to try next. Any help would be appreciated, thanks.

Apollo · Oct 19, 2018

Forget about transferring over USB and especially over non ZFS system.
I would recommend to backup using replication instead. Preferably over the network to another server. This will be less prone to user errors.
Otherwise you can add another volume to you local system and perform replication over SATA.
IO can be drastickly slow when replicating 100k's thousand snapshots and it is expected due to snapshot overhead.
I recommand you do replication via CLI and use the -vv on send and receive in order to monitor progress.
Yi can also use Netdata to look at the real time progress and metrics.
The benefit of this process is that ZFS is going to validate each block being accessed are without any corruptions.
When replication is complete and snapshots present of the nealy replicated data, this means your data has been replicated reliably. No need for non-sense verification after the fact. Everything is taken into account by ZFS.

With Freenas 11, it seems CPU and memory requirement are giving older hardware a hard time.
My Xeon E3 couldn't cope with 1Gb/s throughput over ssh. It was maxed out to 600Mb/s.
So the system you used to run older version of Freenas where handling things better than they are now and may not be an indication of hardware failure.
Iocage is also having lots of issues on the network.

In a nutshell, if you can run scrub on the volume and everything is fine, then you should have much to worry about.

AenAllAin · Oct 20, 2018

Apollo said:
Forget about transferring over USB and especially over non ZFS system.

Yeah, that is what I ended up doing. After I copied the portion of data that did not have a backup, stopped using the USB. Instead, I added some more drives and put them in a new zpool which I then replicated to. This is when something else weird happened. After creating the new zpool, I exported it so the GUI could import it, but the GUI kept failing to import and sometimes could not see the new zpool. When I checked from the shell, the individual drives in the zpools vdev were taking turns spinning down and becoming unavailable. The zpool was not corrupted, but the only way to import it was to repeatedly issue the import command until it hit a window when they all happened to be active at the same time. When I finally tried rebooting, the zpool/drives seemed to stabilize.

I only did the manual verify to be sure of no corruption; it was all good. The ZFS kept all of the existing data pristinely un-corrupted even though the configuration state went bad.

Apollo said:
With Freenas 11, it seems CPU and memory requirement are giving older hardware a hard time.
My Xeon E3 couldn't cope with 1Gb/s throughput over ssh. It was maxed out to 600Mb/s.
So the system you used to run older version of Freenas where handling things better than they are now and may not be an indication of hardware failure.

I don't think the older version of Freenas handled it any better. I think it just made it seem like a hardware or HBA driver issue. I could probably live with a consistent throughput, but the process-sleep is making it unusable for large amounts of data. It may be resource related, because when in transit the memory utilization is maxed (ref screenshot). However, when it goes into sleep/idle the memory usage drops and most of it goes to Inactive.

Short activities seem to work fine mostly; its just any long-running process that gets put to sleep. If I do the data copy in smaller sets like copying one folder at a time it works fine. Also, if I don't make a data request for a while something in the OS goes to sleep as well, and it has to pause for a bit as it wakes up for a any initial new request. I swear, the drives are all set to "always-on". The CPU utilization rarely passes 50%; usually only when first responding to a request.

One thing I will note that might be significant is that on boot the OS reports that the "smartd" failed to start. However, it seems to be started later on after finished booting. It could still be a driver/hardware compatibility issue; maybe the set of working hardware features does not include SMART, HDD standby, and the Advanced Power Management. But, even so, I think that would just leave the drives in the default manufacturer settings which would not cause this. Also, I have seen that the DAS to server transfer can hit the full 6GBps of the SAS IO controllers; it just doesn't stay there.

Apollo · Oct 20, 2018

Doing replication will cause the data to be stored in RAM and you will see max out. At that point, the memory will remain maxed out and some of the swap could start being used. Sometime, when you perform certain ZFS operation, the RAM used will drop significantly.
When it comes to importing a volume, there is no need to send the request more than once.
The more snapshot your volume holds, the longer it will take to obtain it's content.
As soon as the request has been initiated, you can check the status of the pool using the command line, such as zpool status.
Do that a few times and observe the pool status.

Are you using by any chance an encrypted volume?
If your volume is not attaching one of the spare, then one of the import Python process used to import a new volume will fail.
It will also prevent you from seeing drives you want to import.
If you have an encrypted volume which is in a Locked state because the drives are not on the system (I do that when performing rotational backups on encrypted volume, one volume is taken out when system is off and the other one is plugged into, without the need to go through the lengthy process of attaching and detaching encrypted volume) or simply remain locked because the passphrase has not been entered after power up, then every single drive and volume not yet imported will not be visible in the list of disk or volume to create or import.
To fix this, you need to detach all the missing volumes.

In your case, if I understand, and I have some trouble to, is that upon initiating the import command, the Volume state become "DEGRADED" because some disk become "UNAVAILABLE"?
Is that the case?
When you say "Verify" are you talking about running "Scrub"?

Also, what makes you think the drives are spinning down?

AenAllAin · Oct 20, 2018

Apollo said:
The more snapshot your volume holds, the longer it will take to obtain it's content. As soon as the request has been initiated, you can check the status of the pool using the command line, such as zpool status. Do that a few times and observe the pool status.

I only have 3 snapshots on the dataset.

Apollo said:
Are you using by any chance an encrypted volume?

No encryption; I decided not to encrypt because I wanted auto-mount and if I need some data encrypted, I can just do it at the file level.

Apollo said:
In your case, if I understand, and I have some trouble to, is that upon initiating the import command, the Volume state become "DEGRADED" because some disk become "UNAVAILABLE"?
Is that the case?
When you say "Verify" are you talking about running "Scrub"?

So that stopped after a reboot, but yeah, the new zpool's vdev had 5 drives. I exported it after creating the zpool from the shell, so that the GUI could import and list it. After exporting it though, the zpool was DEGRADED because one or more drives were not found. The drive(s) missing was constantly changing. The zpool import command would show a status from non-existent, to missing various drives, to finding all drives and ready to import. I'm not kidding, I imported it again by repeatedly executing zpool import xpool over and over until I eventually hit a window where all the drives were online at the same time.

Apollo said:
Also, what makes you think the drives are spinning down?

The Start_Stop_Count; if I understand what it means correctly, several of my fairly new drives have been worn out pretty fast.

Code:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x000b   100   100   062	Pre-fail  Always	   -	   0
  2 Throughput_Performance  0x0005   100   100   040	Pre-fail  Offline	  -	   0
  3 Spin_Up_Time			0x0007   132   132   033	Pre-fail  Always	   -	   2
  4 Start_Stop_Count		0x0012   001   001   000	Old_age   Always	   -	   281844
  5 Reallocated_Sector_Ct   0x0033   100   100   005	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x000b   100   100   067	Pre-fail  Always	   -	   0
  8 Seek_Time_Performance   0x0005   100   100   040	Pre-fail  Offline	  -	   0
  9 Power_On_Hours		  0x0012   090   090   000	Old_age   Always	   -	   4663
 10 Spin_Retry_Count		0x0013   100   100   060	Pre-fail  Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   54
191 G-Sense_Error_Rate	  0x000a   100   100   000	Old_age   Always	   -	   0
192 Power-Off_Retract_Count 0x0032   100   100   000	Old_age   Always	   -	   20
193 Load_Cycle_Count		0x0012   071   071   000	Old_age   Always	   -	   291260
194 Temperature_Celsius	 0x0002   253   253   000	Old_age   Always	   -	   23 (Min/Max 7/29)
196 Reallocated_Event_Count 0x0032   100   100   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0022   100   100   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0008   100   100   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x000a   200   200   000	Old_age   Always	   -	   0
223 Load_Retry_Count		0x000a   100   100   000	Old_age   Always	   -	   0

I'm still trying to pin it down, but it looks like there is a queue delay on the servers NIC for the upstream (send-to-client) connection. I think by itself that is not significant, but combined with the packet window being 128, and the router and client being clear, I think that it is just being throttled badly for some reason. I'm still trying to clarify what the 128 means: MB? mbufs? packets? segments? and is that low or standard? Does it mean the total left available, or is it the total that can be used at any given instant? And is the window:latency ratio causing a bottleneck when trying to move Tera-Bytes of data?

Also, there is this: FreeBSD SegmentSmack (https://www.freebsd.org/security/advisories/FreeBSD-SA-18:08.tcp.asc). It may just be coincidence, but the behavior I am seeing in my server is the same as that produced by a DOS exploit of this vulnerability; and, I haven't applied the last FreeNAS update yet, which may have something for this.

Apollo · Oct 20, 2018

Maybe your drive have been forced to park head too often. There isn't much in term of power cycles, though.
Maybe it would be wise to replace one of the faulty drive with a brand new one and see how things evolve.
The question is whether or not the behavior is expected or a glitch or some misconfiguration is actually causing a premature death.
About DOS, I don't think this would apply to ZFS.
Drive showing less than a year worth of life expectancy is not good at all.

garm · Oct 21, 2018

This is not a FreeNAS issue as much as it’s a systems issue. You have the wrong drives for the job and you comments about hibernation is troubling. The Travelstar and WD Blue is “eco” drives built to save power by parking the head at every opportunity. This will quickly kill a NAS disk being part of an RAID array or ZFS pool.

ZFS will kick out a drive if it’s unresponsive and your inability to import the pool you made by hand is a good example of the issue.

Also, complaining about the “fragility” of ZFS is indicating a lack of understanding of what you are dealing with.

There are threads on the forum about reconfiguring “eco” drives, those are worth looking into.

AenAllAin · Oct 21, 2018

garm said:
...

1) Well, if you don't want to help that is ok. I'm not demanding it, just asking and also giving some perspective feedback. No need to get defensive.

2) The extent of the software's hardware support is totally up to the FreeNAS Team/Community. You can blame the hardware all you want, but at the end of the day "not supported" is still "not supported", and a NAS solution that doesn't support commonly used drives is really limiting it's user base. Besides, there is no indication that it is actually the drive hardware at this point.

3) I can see your point of view on ZFS Fragility; from within the developer's bubble the filesystem has come a long way, and it does an excellent job at what it is good at (it did preserve the last stable state). But surely you can also see the perspective from the other side; the addition of one drive irreparably ruining the state of the entire filesystem is kind of the definition of "Fragile". I totally understand why; it is like I said, the RAID limitations (i.e. immutable once created) is a natural artifact of the RAID algorithm and must exist at the lowest level; and, the immutability (i.e. fragility) of the top level filesystem is the result of projecting the RAID limitations upwards.

I'm hoping that by solving this issue with everyone's help it might serve to expand the availability of a nice product to more users. In any case, thanks for your input.

garm · Oct 21, 2018

1) never mind then
2) no, OpenZFS implementation is done in FreeBSD, not FreeNAS.
3) I disagree, what you call weakness is strength. ZFS is filesystem done right.

AenAllAin · Oct 21, 2018

Okay, I believe I have resolved most or all of the issue. I used the settings from the guide here:

"https://calomel.org/freebsd_network_tuning.html".

After applying these settings I saw a major improvement in throughput. The transfer ETA dropped from days to hours instead. I feel like the speed could be a bit better still, but this is usable.

From this it seems to have been an out-going bottleneck of some sort, but I do not know yet which of these settings made the difference; so I am still going to tinker with it more after I get my data back to normal.

Thanks everyone for the help and feedback!

kdragon75 · Oct 22, 2018

AenAllAin said:
The extent of the software's hardware support is totally up to the FreeNAS Team/Community.

FreeNAS is built on FreeBSD. Its up to the FreeBSD devs to "support" any hardware. Also this is not a consumer oriented system, its enterprise oriented. This means enterprise hardware. Just because you CAN use desktop class hardware does not mean that your should.

AenAllAin said:
Besides, there is no indication that it is actually the drive hardware at this point.

Aside from the insane number of head parks, no. Your right. But thats ignoring a big piece of information.

AenAllAin said:
the addition of one drive irreparably ruining the state of the entire filesystem is kind of the definition of "Fragile".

depending on how that drive is added, this would be expected. You need to do more reading on how ZFS works. It's incredibly flexible and only as robust as its implementation. ZFS is not build to hold a noobs hand and keep you safe from yourself. Yeah that's a jab at millennials. In fact a great deal of enterprise equipment/software is built this way.

AenAllAin said:
The data corruption was probably caused by the instability which turned my SSD L2ARC into an unidentified root level device on the zpool

Do you have evidence to support this?
Both of these statements include and astonishing lack of anything technical. At any rate, you have not used proper disks and we do not know the condition or configuration of your hardware or software. This make its difficult to for us to form a hypothesis.

At the end of the day, the system will only be as reliable as the hardware and configuration you applied.

garm · Oct 22, 2018

AenAllAin said:
3) The data corruption was probably caused by the instability which turned my SSD L2ARC into an unidentified root level device on the zpool; it wouldn't have been as critical an issue if I had not foolishly overestimated the ZFS filesystem and added all of the drives to the same zpool (in separate raided vdevs).

This simply dosent happen on its own. Either you messed around with your pool or you need to file a bug report ASAP. Drives don’t just jump around in pools for no reason.

AenAllAin · Oct 22, 2018

Funny, the issue is solved, but you guys are still talking, and the "gurus/experts" did not contribute anything useful. Sorry, I can fix computer problems, but I can't fix the delusions of people who think their software is perfect, and cannot stand anyone criticizing their baby.

"this is not a consumer oriented system, its enterprise oriented." ...maybe you guys should get with your people and delete the For Home section of your site ("http://www.freenas.org/for-home/"), and stop marketing to home users.

This just sounds like two trolls with swollen egos who are butt-hurt over being wrong about it being the hardware ...I don't have time for this nonsense.

garm · Oct 22, 2018

I have no issues with criticism, if you find me a serious fault with ZFS or FreeNAS I want to know because I trust both with some seriously important data. That’s why a asked for a bug report...

Just because there is a “home” version of TrueNAS doesn’t mean you can reliably run it on any hardware. That’s why there is such a focus on hardware on both the forum and the official documentation.

Now to your solution. The files in that guide cannot be tampered with just like that in FreeNAS. I don’t know how you did it but for future readers you need to be 1) careful what you do change and 2) do it using the GUI, otherwise it will not survive a reboot/upgrade. Looking through the zfs. parameters the blog suggests changing I can’t quite see what did the trick as most of them will increase the time between writes to the disks and that wouldn’t make your park issue go away. But if your throughout is solved Hurray, just keep an eye on your S.M.A.R.T reports. As a comparison, my 4 TB IronWolfs are about the same age in hours as your drives and they have parked their heads ~6 times. My 1 TB WD Green (that where configured to stop their "eco" nonsens) was replaced at around 33k hours and they racked up parking counts of around 20k. Your drives have achieved 300 000 head parks in 5000 hours. Do you really believe they will survive 330 Million? I donno, I have never experienced that. But I wouldn't take the risk. Pretty please come back arount 30k hours and post a S.M.A.R.T report :)

I do sens that you aren’t that willing to help out others so I guess we just leave the thread with a partial Congrats! that you increased your throughout and with the hope that you now have a configuration that will not make pool import, data read/writes and disk health an issue for you any more. I guess the lesson here is that if you want a "eco" disk in your NAS you will need to tune it so that the drives won’t kill them selfs. Personally I prefer drives that are meant for 24/7 operation, it’s just easier.

AenAllAin · Oct 23, 2018

Making changes permanent was already here in the forums (thanks to Durkatlon):
https://forums.freenas.org/index.ph...dit-retain-rc-conf-sysctl-conf-settings.1426/

Code:

# mount -uw /
# vi /conf/base/etc/rc.conf

...if I had proof of what caused it I would post it. The only suspicion I have is that the smartd was unable to report the real settings, and the GUI was left to assume and just report standby and power save were off when in reality they were in default settings the whole time; which just leaves me to watch and hope that by forcibly setting it off from the shell has fixed that issue going forward.

Important Announcement for the TrueNAS Community.

Hi! ...help! [Slow IO || Idle process threads]

AenAllAin

Cadet

Apollo

Wizard

AenAllAin

Cadet

Attachments

Apollo

Wizard

AenAllAin

Cadet

Apollo

Wizard

garm

Wizard

AenAllAin

Cadet

garm

Wizard

AenAllAin

Cadet

kdragon75

Wizard

garm

Wizard

AenAllAin

Cadet

garm

Wizard

AenAllAin

Cadet

Similar threads

Important Announcement for the TrueNAS Community.

Hi! ...help! [Slow IO || Idle process threads]

Cadet

Wizard

Cadet

Attachments

Wizard

Cadet

Wizard

Wizard

Cadet

Wizard

Cadet

Wizard

Wizard

Cadet

Wizard

Cadet

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Hi! ...help! [Slow IO || Idle process threads]"

Similar threads