New TrueNAS User - 1 Issue

AntoninKyrene

Dabbler
Joined
Jan 2, 2023
Messages
15
Hello!

I successfully deployed my first TrueNAS CORE unit two weeks ago after months of research and tinkering. So far I am a very satisfied user.

BACKGROUND
I found myself needing to migrate (on somewhat short notice) from a Synology DS1817+ deployed almost exactly 5 years ago. My current TrueNAS setup is 16 drives totaling 60TB raw / 40.5TB usable storage across a pair of IBM M5110 controllers. I had a spare AMD Ryzen PRO 4750G processor needing a home and paired it with 32GB of ECC DDR4 memory. The TrueNAS unit is built around 3 storage pools. I started with a single controller and 4 disks, imported a second pool of 4 disks on the same controller, and finally imported 8 disks on a second controller. The first 4 disks are new WD Red Plus 6TB drives. The second 4 disks are WD Red 3TB drives from the original Synology setup (these were used as hot backups connected to the Synology via USB, courtesy of two WD MyBook Duo enclosures). The 8 remaining disks are the WD Red 3TB drives from the original Synology itself. All of this data was backed up to 4 Seagate external HDDs.

WORKFLOW
The TrueNAS is connected to an Intel NUC running Windows 10 Pro via Ethernet, but through a hub rather than a switch or router, a.k.a. no gateway. All data is being imported through the NUC: from the Seagate HDDs to the NUC SSD, and then from the NUC SSD to TrueNAS. Speeds between NUC and NAS are rock-solid and at the upper end of the Gigabit limit: 110MB/s to 112MB/s. Once within the NAS, data is being sorted and repositioned. Movements from NUC to NAS and movements between pools are mutually exclusive: if one process is moving data, the other is not. Movement from pool to pool reaches speeds in excess of 400MB/s, but the average seems to be about 290MB/s.

I have a single issue. It is not a showstopper, as I have moved well over 20TB of data so far and continue to move more, but I am stumped as to what could be causing it.

ISSUE
Every 10-12 minutes, when going from NUC to NAS, the transfer speed across the network drops to exactly 5.54 MB/s and then bounces between that number and zero. This bouncing takes place for anywhere from a few seconds to a couple of minutes, and then full-speed transfers resume for another 10-12 minutes. I assumed it was a network issue. No - the exact same phenomenon is also taking place pool-to-pool, whether it be within one controller or across both.

TrueNAS is showing no errors whatsoever at any level of operation. CPU usage across all 16 cores never exceeds 15% and temperatures are under 50 Celsius. I see no SMART errors or anything disk-related that would indicate disk health is problematic.

The only idea I could come up with is pool saturation. Currently ZFS1 and ZFS2 are over 90% - they absorbed the critical data before the 8-disk pool (ZFS3) came online, and they are now relieving themselves of their over-80% status by moving data to ZFS3.

Any thoughts on what might be causing this issue? I did not find anything after considerable searching, but I am not the world's best searcher, either.

-Antonin
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yeah, you're crushing the bejeezus out of the ARC, knocking metadata out of memory.

You have 12TB of space in pool 1, 24TB in pool 2, and 24TB in pool 3. Especially considering that these are separate pools, you should be obeying the 1GB-of-ARC-per-TB-of-disk rule, so that the ARC can cache sufficient metadata for your pools. You are stressing the system in two ways: by continuing to write to pools past the 80% mark (a gradual degradation in performance), and by moving data between pools, which forces one pool to allocate space (needing to search for free space) while forcing another pool to free space (as those blocks are removed). You really probably want 64GB+ of ARC, and to maintain utilization of less than maybe 70% on your pools. That's a semi-arbitrary number, mostly "just a lot better than 90%".
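A back-of-the-envelope sketch of that rule of thumb applied to the pools in this thread (pool sizes are from the post above; the 1GB-per-TB figure is a community guideline, not a hard ZFS limit):

```python
# Rough application of the "1 GB of ARC per TB of raw disk" guideline.
# Pool sizes come from this thread; the rule itself is a rule of thumb.

POOL_SIZES_TB = {"ZFS1": 12, "ZFS2": 24, "ZFS3": 24}

def suggested_arc_gb(pool_sizes_tb):
    """Suggest ARC size: ~1 GB of ARC per TB of raw pool capacity."""
    return sum(pool_sizes_tb.values())

total_tb = sum(POOL_SIZES_TB.values())
print(f"Total raw capacity: {total_tb} TB -> suggested ARC: ~{suggested_arc_gb(POOL_SIZES_TB)} GB")
# -> Total raw capacity: 60 TB -> suggested ARC: ~60 GB
```

With 60TB of raw disk the guideline lands at roughly 60GB of ARC, which is why 64GB+ of RAM is the recommendation here rather than the 32GB installed.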
 

AntoninKyrene

Dabbler
Joined
Jan 2, 2023
Messages
15
Thank you. I do appreciate the diagnosis. Adding 32GB now is not an issue.

Is this an issue because of all the data writing during the initial stages of this project - the migration - or will this continue to be an issue as the OS maintains the data?

Daily write traffic on this NAS will be less than 10GB once it's fully populated, with daily read traffic of 100GB. These are single file movements: large HD files being moved to a workstation for transcoding, and then being returned to the NAS for storage. New content for transcoding will be uploaded each fortnight - this is where the problem is likely to resurface...

-Antonin
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
As fragmentation increases, the system struggles more to write new data (because it needs to search more metadata to find free space), so there's a good chance this will get progressively slower and worse over time.
 

AntoninKyrene

Dabbler
Joined
Jan 2, 2023
Messages
15
OK!

Well, this was TrueNAS v1, and it was meant to be an experiment and a learning experience. For sure, it works, it works well, and it will do the job as envisioned. It got rid of A LOT of extra hardware and greatly simplified the entire home datacenter, and it will be completely usable for several months without issue. A wonderful product, and what appears to be an equally wonderful community.

But it looks like the planning for TrueNAS v2 starts now. And with this Sith Lord preferring many smaller drives over a few larger ones, this is going to be fun, because I could not find any conventional case larger than the Define 7 XL, so it's time for server racks. I foresee growth to 200TB and beyond, and in doing some simple searching within the community, I see this is a completely realistic DIY. Just A LOT of hard drives involved.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
This board has a RealTek NIC, which may contribute to the situation.
The TrueNAS is connected to an Intel NUC running Windows 10 Pro via Ethernet, but through a hub rather than a switch or router, a.k.a. no gateway.
What exactly is a hub in this context?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I see this is a completely realistic DIY. Just A LOT of hard drives involved.

Yes, it's not really that hard. Most of the hard work has already been done for you, and there's lots of talk about how to do this properly.
 

AntoninKyrene

Dabbler
Joined
Jan 2, 2023
Messages
15
@ChrisRJ
I had planned to upgrade to an internal PCIe NIC once I had some semblance of order and I could confirm the remaining PCIe slot was actually available, which it is. Now that more RAM is on the way, the NIC is next. And yes, I do agree recycling is not optimal for this application. Given the price differences involved between v1 and what will be v2, I knew I needed to practice and learn first before I contacted the family CFO for permission to go beast mode on v2. That is why I'm using hardware better suited for gaming rather than server applications. When I started looking forward to what v2 would cost...yeah, I needed to make sure this was the solution.

The 'hub' is a D-Link DGS-108. Very basic Gigabit ethernet box, which at that moment was the answer to the problem of, "where is my crossover cable?"

UPDATE:
All 3 pools are now below 66%. There has been only one bounce, and it lasted just a few seconds. Pool-to-pool transfers are holding higher speeds almost continuously now. As I restart actual production work...I'm liking this new setup a lot.

-Antonin
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The 'hub' is a D-Link DGS-108. Very basic Gigabit ethernet box, which at that moment was the answer to the problem of, "where is my crossover cable?"

You do not want or need crossover cables with gigabit; using one will force a link back down to 100Mbps. The 1GbE specification includes auto-MDI/MDIX as a required capability, so any two gigabit devices should always negotiate a proper 1000/full link when connected to each other.
 

AntoninKyrene

Dabbler
Joined
Jan 2, 2023
Messages
15
@jgreco
I did not know that. Or did I? Unknown - it seems the older I get the less I remember about a world I have not actively worked in for almost 20 years now. I wish I could say this v1 project is bringing back memories, but I have not been cutting edge since the PowerPC 970 was hot. Those were my glory years.

@ChrisRJ
That would be a very bad habit of mine, but you are correct, it is a switch. For reasons unknown - language apathy, indifference to terminology, owner of the XY chromosome, native New Mexican - I always think of an unmanaged switch as a hub. If I cannot login to it and control it, it's dumb, and hubs are dumb.


Well, I am happy to report it is all done. The second half of the migration was certainly much smoother than the first. We are holding at 75%± on all three pools. For someone you really would not classify as 'network savvy' (and this thread proves it), I am pleased with the end result. There are too many contributors to thank for the snippets of knowledge gleaned from this site, but it all helped. And 16 drives in a Define 7 XL looks really cool, too.

-Antonin
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
I always think of an unmanaged switch as a hub. If I cannot login to it and control it, it's dumb, and hubs are dumb.
A switch, even an unmanaged one, is infinitely better than a hub. A hub just broadcasts all traffic to all ports regardless of destination, resulting in massive packet collisions everywhere, especially on a busy network. A switch is much smarter and knows which port a packet is destined for if it isn't broadcast traffic.
 

AntoninKyrene

Dabbler
Joined
Jan 2, 2023
Messages
15
When I queue for St. Peter, you will never hear me say, "Networking was one of my better skills." :smile:

I think this thread proves two points: I was crushing the bejeezus out of the ARC, and I need to invest time and some resources into getting back up to speed with modern networks.

I do have two additional questions. I said last night a total migration would be about 200TB. I should not do math after 7PM - the correct total is 100TB. Is that considered within the scope of what CORE can realistically support? At the most basic design level, that is 16U if I invest in 6TB drives x 12 bays x 4 4U enclosures. That translates into 194TB of usable data storage using RAIDZ3, with 100TB being 52% saturation, leaving some room for growth.
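A sketch of the capacity math behind that design (drive size, bay count, and enclosure count are from the post; treating each 12-bay enclosure as one RAIDZ3 vdev is an assumption, and real usable space is lower after ZFS overhead and TB-vs-TiB conversion):

```python
# Capacity sketch for the proposed build: 4 enclosures x 12 bays x 6 TB,
# one RAIDZ3 vdev per enclosure (assumed layout). Figures are raw
# arithmetic; ZFS metadata/slop overhead and TiB conversion reduce them.

DRIVE_TB = 6
BAYS_PER_ENCLOSURE = 12
ENCLOSURES = 4
RAIDZ3_PARITY = 3  # RAIDZ3 dedicates 3 disks per vdev to parity

raw_tb = DRIVE_TB * BAYS_PER_ENCLOSURE * ENCLOSURES
data_disks = (BAYS_PER_ENCLOSURE - RAIDZ3_PARITY) * ENCLOSURES
usable_tb = data_disks * DRIVE_TB

print(f"Raw: {raw_tb} TB, usable before overhead: {usable_tb} TB")
# -> Raw: 288 TB, usable before overhead: 216 TB
```

216TB before overhead lands near the 194TB quoted above once ZFS overhead and unit conversion are accounted for, and 100TB of data on 194TB is indeed roughly 52% utilization.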

We just deployed a host of 4U 12-disk enclosures at work, so I can see this in my mind's eye. How they were actually configured I do not know. They look very similar to the Rosewill RSV-L4412U.

-Antonin
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
If I cannot login to it and control it, it's dumb, and hubs are dumb.

This is true. However, you have made a classic logic error: you have assumed that anything you cannot log into is therefore a hub. A hub is a very specific sort of device. It dates from the days of 10Mbps ethernet (and there were a small number of 100Mbps ones). Back in the day, we had "10base5" ethernet, which was a coaxial cable arrangement with terminators on both ends, AUI drops, and transceivers, and the "network" had to negotiate access on a CSMA/CD basis (the ethernet preamble, etc.). This was impractical in many ways and was replaced with 10base2 and then 10baseT/UTP. The RJ45 cable you're familiar with replaced the AUI cable connection, and the 10Mbps hub replaced the transceiver plus coax cable plus terminators. The transceiver plus coax cable plus terminators had no intelligence and merely parroted the same information to each station on the network, much like a DEC DEMPR multiport repeater. That's a hub.

A switch is a more intelligent device, even if you cannot log into it, and has logic to optimize the egress of packets so that each port only sees the traffic it actually needs.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I do have two additional questions. I said last night a total migration would be about 200TB. I should not do math after 7PM - the correct total is 100TB. Is that considered within the scope of what CORE can realistically support? At the most basic design level, that is 16U if I invest in 6TB drives x 12 bays x 4 4U enclosures.

iXsystems indicates that they are able to support 20 petabytes on a single TrueNAS Core head unit.

This makes sense to me; a high density storage chassis like the Supermicro 6049P can handle 60 drives in 4U with a head unit, or 90 drives without. Sixty drives times 22TB is 1.32PB. Ninety times 22TB is 1.98PB. If we assume a 44U rack, that would fit one head unit plus ten JBOD units for 21.12PB. I have to figure their design is similar to that.
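The rack-density arithmetic above can be checked quickly (drive size and per-chassis drive counts are from the post; the one-head-plus-ten-JBOD layout is the hypothetical 44U rack described there):

```python
# Density check: one 60-drive head unit plus ten 90-drive JBOD shelves,
# all populated with 22 TB drives. Decimal TB/PB, as in the post.

DRIVE_TB = 22
HEAD_DRIVES = 60
JBOD_DRIVES = 90
JBOD_COUNT = 10

head_tb = HEAD_DRIVES * DRIVE_TB          # 1320 TB per head unit
jbod_tb = JBOD_DRIVES * DRIVE_TB          # 1980 TB per JBOD shelf
rack_tb = head_tb + JBOD_COUNT * jbod_tb  # whole-rack total

print(f"Head: {head_tb/1000} PB, per JBOD: {jbod_tb/1000} PB, rack: {rack_tb/1000} PB")
# -> Head: 1.32 PB, per JBOD: 1.98 PB, rack: 21.12 PB
```

That 21.12PB single-rack figure is comfortably within the 20PB-per-head-unit claim quoted above.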

The main concern is proper cooling, which is sometimes difficult if people use random chassis. I actually don't like the 24-drive-in-4U chassis because of the cooling tradeoffs involved, but they are relatively practical to work with.

Lots of people here would consider 200TB to be a small-ish array, even if made out of mere 6TB HDD's.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
@ChrisRJ
[..] language apathy, indifference to terminology [..]
My parents are a German teacher and the daughter of a lawyer. So those were never options for me :smile:. I realize that I sometimes come across as pedantic w.r.t. language. But on the other hand, I have seen too many project issues because people were sloppy about terminology.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I realize that I sometimes come across as pedantic w.r.t. language. But on the other hand I have seen too many project issues, because people were sloppy about terminology.

That's hardly a new issue. Hardly a day goes by where I don't have to make a call as to what someone meant by some randomly chosen imprecise abbreviation. That was the genesis of the Terminology and Abbreviations Primer.
 

AntoninKyrene

Dabbler
Joined
Jan 2, 2023
Messages
15
Trust me, I do not mind being corrected or educated.

I should know better. I work in law. Imprecise communication is something I deal with hourly. You speak of sloppy use of terminology? Trust me, I get it. We don't have project issues, though. When our use of language errs, we have unfortunate situations to fix, like constitutional violations, or public safety consequences. Years ago, we had an appellate case decided based upon the placement of a single comma in a single sentence.


10Base2 was the first real network I remember working with, circa 1990, and only because that was how our AS/400 system was interconnected. After that, it was years before I worked in network space again - we had people for that task. I did not "deploy" my first home network until 2012, and that evolution consisted of plugging two computers into the Ethernet ports on the DSL modem/router. And my first computer was a brand-new Apple IIc in 1984, so I am not just a latecomer to the network party, I was absent from it for decades.

In many ways, those Synology units were a giant step forward.

-Antonin
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Years ago, we had an appellate case decided based upon the placement of a single comma in a single sentence.
Back in 1998, when I was working in the UK and my English was not where it is today, I had rented a flat via an agency. It turned out that their contract template also had a comma in the wrong place. So when I got a much better offer for accommodation, I could end the contract with 2 weeks' notice rather than 3 months'.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
the correct total is 100TB. Is that considered within the scope of what CORE can realistically support?
Easily; my current pool is ~120TB (and was under CORE as well before I moved to SCALE). With current disk sizes, you can easily put 300 TB into 4U, even accounting for the 80% rule.
 