Hardware config and iSCSI with Windows

jcl123

Dabbler
Joined
Jul 6, 2019
Messages
23
If you have the Server 2019 licenses already, it can certainly help tilt things that way. And since your workload is mostly read, we can try to violate a few of the other "best practices" around block storage as well.

It's a good point; I often forget that the pricing for Server 2019 is a bit prohibitive.
Although you can run it in eval mode unrestricted for a very long time - years.
That said, I am not looking to violate best practices if I don't have to.

No problem. Given the specs, I have my suspicions that a "certain brand of storage" was recently decommissioned at your office, and you were permitted to benefit from that once the drives were erased. ;)

Yeah, actually I was the one who had to decommission it. We were just going to destroy all the disks, but my boss said I could keep some as long as I wiped them. I had to change the sector format and long-format them anyway, so that was easy enough.

Quick ZFS phrasing lesson - the "pool" is the top-level aggregate, which in turn is made up of vdevs. Think of the "vdevs" as the "5" in a RAID50 arrangement, and the "pool" that mashes them together as the "0" piece of it. More details on that front later.

Right, sorry, still don't have all the terminology down. VDEV is a disk and "pool" is an array of disks, right? Not sure I follow your RAID50 explanation.

This would be a good solution; a RAIDZ2 pool with a ZVOL (think "a piece of your ZFS pool") provisioned over iSCSI as either an RDM through VMware, or direct in-guest iSCSI (although the latter might complicate the network layout) and leave it async. The main OS drive would be on your (sync) VMFS.

Well, I really hate using RDMs, and I doubt FreeNAS supports vVols.
I am much happier going direct to the guest; I have extensive networking experience, and I don't think this is even that hard.

I won't discourage the use of dedup in Windows either. The current state of dedup on ZFS is basically "don't" - there's some work in progress to correct this, but as of now it's inline and extremely memory-heavy. The special vdev type coming in TrueNAS 12 helps mitigate this, but it doesn't change the core methodology of how it works.

Yeah, even high-end EMC SANs don't do real-time dedup very well, and they are almost unlimited in cost.
The only thing I have seen do it "well" in real time would be HPE SimpliVity, but it is doing it with custom hardware.
There is no magic to the Windows implementation; it's just not real-time - it runs at idle time.

Sync can be set at the pool, dataset, or individual ZVOL level; so you can even have different levels of data assurance and power-loss-protection within the same system.

OK, good to know.


You'll likely need a second controller anyways, since the proper way to do FreeNAS as a VM involves PCI passthrough of the HBA entirely to the VM, and you need a storage device visible to VMware in order to store and boot the FreeNAS VMX.

Yeah, I am getting pretty frustrated with this. All the LSI ones are really expensive, especially with the flash cache.
I am actually currently looking at the Adaptec stuff, which is pretty nice and there are some good deals.
I need something somewhat new to support ESXi 7.x.
I suppose I could put the on-board LSI back into IR mode since I only need RAID 1. Then use the extra LSI 9400 HBA for FreeNAS.

Question: how many of the different drives (SSD and 4T SAS) have you got? That can impact your pool decisions.

Here are the details:

I have 5-6 of these:
400Gig 2.5" U.2 NVMe SSE Seagate Nytro ST400KN0001 MK000400KWDUK
3 DWPD, PLP, 2400MB/s Read, 500MB/s Write, 55K -180K IOPS

I have around 18 of these:
200Gig 2.5" 6Gig SAS Samsung SM1625 MZ6SR200HMFU-000C3 MZ-6SR2000/OC3 (I think these are SLC)
10 DWPD, PLP, 500MB/s Read/Write
https://www.servethehome.com/samsung-sm1625-200gb-sas-ssd-quick-benchmarks/

I have around 40 of these:
400Gig 2.5" 6Gig* SAS WD/HGST HUSMM1640ASS204 0B32172
10 DWPD/5 years, PLP, 500MB/s Read/Write
(*12Gig SAS but firmware locked to 6Gig)

I have over 60 of these:
4TB 3.5" 7200RPM SAS Seagate ST4000NM0023

I have 7 of these:
10TB 3.5" WD100EMAZ6Gig SATA
These are WD RED NAS drives that I "shucked" from external USB drives

Now, before you get all kinds of ideas, I grabbed way too many of these, not sure what I was thinking.
The case I have can hold 16x 3.5" (single expander backplane) + a fair number of 2.5" drives.
I don't even really want to fully populate this thing if I don't have to, just because of power, noise, and heat.
Technically I also have the JBOD which can hold another 10, but I don't want to hit 26 drives.
Actually I was thinking the JBOD might be good for a cold backup copy.
I have dummy drive blanks so I could leave some slots unpopulated.

And for a guess:

Your 200G drives are Samsung SM1625's
Your 400G drives are Seagate 1200's
Your 4T drives are Seagate Constellation ES (1st gen)
And your NVMe drives are P3608 (but I'm unsure)

Let me know how many I get. ;)

You definitely got the 1st one, maybe not the 2nd one, not sure about the 4T's.
Actually I was wrong about the NVMe's, I thought they were Intel.

-JCL
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
It's a good point; I often forget that the pricing for Server 2019 is a bit prohibitive.
Although you can run it in eval mode unrestricted for a very long time - years.
That said, I am not looking to violate best practices if I don't have to.
For direct file storage the "best practice" would be to store it directly on your array. The challenge with putting them on a Windows system consuming an iSCSI ZVOL instead is adding the layer of abstraction that is NTFS - ZFS no longer has visibility into the files themselves; they're just a chunk of blocks. But as mentioned, if you're doing fancy things with Windows ACLs (or just really prefer to feed Windows to Windows) then it will work - just possibly with a little more space used and a little less performance overall.

Yeah, actually I was the one who had to decommission it. We were just going to destroy all the disks, but my boss said I could keep some as long as I wiped them. I had to change the sector format and long-format them anyway, so that was easy enough.
Glad to hear you saved a bunch from the shredder.

Right, sorry, still don't have all the terminology down. VDEV is a disk and "pool" is an array of disks, right? Not sure I follow your RAID50 explanation.

No worries. A group of disks joins together to make a vdev - this is where your redundancy level comes in, eg: mirrors, RAIDZ1, Z2, etc. A group of vdevs join together (in a "JBOD" style arrangement) to make a pool. Different types of vdevs tolerate a different number of disks lost, but losing an entire vdev means the whole pool is offline.

ZVOLs can be thought of as "partitions" - block devices that consume space from the pool.
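As a rough command-line sketch of that layering (disk names and sizes here are just placeholders, not a recommendation for your hardware):

    # one pool built from two RAIDZ2 vdevs - the vdevs are the "5"s, the pool stripes them like the "0"
    zpool create tank \
        raidz2 da0 da1 da2 da3 da4 da5 \
        raidz2 da6 da7 da8 da9 da10 da11

    # a sparse 1T ZVOL carved out of the pool, presented as a block device under /dev/zvol/tank/vmstore
    zfs create -s -V 1T tank/vmstore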

Well, I really hate using RDMs, and I doubt FreeNAS supports vVols.
I am much happier going direct to the guest; I have extensive networking experience, and I don't think this is even that hard.
No vVol support even on commercial TrueNAS, I don't think. You can go for in-guest iSCSI mounting; you'll just want to use initiator groups, either by IP or IQN, to make sure that your Windows guest doesn't try to mount your VMware LUNs, and vice versa.

Yeah, even high-end EMC SANs don't do real-time dedup very well, and they are almost unlimited in cost.
The only thing I have seen do it "well" in real time would be HPE SimpliVity, but it is doing it with custom hardware.
There is no magic to the Windows implementation; it's just not real-time - it runs at idle time.
"Given sufficient thrust, pigs fly just fine" - if you're willing to spend huge sums of money, you can get inline deduplication with high performance. But deduplication after the fact is way lighter workload, which is why cheaper/weaker gear can pull it off in backup appliances.

Yeah, I am getting pretty frustrated with this. All the LSI ones are really expensive, especially with the flash cache.
I am actually currently looking at the Adaptec stuff, which is pretty nice and there are some good deals.
I need something somewhat new to support ESXi 7.x.
I suppose I could put the on-board LSI back into IR mode since I only need RAID 1. Then use the extra LSI 9400 HBA for FreeNAS.
No need for an expensive flash-backed one if you're just going to boot vSphere and load the FreeNAS VMX; if it's supported under 7.x I'd go with your last option (switch onboard back to IR and boot from that) especially if you intend to use FreeNAS to provide the VMFS datastores via loopback iSCSI/NFS. You do have that second (or third?) 32GB system as well; that would likely be fine if you were going to use it for the iSCSI file-store, but wouldn't do justice to the rest of that hardware.

And speaking of hardware, let's have a look at all that awesomeness you salvaged.

I have 5-6 of these:
400Gig 2.5" U.2 NVMe SSE Seagate Nytro ST400KN0001 MK000400KWDUK
3 DWPD, PLP, 2400MB/s Read, 500MB/s Write, 55K -180K IOPS

Damned fine devices, although the U.2 form factor is harder to pack densely into a system. I believe IcyDock makes a 4-into-5.25" bay adapter but you'd need the NVMe HBA anyways; not cheap.

I have around 18 of these:
200Gig 2.5" 6Gig SAS Samsung SM1625 MZ6SR200HMFU-000C3 MZ-6SR2000/OC3 (I think these are SLC)
10 DWPD, PLP, 500MB/s Read/Write
https://www.servethehome.com/samsung-sm1625-200gb-sas-ssd-quick-benchmarks/
Vanilla SM1625s are eMLC - these might be SLC though due to the OEM source. Good for L2ARC, possibly too dated for SLOG, maybe too small for regular data use, especially given that:

I have around 40 of these:
400Gig 2.5" 6Gig* SAS WD/HGST HUSMM1640ASS204 0B32172
10 DWPD/5 years, PLP, 500MB/s Read/Write
(*12Gig SAS but firmware locked to 6Gig)
Most people would be happy to have a couple for SLOG devices - you can build entire pools from these stupidly well overprovisioned (576G of raw NAND inside these I believe) SSDs and have them soak up writes for petabytes.

I have over 60 of these:
4TB 3.5" 7200RPM SAS Seagate ST4000NM0023
Constellation ES.3's, quality SAS spindles, I've got some of the 0043 self-encrypting ones myself. Easy to slap together into RAIDZ2 and store capacity.

I have 7 of these:
10TB 3.5" WD100EMAZ6Gig SATA
These are WD RED NAS drives that I "shucked" from external USB drives
https://www.servethehome.com/wd-wd100emaz-easystore-10tb-external-backup-drive-review/
Based on total bay count, these will probably be your option for maximum capacity.

Now, before you get all kinds of ideas, I grabbed way too many of these, not sure what I was thinking.
Probably dreams of having an all-flash array for your bulk file storage, "because science isn't about 'why' it's about 'why not?'"

But if you're having second thoughts, I'll be more than happy to take any of those extra, unnecessary SSDs off your hands. ;)

The case I have can hold 16x 3.5" (single expander backplane) + a fair number of 2.5" drives.
I don't even really want to fully populate this thing if I don't have to, just because of power, noise, and heat.
Technically I also have the JBOD which can hold another 10, but I don't want to hit 26 drives.
Actually I was thinking the JBOD might be good for a cold backup copy.
I have dummy drive blanks so I could leave some slots unpopulated.
Ultimately let's look at this in terms of solving your needs.

The challenge here is that your data currently resides in Windows format on those 7x 10T drives; and those would be the best candidates for a 7-drive RAIDZ2 yielding about 50T of usable space. Failing that, if you put 10x4T into the JBOD as a single RAIDZ2 vdev, that would give you ~32T usable out of the gate. You could then use that to transfer all of your data off the 7x10T drives, then format them and load them into 7 out of the 16 main bays in your case, copy/zfs send the data over to them, and clear out your JBOD. It can then be a cold/backup copy via replication. Set up a pair of the HGST SSDs for mirrored SLOG and you can turn on sync writes to protect in-flight writes without wreaking havoc on your latency; you can even keep three more ready for when TrueNAS 12 drops and assign them as metadata-only devices in a three-way mirror for extra safety.
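If it helps to see the shuffle end to end, here's a hedged CLI sketch of that sequence (pool names, dataset names, and disk identifiers are all hypothetical; the FreeNAS GUI wraps the same operations):

    # 1. temporary landing pool on the JBOD - 10x 4T in a single RAIDZ2 vdev
    zpool create staging raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9
    # ...copy everything off the 7x 10T Windows-formatted drives into a dataset on "staging"...

    # 2. wipe the 10T drives, move them into the main chassis, build the real pool
    zpool create tank raidz2 da10 da11 da12 da13 da14 da15 da16

    # 3. push the data over with a recursive snapshot and send/receive
    zfs snapshot -r staging/files@migrate
    zfs send -R staging/files@migrate | zfs receive -F tank/files

    # 4. the JBOD pool then becomes the cold backup target for future replication;
    #    export it whenever you want to power the shelf down
    zpool export staging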

If you're going to serve VMs, the path forward with that much SSD is obvious - a big pool made up of mirrored HGST HUSMM drives. It should be fast even with sync writes. But that depends on how much space you need for the performance pool, or if you need that performance at all. Mirrors all the way for the VMs though.
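"Mirrors all the way" just means striping a stack of two-way mirror vdevs - roughly this, with made-up device names:

    # six of the HUSMM drives as three mirrored pairs in one pool
    zpool create vmpool mirror da20 da21 mirror da22 da23 mirror da24 da25
    # sync writes on for VM block storage; these SSDs should soak that up fine
    zfs set sync=always vmpool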

You definitely got the 1st one, maybe not the 2nd one, not sure about the 4T's.
Actually I was wrong about the NVMe's, I thought they were Intel.
Bah, you said "Seagate 400G SAS" in your initial post, "HGST 400G SAS" would have been a dead giveaway for the HUSMM-series.
 

jcl123

Dabbler
Joined
Jul 6, 2019
Messages
23
BTW, I just want to make sure it is said: I really appreciate the info - this is a very informative post.

For direct file storage the "best practice" would be to store it directly on your array. The challenge with putting them on a Windows system consuming an iSCSI ZVOL instead is adding the layer of abstraction that is NTFS - ZFS no longer has visibility into the files themselves; they're just a chunk of blocks. But as mentioned, if you're doing fancy things with Windows ACLs (or just really prefer to feed Windows to Windows) then it will work - just possibly with a little more space used and a little less performance overall.

You know, I think this here is getting to the brass tacks of my original question, and it is a key consideration.
If I do it this way, am I basically negating much of the benefit of ZFS by not giving it access to the file data?
Not sure if NFS would get around this problem - some trade-offs, but maybe.

This more than anything else would sway me to just living with Samba and moving on.
I feel lucky to find someone with the depth of knowledge to explore this point.

Glad to hear you saved a bunch from the shredder.
Well, you can see I was pretty selective. There were a whole lot of drives (hundreds) under 4TB, and most of them SATA.
We have one of those hard disk crusher things; they are fun for the first 50-60 drives, but then it quickly becomes work, especially because you have to take them out of the drive carriers first. There are still two decent-sized dumpsters full of drives that other people and I have been chipping away at over months. If I had the energy I would harvest the neodymium magnets - my kids would love it, but it is a bear to get them out.

No worries. A group of disks joins together to make a vdev - this is where your redundancy level comes in, eg: mirrors, RAIDZ1, Z2, etc. A group of vdevs join together (in a "JBOD" style arrangement) to make a pool. Different types of vdevs tolerate a different number of disks lost, but losing an entire vdev means the whole pool is offline.

ZVOLs can be thought of as "partitions" - block devices that consume space from the pool.
I will get it down eventually; the issue is that we have many different products at work that all use different terminology for the same thing - although ZFS does have a bunch of different things going on.

No vVol support even on commercial TrueNAS, I don't think. You can go for in-guest iSCSI mounting; you'll just want to use initiator groups, either by IP or IQN, to make sure that your Windows guest doesn't try to mount your VMware LUNs, and vice versa.
Yeah, they probably never will, because I think VMware charges a fee.
If FreeNAS can do it, I could separate the traffic on different port groups or even separate vSwitches, and never the twain shall meet.

No need for an expensive flash-backed one if you're just going to boot vSphere and load the FreeNAS VMX; if it's supported under 7.x I'd go with your last option (switch onboard back to IR and boot from that) especially if you intend to use FreeNAS to provide the VMFS datastores via loopback iSCSI/NFS. You do have that second (or third?) 32GB system as well; that would likely be fine if you were going to use it for the iSCSI file-store, but wouldn't do justice to the rest of that hardware.

Well, I get into this analysis paralysis: find something for a good price that is not counterfeit, and I tend to go with things that will last me a little while - so maybe not flash-backed, but with the capability to add it. And it needs to be new enough to support ESXi 7.x. That last one seems to be the kicker; otherwise I could use previous-gen 6Gig parts. Currently looking at this one here: LSI Logic MegaRAID SAS 9341-8i. Looks like you don't even need the cache if you are all-flash anyways, and as you say, just RAID 1 or 10.

Yeah, the 3rd system is old but decent, a Xeon E3-1280 on a good old X9SCM-F. Definitely a good fit if I don't bother with iSCSI and just go bare metal. I may be doing that anyways just for backup; otherwise I'll just make it a pfSense box.

I ended up with the first two, which are Supermicro X10SRH-CF boards, because I got an amazing price and I had the CPUs and RAM they needed in the junk pile. I am actually thinking of keeping the 2nd one in case the first one dies or something, so I have a spare.

On the other hand, if I *did* go bare metal with one of these, they have 10x SATA3 ports in addition to the SAS, which would work fine with the 10TB SATA disks.

And speaking of hardware, let's have a look at all that awesomeness you salvaged.
I have to say it is helpful to get this assessment of this hardware from someone who is familiar with these drives.

Damned fine devices, although the U.2 form factor is harder to pack densely into a system. I believe IcyDock makes a 4-into-5.25" bay adapter but you'd need the NVMe HBA anyways; not cheap.
This is how I am doing it:
IMG_20200330_205613.jpg

Those SM 2x M.2's are cheap, and the cables aren't too bad either.
The card behind it is a regular single NVMe with a Samsung 512Gig drive on it.
I can plug them into the LSI 9400, but the cables are actually expensive and it burns a lot of ports.
Starting to get crowded on PCIe slots here though, because you have to turn on bifurcation to use this card, and that setting changes it for each pair of slots. I may end up just using them in my desktop or something, I dunno.

Vanilla SM1625s are eMLC - these might be SLC though due to the OEM source. Good for L2ARC, possibly too dated for SLOG, maybe too small for regular data use, especially given that:
Yeah, too small but the high endurance should make for great cache devices. How does the fact that these are older affect the SLOG?

Most people would be happy to have a couple for SLOG devices - you can build entire pools from these stupidly well overprovisioned (576G of raw NAND inside these I believe) SSDs and have them soak up writes for petabytes.
Yeah, these are arguably the real gem of the salvage operation.
Especially since they are just big enough to start adding up, and the amount of VM storage I need is small - even 1TB would probably be plenty. They have been in use for a while; not sure if there is an easy way to tell how much endurance they have left. The SAN spread the load across so many of them that they are probably fine.

Constellation ES.3's, quality SAS spindles, I've got some of the 0043 self-encrypting ones myself. Easy to slap together into RAIDZ2 and store capacity.
That is good to hear, I wasn't sure if these were junk or not.

Probably dreams of having an all-flash array for your bulk file storage, "because science isn't about 'why' it's about 'why not?'"
Hehe, I probably shouldn't tell you this, but the expander in the JBOD box is a nice Intel with 36x 12Gig ports, and the LSI 9400 is a 16i card. That many SSDs would likely saturate the uplink even if you went wide, but that is still 8x 12Gig ports. Really the only thing stopping me from even trying it would be buying that many SAS cables. But it would be interesting. Although with that many cables my lab would look like Darth Vader's bathroom....

But if you're having second thoughts, I'll be more than happy to take any of those extra, unnecessary SSDs off your hands. ;)
Honestly I would give some of them away if I was allowed to, but they have serial numbers and stuff. But even if I thought I could get away with it, a promise to my boss is a promise.

Ultimately let's look at this in terms of solving your needs.

The challenge here is that your data currently resides in Windows format on those 7x 10T drives; and those would be the best candidates for a 7-drive RAIDZ2 yielding about 50T of usable space. Failing that, if you put 10x4T into the JBOD as a single RAIDZ2 vdev, that would give you ~32T usable out of the gate. You could then use that to transfer all of your data off the 7x10T drives, then format them and load them into 7 out of the 16 main bays in your case, copy/zfs send the data over to them, and clear out your JBOD. It can then be a cold/backup copy via replication. Set up a pair of the HGST SSDs for mirrored SLOG and you can turn on sync writes to protect in-flight writes without wreaking havoc on your latency; you can even keep three more ready for when TrueNAS 12 drops and assign them as metadata-only devices in a three-way mirror for extra safety.
Yeah, it's not the end of the world if I effectively have to build a temporary setup so that I can shift the data off of those 10TB drives. I do like your idea here; it might be the way to go. Just a bit time-consuming.

You really like using those HGST drives for the SLOG, will it even use that much space?

If you're going to serve VMs, the path forward with that much SSD is obvious - a big pool made up of mirrored HGST HUSMM drives. It should be fast even with sync writes. But that depends on how much space you need for the performance pool, or if you need that performance at all. Mirrors all the way for the VMs though.

I think right now I am leaning toward dropping the iSCSI idea completely. Just get a good RAID card for VMFS to host all the VMs.
Then do the swap to get the 10TB drives for the data pool going with your suggestion. Keep it simple. By the way, there are 7x of those drives because that was the number needed for dual parity under Storage Spaces. I could add one if 8x works better for FreeNAS. But otherwise 50TB is more than enough space to last me quite a while.

Technically I could just map a persistent network drive from Windows to the FreeNAS; using SMB3 at 10Gig, the performance will probably be OK. The latest versions of Samba in FreeNAS are approaching feature parity with Windows, and it is much more mature than that QNAP I had. I might even be able to use junction points and mount to folders.

I could always play with the iSCSI idea in a lab scenario not using real data. This is one of the reasons to have multiple servers around anyways.

Then I just use the other drives to create another server to be the backup. I am leaning toward just using something like Veeam because it is just a backup. Replication is fine, but generally that is more for HA than backup; I know ZFS can do snapshots and such, but I am not that experienced with it yet.

Bah, you said "Seagate 400G SAS" in your initial post, "HGST 400G SAS" would have been a dead giveaway for the HUSMM-series.
Yeah, my bad, I was confusing them with the 4TB drives.....

Great conversation, I think I am homing in on a good solution here.

-JCL
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
BTW, I just want to make sure it is said: I really appreciate the info - this is a very informative post.

Happy to help, and to live vicariously through your glorious salvage operation.

You know, I think this here is getting to the brass tacks of my original question, and it is a key consideration.
If I do it this way, am I basically negating much of the benefit of ZFS by not giving it access to the file data?
Not sure if NFS would get around this problem - some trade-offs, but maybe.

This more than anything else would sway me to just living with Samba and moving on.
I feel lucky to find someone with the depth of knowledge to explore this point.

You're still gaining the benefit of your underlying disks being protected by all of ZFS's error-detection/correction routines; it's more that you're adding another layer of complexity. When the files are directly on ZFS, it can see them as individual files and protect/restore them at that granularity - e.g., you're doing rolling snapshots and accidentally overwrite a file; you can reach back into a previous ZFS snapshot and fish it out fairly quickly. If you're presenting a ZVOL to a Windows VM, you can still do the same snapshots, but your restoration option is "mount a previous ZVOL snapshot as a new LUN, RDM it to the system, pick through and hope you've found the right file, restore it there" - not as clean.
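To make the difference concrete, the two restore paths look roughly like this (paths, dataset names, and snapshot names are hypothetical):

    # files directly on ZFS: grab the old copy straight out of the hidden snapshot directory
    cp /mnt/tank/files/.zfs/snapshot/auto-20200401/report.docx /mnt/tank/files/

    # ZVOL behind NTFS: clone the snapshot into a new ZVOL, present that clone as a new LUN,
    # mount it in Windows, and then go digging for the file
    zfs clone tank/filestore@auto-20200401 tank/filestore-restore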

Given that you're going to use Windows clients, I can't see that NFS would make things any easier. We do have some pretty solid SMB resources here - @anodos earned his custom title of the "Sambassador" for a reason - that can help you through any peculiarities. If you have a functional Active Directory setup, you could also join your FreeNAS system to AD and manage permissions/ACLs that way, which might alleviate some of your potential issues.

"What's the core problem you hope to solve by going with Windows for share management?" I suppose is the question.

Well, you can see I was pretty selective. There were a whole lot of drives (hundreds) under 4TB, and most of them SATA.
We have one of those hard disk crusher things; they are fun for the first 50-60 drives, but then it quickly becomes work, especially because you have to take them out of the drive carriers first. There are still two decent-sized dumpsters full of drives that other people and I have been chipping away at over months. If I had the energy I would harvest the neodymium magnets - my kids would love it, but it is a bear to get them out.

You certainly made some good choices. Mind your fingers on the magnets though. ;)

I will get it down eventually; the issue is that we have many different products at work that all use different terminology for the same thing - although ZFS does have a bunch of different things going on.

@Ericloewe wrote a very useful Intro to ZFS guide, with a good section on structure and terminology - worth a read, if you have some time later on.

Yeah, they probably never will, because I think VMware charges a fee.
If FreeNAS can do it, I could separate the traffic on different port groups or even separate vSwitches, and never the twain shall meet.

You can configure completely different login portals and iSCSI interfaces to split them; I was only suggesting you go as far as creating some initiator groups and pairing them to specific targets, in order to limit which IQNs can see which LUNs (eg: default group is your VMware hypervisors, make a secondary group for the Windows VM, and only present the filesystem ZVOL there) - or you can present to vSphere, pass the LUN through RDM, and then you have a fileserver that you can vMotion around for individual host downtime.
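FreeNAS drives all of this from the Sharing > iSCSI screens, but under the hood it generates ctld configuration, which conceptually looks something like this (IQNs, IPs, and names are all hypothetical, and the real file is auto-generated - don't hand-edit it):

    # /etc/ctl.conf (illustrative only)
    auth-group ag_windows {
        auth-type none
        initiator-name "iqn.1991-05.com.microsoft:fileserver01"
    }

    portal-group pg_storage {
        listen 10.0.20.10
    }

    target iqn.2005-10.org.freenas.ctl:filestore {
        auth-group ag_windows        # only the Windows guest's IQN can log in here
        portal-group pg_storage
        lun 0 {
            path /dev/zvol/tank/filestore
        }
    }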

Well, I get into this analysis paralysis: find something for a good price, that is not counterfeit, and I tend to go with things that will last me a little while, so maybe not have flash-backed, but with the capability to add it. And it needs to be new enough to support ESXi 7.x. That last one seems to be the kicker, otherwise I could use previous-gen 6Gig parts. Currently looking at this one here: LSI Logic MegaRAID SAS 9341-8i Looks like you don't even need the cache if you are all-flash anyways, and as you say just RAID 1 or 10.

It's a question of "are you planning to put any local storage in your hosts" - if you are, then you might need a different controller for acceptable performance, but if not, and the AHCI controller is supported for boot, just run a tiny datastore off of them and back up the FreeNAS.vmx - even if it does get lost, a system backup from within FreeNAS will be more valuable and more than enough to restore things to where they were.
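For the "system backup from within FreeNAS" piece, the GUI's System > General > Save Config does it, and as far as I know the whole configuration lives in a single SQLite file you can also copy off-box yourself (path as of FreeNAS 11.x, so treat it as an assumption; destination dataset is hypothetical):

    # copy the running config database somewhere safe
    cp /data/freenas-v1.db /mnt/tank/backups/freenas-config-$(date +%Y%m%d).db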

VMware dropping the older SAS2000-series parts was a bit of a kick in the pants for home users, but I (and several businesses) are following the rule of "never run the first release" and at least have until vSphere 7.0 U1 before needing to clear that particular hurdle.

Yeah, the 3rd system is old but decent, a Xeon E3-1280 on a good old X9SCM-F. Definitely a good fit if I don't bother with iSCSI and just go bare metal. I may be doing that anyways just for backup; otherwise I'll just make it a pfSense box.

Would be a fine backup target; the lack of RAM would mostly hurt random reads (which block storage does a lot of), but it would likely write as fast as your disks can handle it.

I ended up with the first two, which are Supermicro X10SRH-CF boards, because I got an amazing price and I had the CPUs and RAM they needed in the junk pile. I am actually thinking of keeping the 2nd one in case the first one dies or something, so I have a spare.

On the other hand, if I *did* go bare metal with one of these, they have 10x SATA3 ports in addition to the SAS, which would work fine with the 10TB SATA disks.

Fire them both up and have fun with clustering, I'd say.

Those SM 2x M.2's are cheap, and the cables aren't too bad either.
The card behind it is a regular single NVMe with a Samsung 512Gig drive on it.
I can plug them into the LSI 9400, but the cables are actually expensive and it burns a lot of ports.
Starting to get crowded on PCIe slots here though, because you have to turn on bifurcation to use this card, and that setting changes it for each pair of slots. I may end up just using them in my desktop or something, I dunno.

Probably a good use case for the Seagate NVMe, at least until you can find another 9400 in the "junk pile" :)

Yeah, too small but the high endurance should make for great cache devices. How does the fact that (the SM1625s) are older affect the SLOG?
They'll have lower performance than the HGSTs, and possibly more wear on them due to age. It's not that they're "bad" at it, you just have a better alternative in the HGST drives.

Yeah, these (HGST SSD1600MM) are arguably the real gem of the salvage operation.
Especially since they are just big enough to start adding up, and the amount of VM storage I need is small - even 1TB would probably be plenty. They have been in use for a while; not sure if there is an easy way to tell how much endurance they have left. The SAN spread the load across so many of them that they are probably fine.

This was a massive win for you. Pull the smartctl results and you should see under the "Error counter log" a column for "Gigabytes processed" - have a look at the value for Write. These might also have a "media wearout indicator" but that could be masked from SMART by the fact that they were OEM drives before. Dump a full log without the LUID or S/N if you're comfortable and it should be visible pretty easily.
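Assuming the drives show up as da-whatever under FreeNAS, something like this will dump everything (device name hypothetical):

    # full SMART output; on SAS SSDs look under "Error counter log" for the
    # "Gigabytes processed" column on the write: row
    smartctl -x /dev/da12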

That is good to hear, I wasn't sure if (the Constellation ES.3's) were junk or not.

Definitely good drives. Check their SMART results for "elements in grown defect list" - normally I'd say 1 or 2 is acceptable given the age, as long as you watch that it doesn't grow; but given the number you have, you could be picky enough to only use ones with absolutely no errors present.
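Quick way to eyeball that across the pile (device name hypothetical):

    smartctl -x /dev/da3 | grep -i "grown defect"
    # "Elements in grown defect list: 0" is the answer you want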

Hehe, I probably shouldn't tell you this, but the expander in the JBOD box is a nice Intel with 36x 12Gig ports, and the LSI 9400 is a 16i card. That many SSDs would likely saturate the uplink even if you went wide, but that is still 8x 12Gig ports. Really the only thing stopping me from even trying it would be buying that many SAS cables. But it would be interesting. Although with that many cables my lab would look like Darth Vader's bathroom....

do_it.gif


Honestly I would give some of them away if I was allowed to, but they have serial numbers and stuff. But even if I thought I could get away with it, a promise to my boss is a promise.

Fair enough, you absolutely don't want to sour a relationship with a boss and organisation that's letting you get that much quality gear for your own personal use.

Yeah, it's not the end of the world if I effectively have to build a temporary setup so that I can shift the data off of those 10TB drives. I do like your idea here; it might be the way to go. Just a bit time-consuming.

That's why I suggested using the JBOD for the "temporary space" - you can cleanly export the pool and unplug the JBOD to tidy up the temporary situation.

You really like using those HGST drives for the SLOG, will it even use that much space?

No, the default tunables will only have you using 4G of any sized drive you put into duty as SLOG. But the HGSTs are very good at it. Not as good as an Optane/other Intel DC-series NVMe card, or even the newest SAS SSDs - but they'll definitely help you with ingesting big writes. That's assuming you need an SLOG - if you're going to present LUNs over iSCSI then you do, if you end up with files directly on ZFS you probably don't.
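For reference, wiring that up is only a couple of commands - a hedged sketch with hypothetical names:

    # mirrored SLOG from two of the HGST SAS SSDs
    zpool add tank log mirror da30 da31
    # force sync on just the iSCSI ZVOL; file datasets can stay at the default
    zfs set sync=always tank/filestore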

I think right now I am leaning toward dropping the iSCSI idea completely. Just get a good RAID card for VMFS to host all the VMs.
Then do the swap to get the 10TB drives for the data pool going with your suggestion. Keep it simple. By the way, there are 7x of those drives because that was the number needed for dual parity under Storage Spaces. I could add one if 8x works better for FreeNAS. But otherwise 50TB is more than enough space to last me quite a while.

Any number of drives will work. There used to be some considerations around certain drive counts being optimal, but those are largely moot now with compression enabled. RAIDZ2 still isn't recommended for block storage due to its performance behavior, though; another reason why I'm suggesting we try to get to the root of the SMB woes.

Technically I could just map a persistent network drive from Windows to the FreeNAS; using SMB3 at 10Gig, the performance will probably be OK. The latest versions of Samba in FreeNAS are approaching feature parity with Windows, and it is much more mature than that QNAP I had. I might even be able to use junction points and mount to folders.
Honestly, there are very few reasons I deploy Windows servers for SMB roles now - aside from "vendor insists on it" the big one is SMB 3.0 Continuous Availability - which you won't get on FreeNAS at all since it's not an HA solution.

I could always play with the iSCSI idea in a lab scenario not using real data. This is one of the reasons to have multiple servers around anyways.

Guilty as charged, I've got a few beside me humming (quietly) away, more in another room, and I'm always keeping an eye on equipment liquidation sites.

Then I just use the other drives to create another server to be the backup. I am leaning toward just using something like Veeam because it is just a backup. Replication is fine, but generally that is more for HA than backup; I know ZFS can do snapshots and such, but I am not that experienced with it yet.

Snapshots are "points in time" but they aren't a substitute for full backups, especially since they by design live on the same set of drives as the main data. Another backup target, whether it's Veeam or ZFS replication, is never a bad design decision.

Great conversation, I think I am homing in on a good solution here.

Do make sure you share the final build with us all.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
@Ericloewe wrote a very useful Intro to ZFS guide, with a good section on structure and terminology - worth a read, if you have some time later on.
Always looking for feedback by the way. If you end up reading it, be sure to leave a comment or review or whatever so I know what works and what doesn't for new users.
 

jcl123

Dabbler
Joined
Jul 6, 2019
Messages
23
Always looking for feedback by the way. If you end up reading it, be sure to leave a comment or review or whatever so I know what works and what doesn't for new users.
I did read this guide and found it helpful in some ways. I find it hard to comment on it, though - maybe once I have more experience with FreeNAS.

-JCL
 