General Hardware and Terminology

Fastline

Patron
Joined
Jul 7, 2023
Messages
358
Hello,

I've been learning a lot in this forum in the past couple of months. It's great. I've got a few questions. It would be really helpful if someone could answer them.

1. Is it wise to use OEM components such as HBAs and HDD/SSD/NVMe drives? Other than the warranty, price difference, and availability, are OEM HBAs and storage drives generally more robust than retail units, or are they the same hardware with a different warranty and price? Do OEMs have any advantage over retail units?

2. Is there really a speed difference when using 5400RPM and 7200RPM spindle disks?

3. Is there really a speed difference when using 6Gb/s SATA and 12Gb/s SAS?

4. What is Dedup and Deduplication?

5. I have heard about special vdevs. What are they exactly?

6. I have also heard about metadata. What are they exactly?

Thanks
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
1. Is it wise to use OEM components such as HBAs and HDD/SSD/NVMe drives? Other than the warranty, price difference, and availability, are OEM HBAs and storage drives generally more robust than retail units, or are they the same hardware with a different warranty and price? Do OEMs have any advantage over retail units?
I think you need to define "OEM", or rather you need to be explicit about the items you're comparing. The expression "OEM" can be used for a bunch of things, and even if you narrow it down, it's hard to make absolute statements.
2. Is there really a speed difference when using 5400RPM and 7200RPM spindle disks?
Not in any practical way.
5. I have heard about special vdevs. What are they exactly?
Special vdevs are (fast, in any sane implementation) vdevs that are dedicated to metadata, small blocks or both, thereby offloading that burden from other, non-special vdevs.
4. What is Dedup and Deduplication?
Deduplication, often shortened to dedup or dedupe, is the process of reducing a given set of data down to its unique blocks. E.g. if you have 100 VMs running the same OS, it's likely that much of the install is actually the same. ZFS allows for transparent dedup, theoretically addressing this scenario. In practical terms, ZFS' implementation is slow and painful and not suitable for general use.
6. I have also heard about metadata. What are they exactly?
Metadata, from meta, Greek for beyond, in addition to; and data, plural of datum, from Latin meaning "given" as in a given piece of information. Metadata is typically used specifically to refer to auxiliary data useful or necessary to make sense of the data of principal interest, for lack of a better term, and is a rather vast category that includes everything from filenames and creation times to information on the structure of a given piece of data and how it should be interpreted.
In ZFS terms, metadata mostly refers to metadata used by ZFS itself to support its features, rather than data directly accessible to users, even if it would be termed metadata in that user's context (though there is overlap between these two categories).
 

Fastline

Patron
Joined
Jul 7, 2023
Messages
358
I think you need to define "OEM", or rather you need to be explicit about the items you're comparing. The expression "OEM" can be used for a bunch of things, and even if you narrow it down, it's hard to make absolute statements.
I already mentioned the OEM thing regarding HBA cards and white-labelled OEM drives.

Not in any practical way.
I thought so.

Special vdevs are (fast, in any sane implementation) vdevs that are dedicated to metadata, small blocks or both, thereby offloading that burden from other, non-special vdevs.
Over my head ;( Any simple way to explain?

Deduplication, often shortened to dedup or dedupe, is the process of reducing a given set of data down to its unique blocks. E.g. if you have 100 VMs running the same OS, it's likely that much of the install is actually the same. ZFS allows for transparent dedup, theoretically addressing this scenario. In practical terms, ZFS' implementation is slow and painful and not suitable for general use.
Umm, and is its implementation recommended? What's the use case?

Metadata, from meta, Greek for beyond, in addition to; and data, plural of datum, from Latin meaning "given" as in a given piece of information. Metadata is typically used specifically to refer to auxiliary data useful or necessary to make sense of the data of principal interest, for lack of a better term, and is a rather vast category that includes everything from filenames and creation times to information on the structure of a given piece of data and how it should be interpreted.
In ZFS terms, metadata mostly refers to metadata used by ZFS itself to support its features, rather than data directly accessible to users, even if it would be termed metadata in that user's context (though there is overlap between these two categories).
Umm, is it recommended to set it up?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Let's start with the metadata again - metadata is all data that is not the contents of your files. The file names, the list of the file names in the directories, the modification time, etc.

A metadata special VDEV keeps metadata on a separate VDEV. So you can have a large pool of spinning disk drives for file sharing and in that pool one VDEV of SSDs for the metadata. That will speed up access to metadata by an order of magnitude.

Why could this be useful? If you have lots of users and directories with lots of files in them, a metadata VDEV significantly reduces the time it takes from opening a directory on a network share by double clicking in e.g. Windows Explorer to the full list of files being visible. Users generally enjoy that particular operation to be as "snappy" as possible.

For a single or two user home NAS you probably do not need to bother.
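As a rough illustration, creating such a pool is a one-liner. This is a sketch only, with placeholder device names (da0..da5 for the HDDs, nvd0/nvd1 for the SSDs) standing in for real hardware:

```shell
# Sketch: a RAIDZ2 pool of six HDDs plus a mirrored special vdev of two
# SSDs for metadata. All device names here are placeholders.
zpool create tank \
    raidz2 da0 da1 da2 da3 da4 da5 \
    special mirror nvd0 nvd1

# Verify the special vdev shows up in the pool layout:
zpool status tank
```

Since losing the special vdev means losing the pool, it is mirrored here to match the redundancy expectations of the rest of the pool.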

Deduplication saves space. Let's say you run 100 VMs all running Windows XYZ in your large VMware cluster and TrueNAS serves as the backing storage for the VM disk images. Probably many many blocks in all of these virtual "disks" will be identical across all or most of the VMs.
With deduplication you store only one copy instead of 100.
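The principle can be demonstrated with plain shell tools, entirely outside ZFS. This toy sketch (the file names and the 4 KiB block size are made up) splits two identical "VM images" into blocks, hashes each block, and counts how many are actually unique:

```shell
# Toy dedup accounting: two identical 40 KiB "images", split into 4 KiB
# blocks. Hashing the blocks shows how many unique copies a dedup-aware
# filesystem would actually need to store.
dir=$(mktemp -d) && cd "$dir"
head -c 40960 /dev/zero > vm1.img   # 10 blocks of zeros
cp vm1.img vm2.img                  # a "clone" of the same image
split -b 4096 vm1.img blk_a_
split -b 4096 vm2.img blk_b_
total=$(ls blk_* | wc -l)
unique=$(sha256sum blk_* | awk '{print $1}' | sort -u | wc -l)
echo "blocks total=$total unique=$unique"   # 20 blocks on disk, 1 unique
```

Twenty blocks exist on disk, but only one unique block would need to be stored. The catch, as described below, is the cost of keeping that hash table around for an entire pool.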

Problem: to make it work properly you need an insane amount of hardware resources, so you might as well just throw disk capacity at the problem and not use dedup at all, which is what is currently recommended. Improved deduplication is on the roadmap for future ZFS releases.

See https://openzfs.org/wiki/OpenZFS_Developer_Summit_2023_Talks#Introducing_Fast_Dedup_(Allan_Jude)
 

Fastline

Patron
Joined
Jul 7, 2023
Messages
358
Let's start with the metadata again - metadata is all data that is not the contents of your files. The file names, the list of the file names in the directories, the modification time, etc.

A metadata special VDEV keeps metadata on a separate VDEV. So you can have a large pool of spinning disk drives for file sharing and in that pool one VDEV of SSDs for the metadata. That will speed up access to metadata by an order of magnitude.

Why could this be useful? If you have lots of users and directories with lots of files in them, a metadata VDEV significantly reduces the time it takes from opening a directory on a network share by double clicking in e.g. Windows Explorer to the full list of files being visible. Users generally enjoy that particular operation to be as "snappy" as possible.
OMG. Seems like @Patrick M. Hausen is the guy ;)

By the second paragraph I knew what was being talked about. Explained well. Now I know why it took time. I'm getting there...

For a single or two user home NAS you probably do not need to bother.
I understand, but is there any criteria to know that? I mean, X number of directories or X number of files that would indicate the user needs a metadata vdev. Also, what kind of drive is ideal for metadata? Is it important to have features such as power loss protection, low latency, and high IOPS like the DC drives, or would a consumer drive be good enough? I guess SSD is the way to go for the fast lookups, as you mentioned above.

Also, how do I determine what capacity I need for the metadata drive? Does it have to be redundant? I guess it has to be, or if the special vdev is lost, the pool is dead.

Secondly, is the special vdev for metadata more recommended on an HDD setup, or can one notice a significant improvement on an all-SSD setup as well?

Deduplication saves space. Let's say you run 100 VMs all running Windows XYZ in your large VMware cluster and TrueNAS serves as the backing storage for the VM disk images. Probably many many blocks in all of these virtual "disks" will be identical across all or most of the VMs.
With deduplication you store only one copy instead of 100.
Oh shit. I'm learning ZFS secrets :)

Problem: to make it work properly you need an insane amount of hardware resources, so you might as well just throw disk capacity at the problem and not use dedup at all, which is what is currently recommended. Improved deduplication is on the roadmap for future ZFS releases.
Umm, does that mean it cannot be done on normal server-grade components? And what kind of hardware resources are we talking about?
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Umm, does that mean it cannot be done on normal server-grade components? And what kind of hardware resources are we talking about?
You will likely need 256-512 GB of RAM as a minimum, and IIRC it also strains the CPU considerably. The bottom line, as @Patrick M. Hausen mentioned, is that for 99.5% of home lab users it does not make sense. And with VMs as the use case that is usually brought up, how much space do you really need? Per VM I would guess it's something in the range of 50-100 GB on average. That sums up to 5-10 TB for 100 VMs. It will usually be easier, cheaper, and faster to just add that amount of storage.

I recently spoke with an IT enterprise architect who mentioned that his company runs about 25k VMs in their own data center alone (plus a few thousand on AWS and Azure). I could imagine that things might look different here. But then we also talk millions in budget per year.
 

Fastline

Patron
Joined
Jul 7, 2023
Messages
358
You will likely need 256-512 GB of RAM as a minimum, and IIRC it also strains the CPU considerably. The bottom line, as @Patrick M. Hausen mentioned, is that for 99.5% of home lab users it does not make sense. And with VMs as the use case that is usually brought up, how much space do you really need? Per VM I would guess it's something in the range of 50-100 GB on average. That sums up to 5-10 TB for 100 VMs. It will usually be easier, cheaper, and faster to just add that amount of storage.

I recently spoke with an IT enterprise architect who mentioned that his company runs about 25k VMs in their own data center alone (plus a few thousand on AWS and Azure). I could imagine that things might look different here. But then we also talk millions in budget per year.
Thanks for clearing it up!
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
...
3. Is there really a speed difference when using 6Gb/s SATA and 12Gb/s SAS?
...
Since no one responded to this item: yes, 12 Gbit/s SAS is faster than 6 Gbit/s SATA, even with SATA disks, under CERTAIN conditions.

But to sum it up, the average user won't see any speed difference.


One such condition is using a SAS expander that supports 12 Gbit/s together with an HBA with 12 Gbit/s support. Any SATA disk plugged into the SAS expander will link at 6 Gbit/s, but from the SAS expander to the HBA, the links run at 12 Gbit/s.

So, if you have 8 SAS lanes of 12 Gbit/s from an HBA wired to a 12 Gbit/s SAS expander, but more than 8 disks, there is no speed drop when all are accessed at once. Well, until the combined throughput of all the disks exceeds what the lanes can carry.

Some system board SATA controllers have limited bandwidth to the CPU or chipset. So if all drive-side ports on the SATA controller are in use, it is possible that the SATA controller can't send or receive the data fast enough.

Last, SAS is a full-duplex protocol for data, meaning data can be sent and received at the same time on a SAS HDD or SSD. SATA is half-duplex, so a SATA HDD or SSD can only send or receive at any given moment.
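For a feel of the numbers involved, here is the back-of-the-envelope arithmetic behind the oversubscription point (the 24-disk count and ~250 MB/s per HDD are assumed figures, and 8b/10b encoding is approximated as 100 MB/s of payload per Gbit/s of line rate):

```shell
# Back-of-the-envelope: can 8 lanes of 12 Gbit/s SAS feed 24 HDDs at once?
# With 8b/10b encoding, each Gbit/s of line rate carries roughly 100 MB/s.
lanes=8
lane_gbit=12
hba_mb=$(( lanes * lane_gbit * 100 ))   # 8 * 12 * 100 = 9600 MB/s uplink
disks=24
disk_mb=250                             # assumed HDD streaming throughput
disk_total=$(( disks * disk_mb ))       # 24 * 250 = 6000 MB/s
echo "HBA uplink ~${hba_mb} MB/s vs disks ~${disk_total} MB/s"
```

So in this (assumed) configuration the expander's uplink still has headroom: oversubscription only bites once the disks can collectively outrun the lanes.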
 
Joined
Oct 22, 2019
Messages
3,641
Metadata, from meta, Greek for beyond, in addition to; and data, plural of datum, from Latin meaning "given" as in a given piece of information.
Your answers were great... until this part. You explained beyond a simple answer and went meta with your description of metadata-... wait a second... :eek:
 

Fastline

Patron
Joined
Jul 7, 2023
Messages
358
Since no one responded to this item: yes, 12 Gbit/s SAS is faster than 6 Gbit/s SATA, even with SATA disks, under CERTAIN conditions.

But to sum it up, the average user won't see any speed difference.


One such condition is using a SAS expander that supports 12 Gbit/s together with an HBA with 12 Gbit/s support. Any SATA disk plugged into the SAS expander will link at 6 Gbit/s, but from the SAS expander to the HBA, the links run at 12 Gbit/s.

So, if you have 8 SAS lanes of 12 Gbit/s from an HBA wired to a 12 Gbit/s SAS expander, but more than 8 disks, there is no speed drop when all are accessed at once. Well, until the combined throughput of all the disks exceeds what the lanes can carry.

Some system board SATA controllers have limited bandwidth to the CPU or chipset. So if all drive-side ports on the SATA controller are in use, it is possible that the SATA controller can't send or receive the data fast enough.

Last, SAS is a full-duplex protocol for data, meaning data can be sent and received at the same time on a SAS HDD or SSD. SATA is half-duplex, so a SATA HDD or SSD can only send or receive at any given moment.
Thank you for taking out the time to explain it.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Your answers were great... until this part. You explained beyond a simple answer and went meta with your description of metadata-... wait a second... :eek:
Well, I firmly believe in examining our languages whenever the opportunity arises, as it not only confers a greater understanding of why certain words came to be used as they are, both in a strict etymological sense and in a broader analysis of the use of language; it also provides the occasional fun fact. For instance, did you know that the word "barge", through a circuitous history, derives from ancient Egyptian
[image: Egyptian hieroglyphs for bꜣjr]
(https://en.wiktionary.org/wiki/bꜣjr#Egyptian - apologies, I can't seem to coerce XenForo into accepting Unicode hieroglyphs, so I had to use an image instead)?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Not in any practical way.
Well, lower-RPM drives consume less electricity, which means they generally run cooler and require less cooling, and they are usually quieter than higher-RPM ones.

Maybe not a big difference for large systems, but for SOHO it's still of relevance imho.

EDIT: didn't see the SPEED word in the OP post.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Well, lower-RPM drives consume less electricity, which means they generally run cooler and require less cooling, and they are usually quieter than higher-RPM ones.
On the other hand, there are very few, if any, NAS drives on the market other than 7200 RPM ones, as far as I know. WD markets their Red models as 5400-class, but on numerous occasions I have come across reports that they are indistinguishable from 7200 RPM drives in terms of power consumption as well as noise.

Of course that does not disqualify what @Davvo wrote.
 

asap2go

Patron
Joined
Jun 11, 2023
Messages
228
OMG. Seems like @Patrick M. Hausen is the guy ;)

By the second paragraph I knew what was being talked about. Explained well. Now I know why it took time. I'm getting there...


I understand, but is there any criteria to know that? I mean, X number of directories or X number of files that would indicate the user needs a metadata vdev. Also, what kind of drive is ideal for metadata? Is it important to have features such as power loss protection, low latency, and high IOPS like the DC drives, or would a consumer drive be good enough? I guess SSD is the way to go for the fast lookups, as you mentioned above.

Also, how do I determine what capacity I need for the metadata drive? Does it have to be redundant? I guess it has to be, or if the special vdev is lost, the pool is dead.

Secondly, is the special vdev for metadata more recommended on an HDD setup, or can one notice a significant improvement on an all-SSD setup as well?


Oh shit. I'm learning ZFS secrets :)


Umm, does that mean it cannot be done on normal server-grade components? And what kind of hardware resources are we talking about?
A metadata special device can make sense if you have a lot of files in a directory, like several thousand, or if you regularly search over your entire pool.
The other thing it's useful for is storing small files.
Imagine you copy a 30 GB movie to an HDD. Write speed will be a consistent 200 MB/s or something similar. If you copy a game folder with many small files, you might end up with speeds in the KB/s range for those.
In ZFS you can tell your pool to store small files (e.g. everything below 128 KB) on the special device and only write the larger files onto the HDDs.
That takes the burden off the poor spinning rust and improves performance.

However: caching metadata is something ZFS always does in ARC (the RAM cache), and any small files that you regularly use are also in ARC.
As long as you have enough RAM, that is.

As for devices, Optane drives are great: the P5800X, P4800X, 905P, or 900P.
But they are expensive and hard to find.
I just use a mirror of Solidigm P44 Pros, which have rather good IOPS and work just fine.

Be aware, however:
If your special vdev fails, then you lose the pool.
Like if you "lose" all the names in a phone book, it becomes pretty useless with just numbers.
So make it at least a two-way mirror.
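For completeness, the knobs involved look roughly like this. Pool name, dataset, devices, and the 64K threshold are all placeholders, and note that a threshold equal to the dataset's recordsize would send all data, not just small files, to the special vdev:

```shell
# Sketch only: add a mirrored special vdev to an existing pool, then route
# small blocks to it. Names and the threshold are placeholders.
zpool add tank special mirror nvd0 nvd1   # at least a two-way mirror!

# Blocks of 64K or smaller on this dataset land on the special vdev:
zfs set special_small_blocks=64K tank/games
zfs get special_small_blocks tank/games
```

The property only affects newly written blocks, so existing data stays on the HDDs until rewritten.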
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
@Fastline a look into the following resource might help.

The baseline: for SOHO, you likely don't need anything other than the data VDEVs.

On the other hand, there are very few, if any, NAS drives on the market other than 7200 RPM ones, as far as I know. WD markets their Red models as 5400-class, but on numerous occasions I have come across reports that they are indistinguishable from 7200 RPM drives in terms of power consumption as well as noise.

Of course that does not disqualify what @Davvo wrote.
Seagate also has a few that run at 5900.

It really makes a difference in small cases... There is a thread about a user who bought a TN Mini and had really bad thermals because of the high-RPM WD drives it uses.
 

Fastline

Patron
Joined
Jul 7, 2023
Messages
358
A metadata special device can make sense if you have a lot of files in a directory, like several thousand, or if you regularly search over your entire pool.
The other thing it's useful for is storing small files.
Imagine you copy a 30 GB movie to an HDD. Write speed will be a consistent 200 MB/s or something similar. If you copy a game folder with many small files, you might end up with speeds in the KB/s range for those.
In ZFS you can tell your pool to store small files (e.g. everything below 128 KB) on the special device and only write the larger files onto the HDDs.
That takes the burden off the poor spinning rust and improves performance.

However: caching metadata is something ZFS always does in ARC (the RAM cache), and any small files that you regularly use are also in ARC.
As long as you have enough RAM, that is.

As for devices, Optane drives are great: the P5800X, P4800X, 905P, or 900P.
But they are expensive and hard to find.
I just use a mirror of Solidigm P44 Pros, which have rather good IOPS and work just fine.

Be aware, however:
If your special vdev fails, then you lose the pool.
Like if you "lose" all the names in a phone book, it becomes pretty useless with just numbers.
So make it at least a two-way mirror.
Thanks for clearing it up.

So, a two-way mirror requires 4 disks, right?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
So, a two-way mirror requires 4 disks, right?
A two-way mirror is a mirrored VDEV composed of two drives;
A three-way mirror is a mirrored VDEV composed of three drives;
And so on...
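In zpool terms, the difference looks like this (disk names are placeholders):

```shell
# One two-way mirror: 2 disks, 1 vdev
zpool create tank mirror da0 da1

# One three-way mirror: 3 disks, still 1 vdev
zpool create tank mirror da0 da1 da2

# Two two-way mirrors (a "striped mirror"): 4 disks, 2 vdevs
zpool create tank mirror da0 da1 mirror da2 da3
```

Each `mirror` keyword starts a new vdev, which is where the two-disk vs. four-disk confusion usually comes from.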


Generally, a standard home user won't need a special VDEV, an L2ARC, or a SLOG unless you are running either specific workloads or really big, high-performance systems (both cases require a good understanding of ZFS, TN, and a bunch of other things).
Dedup is even more out of the question.

P.S.: If you want to learn, you can find more information in the resource section of the forum or in my signature.
 

asap2go

Patron
Joined
Jun 11, 2023
Messages
228
A two-way mirror is a mirrored VDEV composed of two drives;
A three-way mirror is a mirrored VDEV composed of three drives;
And so on...


Generally, a standard home user won't need a special VDEV, an L2ARC, or a SLOG unless you are running either specific workloads or really big, high-performance systems (both cases require a good understanding of ZFS, TN, and a bunch of other things).
Dedup is even more out of the question.

P.S.: If you want to learn, you can find more information in the resource section of the forum or in my signature.
Correct.
But also, no one *needs* ZFS for home use.
Many of us just do it because we can.
I am looking to upgrade to 25GbE and from my measly AM4 to an Epyc platform, for example.
Do I need that? No. I can't even think of a use case as an excuse.
But who is going to stop me from running a full-NVMe RAIDZ2, 8 wide, with dual 25GbE?
Or even going 100GbE with a cheap MikroTik switch?
 
Joined
Oct 22, 2019
Messages
3,641
So, a two-way mirror requires 4 disks, right?
This is why tech talk sucks. It leaves no room for casual speak. o_O

Two mirrors require 4 drives. (mirror vdev + another mirror vdev)

Two-way mirrors are comprised of two drives.
 