IOPS of WD-Red NAS-disks?

Sharethevibe

Dabbler
Joined
Aug 21, 2019
Messages
21
I am building a NAS using 8x 8TB disks and am getting conflicting info on the per-disk IOPS figure I should use to estimate the speed of the system. Quoted numbers range from about 75 to 600 IOPS… I'm puzzled, but probably other forum users can clear this up?

Disks:
  • 8TB WD-Red, NAS-type, 5400 rpm, 256MB cache (2018 model, type ‘EMAZ’, equal to ‘EFAZ’)
  • 8 disks mounted: 2 striped vdevs of 4 disks each in Z1 (2x (4x disks in Z1))
The FreeNAS forum gives a calculation for determining the IOPS of a single disk:
1 / (avg seek time in seconds + avg rotational latency in seconds). This gives 1 / (0.008 + 0.006), i.e. approx 70 IOPS.
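To show exactly what I calculated, here is that formula as a small Python sketch (the 8 ms average seek time is my assumption based on typical WD Red spec sheets; the rotational latency follows from the 5400 rpm spindle speed):

```python
# Rough per-disk random IOPS estimate from mechanical delays only
# (sketch of the forum formula; ignores cache, NCQ and track layout).
avg_seek_s = 0.008                  # ~8 ms average seek (assumed from spec sheets)
rpm = 5400
avg_rot_latency_s = 60.0 / rpm / 2  # half a rotation ≈ 5.6 ms at 5400 rpm

iops = 1.0 / (avg_seek_s + avg_rot_latency_s)
print(f"~{iops:.0f} random IOPS per disk")  # prints ~74
```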

Yet the measured IOPS published in various online tests, e.g. by Tomshardware.com, are:
- for read: 200-500 IOPS (queue-depth 1 to 32)
- for write: 500-600 IOPS
Source: https://www.tomshardware.com/reviews/wd-red-10tb-8tb-nas-hdd,5277-2.html.
(and this is for the 128MB cache version, where mine are 256MB).

Also, when I compare it with similar tests for 3TB WD-Red disks, these have IOPS of 100-150.

So it seems to me that in real-world use this 8TB WD NAS disk far outperforms its earlier/smaller siblings, and that the IOPS of 200-600 are valid for this disk type?
(WD probably making good use of the high data density, the large cache, TLER (which limits time lost on error recovery) and whatever other tricks WD knows to reduce seek/latency times, etc.?)

And as the runs in my processes are typically large batches (reads or writes of 200,000-500,000 files; files are 50MB or 10MB), I reckon that the IOPS figures at 'queue depth 32' are the most relevant for my processes?

NOTE: the HBA disk controller card is an IT-flashed Dell H310, which I believe can deliver a queue depth of up to 600?


Can anybody shed some light on this?

(This IOPS performance will to a great extent determine the speed of the NAS, hence the possible need to go to 3 vdevs, maintain free pool space, tune the max recordsize, etc.)



Thanks in advance!
 

Fred4000

Cadet
Joined
Sep 1, 2019
Messages
4
It's a bit off topic, but are you sure you want to go with 2x4 Z1? This seems like a very risky configuration (remember: one drive fails in one VDev, all others *must* keep working during repair, or your complete pool will be gone, incl. the other VDevs). I would only consider this if your data is (very) cheap to recreate or restore - or data loss doesn't matter much.

If you are writing large enough files, IOPS shouldn't matter as much as it does for small files, that is...

... performance may also depend on how you read/write the data: opening one file at a time, some in parallel, or many in parallel. The latter is very challenging, obviously.

A quick boost would be to go with 7200 rpm drives instead.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
all others *must* keep working during repair, or your complete pool will be gone, incl. the other VDevs

I don't think this is correct. In a 2x4 Z1 configuration, if a drive fails, only three drives remaining in its vdev are used in recovery. The other four drives are not used, and the pool will survive another single drive failure if it happens on the other vdev.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I don't think this is correct. In a 2x4 Z1 configuration, if a drive fails, only three drives remaining in its vdev are used in recovery. The other four drives are not used, and the pool will survive another single drive failure if it happens on the other vdev.
It's important to look at the context of the full quote here:

remember: one drive fails in one VDev, all others *must* keep working during repair, or your complete pool will be gone, incl. the other VDevs
"All others" implies "all others in the vdev" as contrasted to "your complete pool" mentioned later.

For IOPS performance, mirrors are strongly recommended instead, but a far bigger concern of mine is the choice of disk:

8TB WD-Red, NAS-type, 5400rpm, 256MB cache (2018 model, type ‘EMAZ’, equal to ‘EFAZ’)
Those are very likely to be shingled (SMR) drives, which have an extremely poor performance pathology under ZFS (and in RAID generally) due to the overhead of reshingling when overwriting a given LBA.

See this post from user @deafen - this impacts reads as well.

https://www.ixsystems.com/community/threads/example-of-smr-ugliness-even-on-reads.78325/
 

Fred4000

Cadet
Joined
Sep 1, 2019
Messages
4
Yes, I should have phrased that more clearly. While it reduces the number of drives that can potentially break your pool, you are still putting the complete thing at risk.

If one were to create two pools, that would at least cut the losses in half (but it might be more tedious to manage / distribute access). Still, I think a 2 x (4x Z1) layout is hardly ever a good starting point.
 

Sharethevibe

Dabbler
Joined
Aug 21, 2019
Messages
21
Thanks all for the first input.

Re the redundancy:
I'm aware that using a Z1 vdev means that, after one disk has dropped out, the rest of the disks in the same vdev have to stay intact during the repair (in the other vdev one disk can also drop out). And as these are 8TB disks (fairly large), the repair time will be longer than usual, etc. But it's all about balancing things. Using 2 pools (each one vdev of 4 disks in Z1) gives, I think, the same redundancy, and anyway I need to be able to use all the data as if it were on one dataset. Higher data safety would obviously come from Z2 vdevs, but then of course even more 8TB disks are required. Or I could put all 8 disks in one Z2 vdev, but my current estimate is that I'll need 2 vdevs for the performance (this being the topic of this thread, as y'all understand ;-)

Up to now I have figured that, as I will also have a full backup station for this NAS, running Z1 vdevs with a 1:3 parity-to-data ratio is a fair redundancy approach (it's a fileshare NAS; few simultaneous users).

Re the disk technology:
These are indeed not SMR. They are helium-filled 8TB NAS types (5400 rpm / 256MB cache), so they have a rather large cache.

Back on topic ;-):
I am wondering what the IOPS performance of these disks is, i.e. the figure I can work with in my estimates of the speed of the NAS.
On this forum I have seen here and there the calculation 1/(avg seek + avg latency), but the test figures all give completely different numbers (please see my initial post).
Please check the link I gave to those test charts (it's from Tomshardware.com among others, and they don't publish rubbish there).

The reason I want to know the IOPS figure is that I cannot just run some tests to find out the system speed, as an empty system will give glorious results anyhow.

And my current understanding is (correct me if I'm wrong) that:
1) as for transfer speeds (writing or reading a large run of files):

- read/write speed in MB/s = system IOPS x max recordsize
- where: system IOPS = IOPS of a single disk x number of vdevs
- and: the system actually reads/writes records, but these are capped at the set maximum recordsize (which we set per dataset)
- so: in a situation where all files are 50/50 10MB or 50MB, and the max recordsize is set to 1MB (1024kB), with every IO operation 1MB is read/written.

Fred4000 mentioned that 'for large files the IOPS does not matter much'.

- It does not, IF a whole file is read or written per IO operation (whereas my understanding is that in ZFS one block/record is handled per IO operation).

2) as for reading (or writing) large runs of tag data (the roughly 100kB section holding the general data of the file):

- my current understanding is that the speed is determined by the system IOPS: the tag data being say 100kB, one file is handled per IO operation
- and the system IOPS again is the IOPS of a single disk x number of vdevs (a small sketch of both estimates follows below).
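To make my assumptions explicit, here is a minimal sketch of both estimates (all inputs are the assumed figures under discussion in this thread, not measured values):

```python
# Minimal sketch of my speed estimate; every input here is an assumption.
per_disk_iops = 70      # mechanical estimate; the online tests suggest 200-600
vdevs = 2               # 2x RAIDZ1 of 4 disks each
recordsize_mb = 1       # max recordsize set per dataset

system_iops = per_disk_iops * vdevs            # IOPS scale with vdevs, not disks
bulk_mb_per_s = system_iops * recordsize_mb    # 1) large-file transfer estimate
tag_files_per_s = system_iops                  # 2) ~100kB of tag data, one IO per file

print(f"bulk transfer ~{bulk_mb_per_s} MB/s, tag reads ~{tag_files_per_s} files/s")
# With 70 IOPS/disk: ~140 MB/s and 140 files/s; with 500 IOPS/disk: ~1000 MB/s and 1000 files/s
```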

Notes:
- I'm aware (from JGreco's input on the forum) that with a good amount of free, unused pool space the system is able to 'string many blocks together' into one write or read, and the effective IOPS can go up by 500% (at just 10% usage); I cannot afford that with the amount of data I have here, and frankly it's a bit wild of course to use just 10-25% of the pool.
- I have 32GB (or if necessary 48GB) of RAM/ARC and a 500GB NVMe SSD for L2ARC (the latter not yet mounted in FreeNAS). But the typical read runs will not use the cache: it will be 99% reads that are not from a 'frequently used set', so the disk pool has to do it.
- I'm striving for approx. 500MB/s for read/write of the 10/50MB files, and for the second major process (tag data handling): 500 files/sec read/write of the 100kB tag data.
(typical runs being say 200,000 files, fed from a fast workstation over a direct 10Gb NIC connection).

Please let me have your input on the estimation of the system-speed.

Tx in advance!
 

Sharethevibe

Dabbler
Joined
Aug 21, 2019
Messages
21
Info on the use-case and workload:

This NAS is to house a large music collection: gathering a few million tracks and processing/filtering this down to an end collection of approx. 1 million tracks. Approx. 50% are in lossless format, i.e. 50MB per file; the other half are high-quality MP3 at approx. 10MB per file.

The most demanding tasks therefore are:
1) file transfer:
Adding/writing large runs of new tracks (collections of say 200,000-500,000 tracks), or reading such runs (to save them on a fast unit outside the NAS, or to move large runs from one place to another on the NAS pool for re-mapping).

2) tag data usage:
Reading (or writing) tag data in large runs (200,000-500,000 files), i.e. the descriptive data of the music file, being a section of roughly 100kB of the main file.
The tag data is read and compared in order to find duplicates, rank lists of duplicate music files, etc. (In other processes small bits of the sound file are also read, but especially during pure tag data reading/analysis the disks are the bottleneck; during a pre-test under Windows, using a 3-disk striped RAID, the two Xeons of the workstation were only 20% utilized while handling the RAID plus doing the tag data reading/analysis.)

The NAS has a fast 10Gb connection, peer-to-peer with a fast workstation (from which the processing of the collection is done). The workstation has dual Xeon E5-2670s (2x 8 cores / 2x 16 threads), so it can handle a lot of tasks simultaneously.

The above runs, doing transfers and tag data analysis, are recurrent: the build-up of the collection will take place over time (and in fact there will always be additions).
A task of e.g. reading/analysing 300,000 tracks can take a while; during the pre-test setup this could indeed last 1-2 days (24-48 hrs).
So the objective is to make this more manageable.

Further info on the writing:

I intend to switch the ZIL off (set sync writes to 'disabled'). I do not foresee sync writes here (during processing there is one user / one task running at a time; during 'internet time' there might be occasional writing by a single user via the 1Gb internet NIC).

My understanding is that this means all writes go to RAM first (a buffer of 5-10 seconds of writes) and from there to the disk pool. There being no ZIL in the path, the speed of the data write to disk determines the overall speed. If I understand correctly, that series of writes in RAM goes onto disk as one large operation. These being chunks of say 5000MB, the write speed is then limited by the maximum bandwidth of the combined disk set; with 8 disks, of which the equivalent of 2 disks is used for parity, and a single disk averaging 160 MB/s, this means 6 x 160 MB/s = 960 MB/s.

Is this a correct estimation for the write-speed?
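As a sanity check, here is that arithmetic as a small sketch (the 160 MB/s per-disk streaming rate is my assumption, not a measured figure):

```python
# Streaming write estimate for async writes flushed in large transaction groups
# (assumes the pool behaves like a plain stripe of the data disks).
disks = 8
parity_disks = 2                 # one disk's worth of parity per RAIDZ1 vdev
per_disk_stream_mb_s = 160       # assumed average sequential rate per disk

data_disks = disks - parity_disks
write_mb_s = data_disks * per_disk_stream_mb_s
print(f"~{write_mb_s} MB/s sequential write ceiling")  # 6 x 160 = 960 MB/s
```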

And how can the read-speed be estimated in this case?
 

Sharethevibe

Dabbler
Joined
Aug 21, 2019
Messages
21
Further info & questions on the max recordsize:

I read here https://www.reddit.com/r/zfs/comments/cm594b/why_is_nobody_talking_about_the_newly_introduced/
that the max recordsize that can be set is apparently not 1MB but 16MB?
(see the 'Dagger' input from '28 days ago').

With this NAS holding media files of (50%) 10MB and (50%) 50MB, I reckon that the r/w speed of the system will improve a lot when the recordsize is set not to 1MB but to multiple MB (with every IO operation it 'shoves more data to the disk')?
  • for the avg 10MB files (ranging 5-15MB): ideal recordsize being 10MB?
  • for the avg 50MB files (ranging 40-60MB): ideal recordsize being 50MB?
    • yet the max is 16MB
    • yet (if in 1 dataset), keep it at max 10MB, because of the 10MB files?
    • and: the max total disk bandwidth (read) being 8 x 160 = 1280MB/s, more than 10MB per record cannot be handled by the disks
      • yet: if the pool is extended, the bandwidth goes up?
    • so (see the sketch below):
      • if in 1 dataset, set the recordsize to 10MB?
      • if in separate datasets, set 10MB for the 10MB files and 16MB for the 50MB files?
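To illustrate what the recordsize choice would mean under the IOPS-based model from my earlier post (again using my assumed low-end figures and my assumed bandwidth cap, purely illustrative):

```python
# How the recordsize choice changes the IOPS-based throughput estimate,
# capped at the assumed total disk read bandwidth (illustrative only).
system_iops = 140               # 70 IOPS/disk x 2 vdevs (low-end assumption)
disk_bandwidth_mb_s = 8 * 160   # assumed 1280 MB/s total

for recordsize_mb in (1, 10, 16):
    estimate = min(system_iops * recordsize_mb, disk_bandwidth_mb_s)
    print(f"recordsize {recordsize_mb:>2} MB -> ~{estimate} MB/s")
# 1 MB -> 140 MB/s, 10 MB and 16 MB -> 1280 MB/s (bandwidth-capped)
```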
Appreciate your thoughts on this.

If this is correct it changes the estimation of the r/w speed quite a lot of course.

Tx in advance!
 