Transfer speed dips to zero and then returns to normal - abnormal speed fluctuations

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
Dedupe requires tons of RAM (it seems to be about 5GB per TB) AND it also greatly increases disk I/O activity, because the dedupe tables are on the disk as well... when you run out of RAM to hold the tables, then it's write/read/write to add the data to the array AND to the dedupe tables. Honestly, if you have that much block-level duplication... double your disk space or more. It depends on how much data you are storing... as you add more, your dedupe table grows, and you will eventually exceed the RAM capacity of your machine.

If you want to go back to SMR that's your call... but it is HIGHLY recommended you do not, for obvious reasons. What I would consider, instead of a 2.5" chassis server, is building a new one with 3.5" drives. Then you can get truly high-capacity CMR drives that will also run much, much cooler.

Straight from the iXsystems site:

Setting up Deduplication without Adequate Planning


Deduplication is a much-desired feature for storage solutions. On any given system, more than half your data may be duplicates of data elsewhere in your storage pool, causing greater storage consumption. Deduplication reduces capacity requirements significantly and improves performance by tracking duplicate data with a ‘deduplication table’, eliminating the need to write and store duplicate information. ZFS stores this table on disk, which means that, if the host has to refer to the on-disk tables regularly, performance will be substantially reduced because of the slower speeds of standard spinning disks.


This means you need to plan to fit your entire deduplication table in memory to avoid major performance penalties and, potentially, data loss. This generally isn’t a problem when first setting up deduplication, but as the table grows over time, you may unexpectedly find its size exceeds memory. This splits the deduplication table between memory and hard disk, turning every write into multiple reads and writes, slowing your performance down to a crawl. In an enterprise environment, this can cause significant productivity decreases and angry staff. If this happens, the best solution is to add more system memory so that the table can fit back into memory when the pool is imported. Unfortunately, this can sometimes take days to perform, and, if your hardware has already maxed out its memory capacity, it would require migrating the disks to a whole new system to access the data.


The general rule of thumb here is to have 5 GB of memory for every 1TB of deduplicated data. That said, there may be instances where more is required, but you will need to plan to meet the maximum potential memory requirements to avoid problems down the road. To get a more precise estimate of the required memory for deduplication do the following: run the ‘zdb -b (pool name)’ command for the desired pool to get an idea of the number of blocks required, then multiply the ‘bp count’ by 320 bytes to get your required memory. If it’s less than 5GB, still use the 5GB per terabyte of storage rule. If it’s higher, go with that number per terabyte.


For most use cases, it is recommended to just utilize lz4 compression for storage savings, as there’s no real processing cost. In fact, due to advances in CPU speeds, compression actually improves disk performance, because writing uncompressed data to disk takes longer than writing compressed data. To be safe, always use compression instead of deduplication unless you know exactly what you are doing.
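
For reference, here is roughly how that estimate works in practice. The pool/dataset names and the block count below are just placeholders, so substitute your own:

# Count the blocks in the pool (this can take a while on a large pool)
zdb -b tank

# Multiply the reported "bp count" by ~320 bytes to estimate DDT memory.
# Example: 20,000,000 blocks x 320 bytes = 6,400,000,000 bytes, i.e. ~6.4 GB of RAM just for the table.
echo $((20000000 * 320))

# The usually-better alternative: plain lz4 compression on the dataset
zfs set compression=lz4 tank/mydataset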
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
With TNC12 they do have those special vdevs... I THINK you can use SSDs to hold the dedupe tables... that might be an option as well... but your standard off-the-shelf SSDs will get destroyed by the huge amount of writes the dedupe tables will present. You will need to get some very high-endurance enterprise SSDs to handle that intense write load. Managing the dedupe tables still requires RAM, though... so upgrading to 128 gigs is probably still in your future... even with dedup vdevs.
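
If you do go that route, it would look something like this - a mirrored pair of high-endurance SSDs dedicated to the DDT (pool and device names here are placeholders, double-check against your own setup first):

# Add a mirrored dedup vdev to hold the deduplication tables (OpenZFS 2.0 / TrueNAS 12)
zpool add tank dedup mirror /dev/da8 /dev/da9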
 
Last edited:

Love4Storage

Dabbler
Joined
Nov 6, 2020
Messages
35
My God! You are my savior! It all makes sense now why performance suddenly dropped and why the other weird things were happening. I will keep everyone updated on how I resolve this one. Hopefully it won't hurt my wallet too much!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The only difference being deduplication

That's a rather huge difference as outlined by @hescominsoon - you can identify the extent of the memory impact with zpool status -D poolname and look for the line that says "DDT entries X, size in core Y, size on disk Z"

Multiply the "number of entries" by the "size in core" and that's the RAM in bytes that you'll be using just to support your DDT.
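
For example (the numbers here are made up purely for illustration - use your own output):

zpool status -D poolname | grep "DDT entries"
#  dedup: DDT entries 26214400, size 451B on disk, 165B in core

# entries x size-in-core = RAM consumed by the DDT
echo $((26214400 * 165))   # 4,325,376,000 bytes, roughly 4.3 GB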
 

Love4Storage

Dabbler
Joined
Nov 6, 2020
Messages
35
So I destroyed the pool and remade the dataset without dedupe. The size of the dataset was IDENTICAL to the original dataset, just with better performance. Dedupe is a SCAM! I'm sure I have plenty of duplicates, yet the size was identical.

I'm thinking about replacing my 1.2TB SAS HDDs and going all flash. I went from a 4-bay to a 6-bay to a 12-bay chassis, and now I'm thinking about going back to a 4-bay: 2 NVMe drives, striped, plus a local 8TB CMR SATA drive that I can replicate to on a weekly basis.

This whole ordeal took a long time but I sure did learn a lot. And spent a lot. New chassis... New cables, new (identical SMR) drives, new RAM, new fans. Bjeezus. Keeping your data safe is not a cheap task for a home user.

One thing I learnt... I really don't like spinning rust.
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
So I destroyed the pool and remade the dataset without dedupe. The size of the dataset was IDENTICAL to the original dataset, just with better performance. Dedupe is a SCAM! I'm sure I have plenty of duplicates, yet the size was identical.

I'm thinking about replacing my 1.2TB SAS HDDs and going all flash. I went from a 4-bay to a 6-bay to a 12-bay chassis, and now I'm thinking about going back to a 4-bay: 2 NVMe drives, striped, plus a local 8TB CMR SATA drive that I can replicate to on a weekly basis.

This whole ordeal took a long time but I sure did learn a lot. And spent a lot. New chassis... New cables, new (identical SMR) drives, new RAM, new fans. Bjeezus. Keeping your data safe is not a cheap task for a home user.

One thing I learnt... I really don't like spinning rust.
I've not been quiet with my criticism of iX lately, but this is one time I will come to their defense. Dedupe isn't a scam; it's just not the panacea folks make it out to be. It's also (in ZFS land) EXCEPTIONALLY resource intensive. That's why compression is used instead... it's a low-resource, high-gain way to do things.

I love spinning rust... cost per GB cannot be beat right now. I have 8 of them in a striped series of mirrored vdevs. It keeps my machines well fed at a minimal cost. I just use 3.5" bays instead of 2.5" bays. :)
 

Love4Storage

Dabbler
Joined
Nov 6, 2020
Messages
35
Resource intensive is an understatement. I thought that by following the rule of 5GB per TB the performance would be the same. This wasn't the case for me... it was more like 10-12GB per TB of the pool, and the performance difference was staggering. How ARC competes with the dedupe tables, and how unpredictable performance becomes because of this feature's dependence on the storage media, should be made clearer. Anyway... a good learning experience.

I like spindles... it's just that they use a lot of electricity, create heat, are subject to vibration, make noise, and take up space. Also, they can fail. The one thing going for them is that they're cheap to purchase and easy to get new or second hand. With the money I spent on these drives, I could have gotten enough flash storage to replace this build. Time to sell the SMR drives!
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
10k RPM and above create heat... 7.2k and lower, not so much. Think flash doesn't fail? You are in for a rude awakening. A good spinner will last longer than a good SSD under the same workloads more times than not, IME. Flash has its place (depending on workload and flash type, of course)...
 

Love4Storage

Dabbler
Joined
Nov 6, 2020
Messages
35
Flash rarely fails. The life runs out and the drive becomes a read-only brick. I've RMA'd maybe 100+ spinners. I've never RMA'd either a SAS or SATA SSD. I'm not talking about consumer-grade flash SSDs.

A good spinner will outlast an SSD, but how long is that SSD expected to last before it gets replaced? 3 years? 5 years? Look at the SLC drives from 2012-2013... These drives are still around, performing great under the same workloads meant for HDDs. Of course, the capacity difference is there.

Point being: HDDs will have their place, but SSDs are superior in so many ways. Once the cost of NAND comes down, the only place left for HDDs will be archiving (i.e. where tape decks reign) and cold storage. Just my opinion, currently working in the semiconductor (RAM/FLASH) industry.
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
Not all flash fails to read-only. Quite a few of them just brick and your data is gone. I like the 6.5-year-old SAS spinners I have right now in my R520 main box. I have some 10-year-old SATA drives in my R310, which is the backup target for the R520. They just work. I do not have to worry about a 5-year lifespan, as my drives are 6+ years old in my R520 and pushing 8-10 years old in my R310... no errors... they keep spinning along... :)
 

Love4Storage

Dabbler
Joined
Nov 6, 2020
Messages
35
5 years is what they are warrantied for. Most drives last much much longer than that. MUUUCH longer. I still have 20+ SLC drives that were manufactured in 2012. Over 50% life remaining after heavy use. The HGST Ultrastars (MLC) are beasts when it comes to endurance. 5 years and still at 98% life.

Did we talk about noise, heat, space real-estate... electricity bill?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Longposting here. Pardon the TL;DR that follows.

So I destroyed the pool and remade the dataset without dedupe. The size of the dataset was IDENTICAL to the original dataset, just with better performance. Dedupe is a SCAM! I'm sure I have plenty of duplicates, yet the size was identical.

Deduplication does work, but the hashes (and underlying data) have to match at the record level. This is obviously much harder with large files that use large records - a single bit being different in one of those 64K/128K chunks means the hashes don't match. There's also the issue of alignment - the same bits in a different spot in the record means "whoops, no more match" - even if the same bytes end up written at the ashift/physical disk level, their misalignment in the ZFS record means once again - no hash match, no dedupe.

Much has been written about ZFS's deduplication failings; unfortunately it's not an easy problem to solve. I've seen it work absolute magic when the data is aligned and cloned.
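
If you want to test whether dedup would actually pay off on a given pool before turning it on, zdb can simulate it against the existing data (fair warning: it walks a lot of metadata, so it can take a long time on spinning disks):

# Simulate deduplication on an existing, non-dedup pool and print a DDT histogram
zdb -S poolname
# The dedup ratio reported in the summary is what you would have gained; a value near 1.00 means don't bother.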

Resource intensive is an understatement. I thought that by following the rule of 5GB per TB the performance would be the same. This wasn't the case for me... it was more like 10-12GB per TB of the pool, and the performance difference was staggering.

Deduplication working per record means that the "5GB RAM for 1TB of data" rule is only valid for a 64K recordsize. Smaller records mean more DDT entries, and if your files get chopped down into 32K records that means double the metadata/RAM requirement immediately.
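
Back-of-the-envelope math, assuming roughly 320 bytes of core memory per DDT entry:

# 1 TiB of unique data at 64K records -> ~16.8M entries
echo $(( (1024**4 / (64*1024)) * 320 ))   # 5,368,709,120 bytes, about 5 GiB
# The same 1 TiB at 32K records -> ~33.6M entries
echo $(( (1024**4 / (32*1024)) * 320 ))   # 10,737,418,240 bytes, about 10 GiB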

How ARC competes with the dedupe tables, and how unpredictable performance becomes because of this feature's dependence on the storage media, should be made clearer. Anyway... a good learning experience.

Remind me to longpost on this later as a resource. Things have changed with the advent of TN12, new defaults, and the special vdevs being able to handle the writes, but it's still very much a situation where you need to be very certain that you'll get good results and then monitor to ensure that's true. Going in blind is a recipe for pain as you've discovered.
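
On the monitoring side, a couple of quick checks (pool name is a placeholder):

# See how full a dedup/special vdev is getting relative to the rest of the pool
zpool list -v tank
# Keep an eye on the DDT footprint over time
zpool status -D tank | grep "DDT entries"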

Several Users said:
Talkin' bout flash here

On this forum, most users are talking about consumer-grade SSDs - users on datacenter-grade SSDs are a little thin in the ranks due to the added cost. You're right in that a good SSD will last for a long time, but they all eventually "burn out" through use, with varying amounts of use being necessary and the usage pattern impacting it significantly. It's still a shift in thinking from the HDD mindset that "mechanical failure is what kills drives", not "regular usage" - although HDD vendors are making their "suggested annual usage" more visible, whether it's in writes or run-hours, so get ready for that as well. Not looking forward to the first vendor that tries to take a hard line on an RMA: "oh, you used this drive for more than the 8x5 usage pattern, so the warranty is void."

Someone dumping files to cheap DRAMless TLC or even QLC NAND might think it's just great because they're basically in a large-block archival pattern. Try to use the same drives for holding up a VMFS volume and they'll not only do a poor job performance-wise, but write amplification will eat them and their pitiful DWPD ratings alive.

But I'm likely preaching to the choir here:

Just my opinion, currently working in the semiconductor (RAM/FLASH) industry.

If you work for who I guess you do (and feel free not to confirm, if there's an NDA/conflict of interest), then I quite enjoy the ability to write petabytes to your heavily overprovisioned eMLC SAS SSDs before they politely inform me that they might fail in a few months' time, and ask if I wouldn't mind replacing them before they do something as uncouth as report a single bad bit.
 

Love4Storage

Dabbler
Joined
Nov 6, 2020
Messages
35
Wow... That's a lot to swallow.

BTW, warranties have dual conditions: duration and usage. For example... 5 years or XX TBW. Once you've written the crap out of a drive, your warranty is void. Nice try though.

I agree... consumer-grade TLC and QLC is not suitable for all use cases. Home NAS use for storing files and backups is acceptable.

Also, eMLC is a dead NAND flash technology. Just as SLC has died, everything is moving to TLC in the enterprise. The technology has improved so much that you can get comparable performance and endurance. And the financials just make sense.

BTW... I use USED flash. You can get great deals where I live, and sometimes on eBay, comparable to consumer-grade pricing. New enterprise flash is not affordable for home users.
 
Last edited:

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
5 years is what they are warrantied for. Most drives last much much longer than that. MUUUCH longer. I still have 20+ SLC drives that were manufactured in 2012. Over 50% life remaining after heavy use. The HGST Ultrastars (MLC) are beasts when it comes to endurance. 5 years and still at 98% life.

Did we talk about noise, heat, space real-estate... electricity bill?
Noise... the drives are silent. Power? 5W-10W per drive is no big deal for me... the server idles at 100W anyway. Real estate... again a non-issue... since the server chassis is 2U, I'm fine with the 3.5" form factor. My primary system... it's SSD all the way, as winders on a HDD is a painful experience... :) Right now the server with all 8 drives online sits at 124W just about 24x7. It jumps up to about 150W during a scrub, when all 8 disks are getting the snot beat out of them... flash wouldn't make enough of a difference to make up for the much, much higher upfront cost of installing 6TB SSDs.
 

Love4Storage

Dabbler
Joined
Nov 6, 2020
Messages
35
You are right. There's a place for platters and a place for NAND. Going all NVMe for cold storage is overkill, but it's great. The DWPD is 1, so it's not used for much more than storing and retrieving files, and it always saturates the bandwidth. It was just that I had a few unused NVMe drives which I wanted to stripe and try out. Everything is replicated to another FreeNAS.

BTW, you can get NVMe (4TB) for $300 on eBay. Ones still under warranty and with good specs. The cheapest I found was $200. The prices are plummeting, especially the read-intensive U.2 drives.
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
You are right. There's a place for platters and a place for NAND. Going all NVMe for cold storage is overkill, but it's great. The DWPD is 1, so it's not used for much more than storing and retrieving files, and it always saturates the bandwidth. It was just that I had a few unused NVMe drives which I wanted to stripe and try out. Everything is replicated to another FreeNAS.

BTW, you can get NVMe (4TB) for $300 on eBay. Ones still under warranty and with good specs. The cheapest I found was $200. The prices are plummeting, especially the read-intensive U.2 drives.
My server isn't NVMe capable... however, 6TB SAS SSDs would be nice... or 2.5"s (I can use adapters)... <G>
 

Love4Storage

Dabbler
Joined
Nov 6, 2020
Messages
35
6TB SAS SSDs are really expensive. You can get NVMe U.2 drives for cheap, but sellers are rare. 6.4TB SAS drives are even rarer, as they see more demand and sell much faster.

Think of all the 2U 24-bay servers on the market. All of these need to be filled up or need spares... thus the high price for SAS drives. If I come across any 3.84 or 6.4TB SAS drives for cheap, I will PM you!
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
6TB SAS SSDs are really expensive. You can get NVMe U.2 drives for cheap, but sellers are rare. 6.4TB SAS drives are even rarer, as they see more demand and sell much faster.

Think of all the 2U 24-bay servers on the market. All of these need to be filled up or need spares... thus the high price for SAS drives. If I come across any 3.84 or 6.4TB SAS drives for cheap, I will PM you!
ROFL... yeah, I figured... that's why I went with spinners... :)
 