NVMe support?

jgreco · Oct 12, 2015

Something said:
Good point, enterprise will still want storage density, though i've been focusing on consumer and enterprise in a really flipflop manner.

An opportunistic discussion strategy, in other words! ;-) ;-)

cyberjock · Oct 12, 2015

Something said:
Admittedly that is with regards to consumer, not enterprise loads. For enterprise loads you wouldn't even want to consider the EVOs as they lack power loss protections.

Agreed, but 'non-enterprise' loads run async, so the purpose of the slog is gone. So clearly we're talking only about sync loads, which by the same definition means 'enterprise'. Sure, you can run enterprise workloads at home, but that doesn't make them 'non-enterprise' just because you don't use them in a large company. ;)

Something said:
Newer Sammy/Sandisk drives will chomp less than a laptop drive. <5w a piece. Efficiency given their speed also makes them massively advantageous from a power usage perspective. Slap an SSD into an old laptop, you'll actually notice a battery life improvement.

Right, but the power savings isn't from the SSDs. It's from the CPU being able to ramp up to full throttle, knock out the workload, and go back to idle significantly faster than if you had been using a spinning drive. With spinning rust you run the risk of going full throttle, idle, full throttle, idle over and over while the CPU processes the data much faster than the platter-based drives can read/write.

Someone wrote up a very detailed article on the topic. I just couldn't find the link in my 1 minute search.

Something · Oct 12, 2015

Ericloewe said:
No, it's a completely different connector/cable system, SFF-8639.

http://anandtech.com/show/9363/sff8639-connector-renamed-as-u2

Ooh! Not quite as hideous as full size USB 3.0 though somethin' makes me wonder if it'll take off in consumer...

jgreco said:
An opportunistic discussion strategy, in other words! ;-) ;-)

It's a legitimatte strategy!

cyberjock said:
Agreed, but 'non-enterprise' loads run async, so the purpose of the slog is gone. So clearly we're talking only about sync loads, which by the same definition means 'enterprise'. Sure, you can run enterprise workloads at home, but that doesn't make them 'non-enterprise' just because you don't use them in a large company. ;)

Oh you...

Yes, enterprise i'm saying as a reference to the type of workload, not the environment of the load.

Right, but the power savings isn't from the SSDs. It's from the CPU being able to ramp up to full throttle, knock out the workload, and go back to idle significantly faster than if you had been using a spinning drive.

Actually I haven't seen much data on that and I was thinking about it as I looked into a new SSD...

That said, the results are clear, SSDs result in much greater efficiency.

With spinning rust you run the risk of going full throttle, idle, full throttle, idle over and over while the CPU processes the data much faster than the platter-based drives can read/write.

I think it's partially lower power usage but majority much faster processing to reduce the drive usage and CPU usage times.

Someone wrote up a very detailed article on the topic. I just couldn't find the link in my 1 minute search.

I'd be very interested, got any thoughts on what it was called?

jgreco · Oct 12, 2015

I don't think that's true any longer. There did indeed used to be some savings because CPU's did not aggressively self-manage power utilization, and often relied on assistance from the OS, which is how things like powerd came to be.

In those days, yes, you kinda wanted to get all your stuff done and then "go idle" if you could.

These days, though, I can start to throw some load on a multicore CPU and observe no measurable difference in power utilization until the load is fairly significant. In that model, you really want to keep a low background level of computing going on at all times, because it is essentially free.

Ericloewe · Oct 12, 2015

jgreco said:
I don't think that's true any longer. There did indeed used to be some savings because CPU's did not aggressively self-manage power utilization, and often relied on assistance from the OS, which is how things like powerd came to be.

In those days, yes, you kinda wanted to get all your stuff done and then "go idle" if you could.

These days, though, I can start to throw some load on a multicore CPU and observe no measurable difference in power utilization until the load is fairly significant. In that model, you really want to keep a low background level of computing going on at all times, because it is essentially free.

It kinda depends on the manufacturing process. If a chip is prone to leakage, a race to idle allows leakier sections to be powered off, saving power.

Power management seems to be yesterday's focus, though. Now, thermal management is all the rage in popular attention.

Ericloewe · Oct 12, 2015

Something said:
Ooh! Not quite as hideous as full size USB 3.0 though somethin' makes me wonder if it'll take off in consumer...

Well, consumer has M.2.

Something · Oct 12, 2015

jgreco said:
I don't think that's true any longer. There did indeed used to be some savings because CPU's did not aggressively self-manage power utilization, and often relied on assistance from the OS, which is how things like powerd came to be.

In those days, yes, you kinda wanted to get all your stuff done and then "go idle" if you could.

These days, though, I can start to throw some load on a multicore CPU and observe no measurable difference in power utilization until the load is fairly significant. In that model, you really want to keep a low background level of computing going on at all times, because it is essentially free.

The impact is definitely lesser as multiple cores with a shared voltage (BUT NOT TURBO/IDLE!) means you can nominally increase voltage to support being off idle in loaded cores. Especially for things like, oh, Haswell-EP 18 cores.

Ericloewe said:
Power management seems to be yesterday's focus, though. Now, thermal management is all the rage in popular attention.

Are we talking about enterprise or consumer? Then again, Haswell has been a pain...stupid FIVR.

Ericloewe said:
Well, consumer has M.2.

Slightly less hideous thankfully.

jgreco · Oct 13, 2015

Power management and thermal management are, of course, intimately linked. And we've learned a lot over the years especially from the mobile device segment. As long as you don't need the processing done Right Damn Now, there's a good incentive to keep the system out of turbo, running lower clock speeds, and doing a lower level of work in the background.

Something · Oct 13, 2015

jgreco said:
Power management and thermal management are, of course, intimately linked. And we've learned a lot over the years especially from the mobile device segment. As long as you don't need the processing done Right Damn Now, there's a good incentive to keep the system out of turbo, running lower clock speeds, and doing a lower level of work in the background.

So it pays to be less greedy? Something about this seems wrong...

Probably just nothing.

HoneyBadger · Oct 13, 2015

Nope, race-to-sleep is awesome. As long as the latency to go from idle to active isn't something that impacts your application workload, it's better to stay idle and burst up when needed.

Something · Oct 13, 2015

HoneyBadger said:
Nope, race-to-sleep is awesome. As long as the latency to go from idle to active isn't something that impacts your application workload, it's better to stay idle and burst up when needed.

Hmmmmmmm I wouldn't mind working in CPU design, would be fun...

Let me try and organize my thoughts, simplifying to keep it nice.

Multiple cores, each core has an individual frequencies, all share a common voltage, higher frequencies ALWAYS costs (exponentially) higher voltage and power usage (even at the same voltage, 1GHz requires less power than 2GHz). There are three operating frequencies: idle, normal and turbo. Higher the frequency, the faster the operation.

We IDEALLY want to spend all of our time idling on all of our cores. Given we have to do some work, the result then is do we want to have the work dispersed as much as possible among many cores or centralized to individual cores? The work is a complicated set of tasks, some are more parallel than others, with varying levels of overhead and varying length to be completed.

I'll come back to this at some point.

jgreco · Oct 13, 2015

Something said:
We IDEALLY want to spend all of our time idling on all of our cores. Given we have to do some work, the result then is do we want to have the work dispersed as much as possible among many cores or centralized to individual cores?

Observationally, my new ESXi VM filer has a fairly flat power utilization;

unfortunately it was down for a downtime much of yesterday and last night but if you look at the "last week" you can see the remarkably flat minimum and average numbers; the higher peak was because I was running a piggy collection script that obviously didn't get restarted after reboot. It doesn't matter if I'm shoving 10Gbps at it or doing nothing, the power utilization is flat. The CPU is a massively oversized-to-the-task E5-1650 v3.

LubomirZ · Oct 18, 2015

from page 2:

>> The larger lithography has nothing to do with the P/E cycles or write endurance, it was for cost cutting

I wonder why nobody was questioning this so far, because lithography has EXTREME influence on P/E cycles. The more we shrink, the less endurance we get and there's no escaping that FACT.

It was some years ago with 5x and 4x nm lithography - back in the SLC era - where we easily punched 100.000 P/E cycles guaranteed. Nothing you can even dream today about with 2x and 1x stuff. We are very very lucky to get 3.000 P/E today in consumer class and 20.000 to 50.000 P/E cycles with expensive enterprise SSDs (this equates to approx. 25 DWPD for 5 years and that really is top-class today). Heck, you get 1 DPWD with some "read oriented enterprise stuff" from Intel and that is, ehm, 5 years x 365 days x 1 DWPD = 1825 P/E cycles, let's count with 3:1 write amplification = 5.500 P/E cycles. Don't tell me it's not like that.

And yes, it is DIRECTLY related to lithography much more than anything else.

Ericloewe · Oct 18, 2015

LubomirZ said:
from page 2:

>> The larger lithography has nothing to do with the P/E cycles or write endurance, it was for cost cutting

I wonder why nobody was questioning this so far, because lithography has EXTREME influence on P/E cycles. The more we shrink, the less endurance we get and there's no escaping that FACT.

It was some years ago with 5x and 4x nm lithography - back in the SLC era - where we easily punched 100.000 P/E cycles guaranteed. Nothing you can even dream today about with 2x and 1x stuff. We are very very lucky to get 3.000 P/E today in consumer class and 20.000 to 50.000 P/E cycles with expensive enterprise SSDs (this equates to approx. 25 DWPD for 5 years and that really is top-class today). Heck, you get 1 DPWD with some "read oriented enterprise stuff" from Intel and that is, ehm, 5 years x 365 days x 1 DWPD = 1825 P/E cycles, let's count with 3:1 write amplification = 5.500 P/E cycles. Don't tell me it's not like that.

And yes, it is DIRECTLY related to lithography much more than anything else.

Yeah, you're right. Samsung quoted much better write endurance on their stacked NAND stuff, because each die was made with a 20-something nm process.

Something · Oct 18, 2015

I don't think endurance and process have a causal relationship. Though certainly in general, as time has gone by, the P/E cycles of the drives have gone down, but the controllers have gotten better and better. Drive sizes going up also doesn't hurt.

LubomirZ · Oct 18, 2015

drive sizes have nothing to do with P/E as P/E is specified "per cell" : if you have 100MB SSD or 16TB SSD and it can do 1.000 P/E cycles, it simply only can do 1.000 program/erase cycles. That means 100MB SSD would be able to write 100MB x 1000 = 100.000 MB = 100GB of data (0.1TBW endurance) and 16TB SSD with 1.000 P/E cycles would be able to write 16TB x 1000 = 16000 TB of data = 16000TBW endurance. Yes, the resulting total endurance is extremely different, but the fact that both drives were able to write 1.000 times IN EACH CELL stays intact and this is exactly what P/E is all about.

Bigger drive has more cells so it can write more data, but each cell only can be written 1.000 times, not more. This didn't change with capacity. Capacity == drive size has absolutely nothing to do with P/E cycles "per se".

Just because controllers got better and better, just because they are able to heavily optimize writes and drive down write amplification, just thanks to that SSD didn't collapse yet. If there would be no improvement in controllers, we would be done by now with customer class.

Endurance and lithography process have very very crucial relationship. This is no secret information, this is not just me saying it - you can find it everywhere on internet. Drives have generally gone down in P/E specification with process shrinks despite of advances in every other aspect and you are fully right - controllers have gotten better and better, much better than what they were years ago. If they wouldn't, oooh I don't want to know.

PS: in order to mask reliability drops, manufacturers are becoming more pushed and "more brave over time". In past, when they specified 100.000 P/E cycles, that thing was easily able to handle LET'S SAY 1.000.000 cycles so 10x more. Today, when they specify 5.000 P/E cycles for any lithography, you won't get 50.000 P/E by any chance - you'll be VERY lucky when you get 10.000 P/E, especially in consumer-class. Why is that ? Shrinking profits, less margins, extreme pressure on price everywhere, faster product cycles and need to satisfy shareholders, nobody really is able to tear down consumer-class SSDs is TYPICAL environments today (and those rare few who are, hell we'll cover them with warranty and replace their torn-apart-beaten-to-death SSDs so they don't start screaming on internet... well it's directly Samsung with 850Pro who says 150TBW but they will individually consider case-by-case if somebody crossed it), stuff like that.

Back to what this topic should be about...

Something · Oct 18, 2015

LubomirZ said:
drive sizes have nothing to do with P/E as P/E is specified "per cell" : if you have 100MB SSD or 16TB SSD and it can do 1.000 P/E cycles, it simply only can do 1.000 program/erase cycles. That means 100MB SSD would be able to write 100MB x 1000 = 100.000 MB = 100GB of data (0.1TBW endurance) and 16TB SSD with 1.000 P/E cycles would be able to write 16TB x 1000 = 16000 TB of data = 16000TBW endurance. Yes, the resulting total endurance is extremely different, but the fact that both drives were able to write 1.000 times IN EACH CELL stays intact and this is exactly what P/E is all about.

Which is my point, that large drives mitigate the damage of lower P/E.

Just because controllers got better and better, just because they are able to heavily optimize writes and drive down write amplification, just thanks to that SSD didn't collapse yet. If there would be no improvement in controllers, we would be done by now with customer class.

Not the point i'm making, random controller failures are becoming less likely with time which was the greater source of failure.

Endurance and lithography process have very very crucial relationship. This is no secret information, this is not just me saying it - you can find it everywhere on internet. Drives have generally gone down in P/E specification with process shrinks despite of advances in every other aspect and you are fully right - controllers have gotten better and better, much better than what they were years ago. If they wouldn't, oooh I don't want to know.

Not everywhere, enterprise stuff won't skimp as much there. Especially higher end, mission critical things.

PS: in order to mask reliability drops, manufacturers are becoming more pushed and "more brave over time". In past, when they specified 100.000 P/E cycles, that thing was easily able to handle LET'S SAY 1.000.000 cycles so 10x more. Today, when they specify 5.000 P/E cycles for any lithography, you won't get 50.000 P/E by any chance - you'll be VERY lucky when you get 10.000 P/E, especially in consumer-class. Why is that ? Shrinking profits, less margins, extreme pressure on price everywhere, faster product cycles and need to satisfy shareholders, nobody really is able to tear down consumer-class SSDs is TYPICAL environments today (and those rare few who are, hell we'll cover them with warranty and replace their torn-apart-beaten-to-death SSDs so they don't start screaming on internet... well it's directly Samsung with 850Pro who says 150TBW but they will individually consider case-by-case if somebody crossed it), stuff like that.

Back to what this topic should be about...

The SSD market is very competitive but profits aren't declining. Margins are as volume rises.

HankC · Oct 23, 2015

does freenas 9.3.1 support nvme now?

jgreco · Oct 23, 2015

As does FreeNAS 9.3.

HankC · Oct 23, 2015

i mean there was problem with it as l2arc or something. needed something so that it doesn't hung in startup

Important Announcement for the TrueNAS Community.

NVMe support?

Resident Grinch

Inactive Account

Explorer

Resident Grinch

Server Wrangler

Server Wrangler

Explorer

Resident Grinch

Explorer

actually does care

Explorer

Resident Grinch

Dabbler

Server Wrangler

Explorer

Dabbler

Explorer

Dabbler

Resident Grinch

Dabbler

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "NVMe support?"

Similar threads