Broke yet another SSD - 5 and counting.

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Not that unreasonable. Many years ago, I made what was probably the first appliance version of FreeBSD, which was later taken and made into PicoBSD. It booted off a 1.44MB floppy on a 486 box with 16MB RAM and was used to turn old PCs into X-terminals running FreeBSD and XFree86.

I actually really wanted something akin to the Compaq iPAQ IA-1 (which I seriously considered at the time) to act as a digital contacts/calendar keeper for places where having a computer might not be practical. These days I just recycle old Android tablets for that task. :smile:
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
I would not use SSDs as any part of a NAS.

My advice is to only use your NAS as storage; forget messing about with cache drives, jails and all that rubbish.

Have a good main PC and just treat your NAS as a useful and stable storage device.

Wow, you've got it all figured out. I'd recommend a highly paid consulting job with EMC, NetApp, Dell/Compellent, etc., since they are clearly making missteps by building not only hybrid NAS/SAN devices, but some that are 100% flash/SSD-based!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
It booted off a 1.44MB floppy on a 486 box with 16MB RAM and was used to turn old PCs into X-terminals running FreeBSD and XFree86.
Damn, that is a hefty amount of hardware for a terminal. I recall taking a 720K 5.25" floppy, cutting a tab in it to make it a double-sided floppy, or taking an old 3.5" floppy and cutting a hole in the plastic case to make it a 1.44MB floppy disk. Then I'd insert it into a computer with a whole 512KB of RAM (had to populate the motherboard with lots of small DIP chips), lots of processing power! Back in those days I actually had a copy of Windows 1.0 and it worked okay for what it was. I'd never heard of FreeBSD or Linux, but I had heard of CP/M and UNIX. I led a sheltered life.

Getting back to the original topic, I'd never use an SSD as an L2ARC on a system with less than 64GB of RAM; much discussion has gone on about this, and typically it's actually detrimental to throughput (the L2ARC headers eat RAM that the ARC itself could be using). Also, I'd never use a consumer-grade MLC SSD for this purpose; I'd use an SLC SSD.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Damn, that is a hefty amount of hardware for a terminal. I recall taking a 720K 5.25" floppy, cutting a tab in it to make it a double-sided floppy, or taking an old 3.5" floppy and cutting a hole in the plastic case to make it a 1.44MB floppy disk. Then I'd insert it into a computer with a whole 512KB of RAM (had to populate the motherboard with lots of small DIP chips), lots of processing power! Back in those days I actually had a copy of Windows 1.0 and it worked okay for what it was. I'd never heard of FreeBSD or Linux, but I had heard of CP/M and UNIX. I led a sheltered life.

Well, the RAM was needed because if you've got a UNIX system that literally has nowhere to swap to, you'd better not run out. X-terminals were basically the first generation of what today are known as "thin clients", and were a practical way to tap the multiuser capabilities of a networked UNIX box. I wanted to have a bunch of graphically capable X-terminals around the facility, all attached to our shell host, but I wasn't really willing to pay $1K+ for a bunch of NCDs. The 486s were being retired at the time, since the Pentium 133 with EDO RAM was so much nicer as a desktop box, so I'd take the 8MB of FPM out of two 486s, put both sets back into one of them, and discard the other 486 :smile: YAY, free base hardware; I just had to come up with monitors, keyboards, and mice. I still have a bunch of Mouse Systems serial optical mice with the mirror-style pads in stock somewhere, ha.
 

Crispin

Explorer
Joined
Jun 8, 2011
Messages
85
Folks, as if by magic, I toasted another SSD.
Off the back of this post I removed the failed SSD and put another one in - perhaps August/September or thereabouts.
Last night I got a SMART warning saying the SSD is failing. I laughed. :)

This disk has had a pretty easy life. The last one was in place when I moved all the data back onto the new build and shuffled data around internally, so it might have been written to a lot. This one, though, only has to deal with a nightly backup being written to tank and a rar of about 19GB of files.

Surely I cannot cook SSDs in a NAS in 3 months?

SMART report:

Code:
########## SMART status report for ada1 drive (SandForce Driven SSDs: 10316511580009990080) ##########

Error SMART Status command failed
Please get assistance from
http://smartmontools.sourceforge.net/
Register values returned from SMART Status command are:
CMD=0xb0
FR =0xda
NS =0xffff
SC =0xff
CL =0xff
CH =0xff
RETURN =0x0000
SMART overall-health self-assessment test result: FAILED!
No failed Attributes found.

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 117 099 050 Pre-fail Always - 0/167471433
5 Retired_Block_Count 0x0033 095 095 003 Pre-fail Always - 416
9 Power_On_Hours_and_Msec 0x0032 100 100 000 Old_age Always - 25443h+15m+50.850s
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 363
171 Program_Fail_Count 0x0000 000 000 000 Old_age Offline - 0
172 Erase_Fail_Count 0x0000 000 000 000 Old_age Offline - 0
174 Unexpect_Power_Loss_Ct 0x0030 000 000 000 Old_age Offline - 329
177 Wear_Range_Delta 0x0000 000 000 --- Old_age Offline - 1
181 Program_Fail_Count 0x0000 000 000 000 Old_age Offline - 0
182 Erase_Fail_Count 0x0000 000 000 000 Old_age Offline - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 028 044 000 Old_age Always - 28 (Min/Max 0/44)
195 ECC_Uncorr_Error_Count 0x001c 117 099 000 Old_age Offline - 0/167471433
196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always - 0
231 SSD_Life_Left 0x0013 093 093 010 Pre-fail Always - 1
233 SandForce_Internal 0x0000 000 000 000 Old_age Offline - 10944
234 SandForce_Internal 0x0000 000 000 000 Old_age Offline - 9152
241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 9152
242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 15360


 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, almost three years of life, 9TB written, that works out to about 9GB written per day, which is pretty heavy use for a consumer SSD. Intel was rating its consumer drives at 20GB/day three years ago, IIRC.
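
For anyone who wants to sanity-check that write rate against their own drive, here's a quick back-of-the-envelope version. The two raw values are simply copied from the SMART report above (attribute 9 and attribute 241); the little script is only illustrative:

Code:
# Rough write-rate estimate from the SMART report above (plain sh + awk)
hours=25443        # attribute 9,   Power_On_Hours
written_gib=9152   # attribute 241, Lifetime_Writes_GiB

awk -v h="$hours" -v w="$written_gib" 'BEGIN {
    days = h / 24
    printf "%.0f days powered on, about %.1f GiB written per day\n", days, w / days
}'

That comes out to roughly 8.6 GiB/day over about 1060 days, which lines up with the ~9GB/day figure above.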
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
Have you made any major changes to your system? Added more RAM (still at 16GB?)? Switched over to enterprise-grade SSDs that are better suited to this purpose?
 

Crispin

Explorer
Joined
Jun 8, 2011
Messages
85
Well, almost three years of life, 9TB written, that works out to about 9GB written per day, which is pretty heavy use for a consumer SSD. Intel was rating its consumer drives at 20GB/day three years ago, IIRC.
Well, ok then :)
Yes, it's nearly 3 years old. I guess it does get written to a lot, then. If every download, unpacking and compression on the spindles also pushes data to the SSD, then sure - it's a terrible life for the SSD.

I think I'll call it quits on the SSD with this setup. It serves no real purpose; it's a large file store which does not really benefit from the cache.

Have you made any major changes to your system? Added more RAM (still at 16GB?)? Switched over to enterprise-grade SSDs that are better suited to this purpose?

Nope, still 16GB (the max for the N54L). When I build a new one that is better suited to SSDs, I shall just go commercial.


I would draw the conclusion, though, that home users who use the server for storage / downloading via jails / backups / other non-repetitive work should not use an SSD, as they just plain break :(

Perhaps at some point we will be able to tell ZFS to only use L2ARC for datasets xyz...


Thanks for the help, y'all.
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
I would draw the conclusion that, if you don't have sufficient RAM (causing cache thrashing), and you continue to use drives that aren't recommended due to their (relatively) low lifetimes... and you push said drives beyond their manufacturer-specified lifetime... then yes, they'll fail.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Perhaps at some point we will be able to tell ZFS to only use L2ARC for datasets xyz...

SSH in and zfs set secondarycache=none/metadata/all datasetname, done.

The default is "all", so you'll want to set it to "none" or "metadata" for the datasets you want to exclude from L2ARC. I wouldn't expect it to be a magic bullet, but it should mitigate the damage.
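
For example, checking the current setting and then excluding a couple of datasets might look like this (the pool name matches the "tank" pool mentioned above, but the dataset names are just placeholders for whatever your layout actually is):

Code:
# Show the current secondarycache setting for every dataset in the pool
zfs get -r secondarycache tank

# Keep only metadata in L2ARC for one dataset, and skip L2ARC entirely for another
zfs set secondarycache=metadata tank/backups
zfs set secondarycache=none tank/media

Note that the setting only controls what gets cached from that point on; it won't flush anything already sitting in L2ARC, which just ages out over time.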
 

Crispin

Explorer
Joined
Jun 8, 2011
Messages
85
I would draw the conclusion that, if you don't have sufficient RAM (causing cache thrashing), and you continue to use drives that aren't recommended due to their (relatively) low lifetimes... and you push said drives beyond their manufacturer-specified lifetime... then yes, they'll fail.
You put it better than I ;)

On a serious note though - and this is from a curiosity PoV - I have 16GB of RAM but never seem to see it get used. At the moment it's using 1GB (1019.92MB). It never really climbs. By the same token, the L2ARC is sitting at 45GB of a 60GB SSD, which seems OK.
The hit ratio for ARC is 96% while the L2ARC is only 22%.

Could this be a clue as to why I am hammering the disk?

I probably need to do more RTFM on performance tuning, though :)


SSH in and zfs set secondarycache=none/metadata/all datasetname, done.

The default is "all", so you'll want to set it to "none" or "metadata" for the datasets you want to exclude from L2ARC. I wouldn't expect it to be a magic bullet, but it should mitigate the damage.
Cool, thanks.

I will try it out on another build I am planning. This one has important stuff on it, so I would rather leave it as-is and working just fine :)


Cheers,
Crispin
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I have 16GB of RAM but never seem to see it get used. At the moment it's using 1GB (1019.92MB). It never really climbs.

So the ZFS tab on the Reporting page shows only 1GB of ARC used? That's definitely abnormal. Does arcstat.py from a command line agree with that?

Command would be arcstat.py -f arcsz,l2size

Also run/check @Bidule0hm's ARC statistics script here:

https://forums.freenas.org/index.php?threads/what-are-your-arc-statistics.28122/

Post the results, I'm betting your ghost values are huge.
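
If you want a quick cross-check without the script, the underlying ARC counters are also exposed as sysctls on FreeBSD under kstat.zfs.misc.arcstats. This is just a rough sketch (the formatting is mine), but the sizes and hit ratios it prints should roughly agree with arcstat.py and the script above:

Code:
# Rough ARC/L2ARC summary straight from the kstat sysctls
sysctl -n kstat.zfs.misc.arcstats.size \
          kstat.zfs.misc.arcstats.l2_size \
          kstat.zfs.misc.arcstats.hits \
          kstat.zfs.misc.arcstats.misses \
          kstat.zfs.misc.arcstats.l2_hits \
          kstat.zfs.misc.arcstats.l2_misses |
awk 'NR==1 {printf "ARC size:   %.1f GiB\n", $1/2^30}
     NR==2 {printf "L2ARC size: %.1f GiB\n", $1/2^30}
     NR==3 {h=$1} NR==4 {printf "ARC hit ratio:   %.1f%%\n", 100*h/(h+$1)}
     NR==5 {l=$1} NR==6 {printf "L2ARC hit ratio: %.1f%%\n", 100*l/(l+$1)}'

These are cumulative counters since boot, so the ratios are lifetime averages rather than the live view you get from arcstat.py with an interval.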
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I would draw the conclusion, though, that home users who use the server for storage / downloading via jails / backups / other non-repetitive work should not use an SSD

Correct conclusion, but not really for the stated reason. L2ARC is primarily beneficial in situations where the pool is sufficiently busy that the requests being made could not be fulfilled expeditiously from the pool drives. This basically breaks down into only a few subcategories, including (a) an extremely busy pool, and (b) high fragmentation causing lots of seeks on the pool (which tends to decimate sequential read performance). A single home user isn't likely to generate enough traffic to truly warrant the L2ARC, and the modest reduction in service times you get is probably not worth the additional cost in RAM and slagged SSDs.

For example, for the 14TB VM storage pool here, I plan to cap usage at 50% (7TB), and after experimenting for a while, I decided that 64GB of RAM and 256GB of L2ARC was probably sufficient for the task. However, I went and bumped it up anyway to 128GB of RAM and 768GB of L2ARC, and over the last day or two running a small handful of VMs it has settled in at 240GB of L2ARC used. I'm seeing a 23% L2ARC hit ratio, which is rapidly growing, and to me that means the L2ARC is serving its purpose fairly well. The Samsung 950 Pro 512GB has a specced endurance of 400TBW, so I'm aiming to minimize thrashing by

I would draw the conclusion that, if you don't have sufficient RAM (causing cache thrashing), and you continue to use drives that aren't recommended due to their (relatively) low lifetimes... and you push said drives beyond their manufacturer-specified lifetime... then yes, they'll fail.

Or you can be abusive in the manner that some of us are; the original purpose of "RAID" was Redundant Array of Inexpensive Disks, and two of something cheap can sometimes be better than one of something expensive. But that probably doesn't apply here.
 

Crispin

Explorer
Joined
Jun 8, 2011
Messages
85
So the ZFS tab on the Reporting page shows only 1GB of ARC used? That's definitely abnormal. Does arcstat.py from a command line agree with that?
yup, it does:
arcsz l2size
977M 47G


  • Put your data type(s) here...
  • 10:02PM up 2 days, 2:35, 2 users, load averages: 0.83, 0.59, 0.35
  • 2.00GiB / 14.9GiB (freenas-boot)
  • 12.6TiB / 21.8TiB (tank)
  • 985GiB / 1.81TiB (usbbackup01)
  • 1013.22MiB (MRU: 64.18MiB, MFU: 959.82MiB) / 16.00GiB
  • Hit ratio -> 96.59% (higher is better)
  • Prefetch -> 84.94% (higher is better)
  • Hit MFU:MRU -> 95.51%:1.70% (higher ratio is better)
  • Hit MRU Ghost -> 0.29% (lower is better)
  • Hit MFU Ghost -> 1.09% (lower is better)

@jgreco - I appreciate that with the VM scenario this type of setup would be beneficial.

C
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
By what, man? Don't leave us hanging!

Minimize thrashing by leaving you hanging, I guess. :smile:

My guess is that I was going to complete that sentence by saying "cramming as much L2ARC as I can in the box". It seems likely that I'll be able to cache the working set of the filer in production within 1TB of L2ARC (1/7th of the pool).
 