BUILD Verification and discussion about new FreeNAS-System

Status
Not open for further replies.

Globalhawk

Dabbler
Joined
Jun 15, 2015
Messages
10
Hey guys,

First of all, I want to thank you all for the work and time you invest in solving problems and answering questions.

My motivation is to have full control over all my data; in addition, my data has to be consistent and redundant. I want to use FreeNAS to store sensitive and non-sensitive data. The system should be upgradable to a maximum of 11-14 physical hard drives (in this planned server).

So, to get the highest possible reliability, I want to create a to-do list together with you to verify the consistency of the whole data system. I'm reading this forum and will add each point as I find it. I know that I'm not able to build a trustworthy ZFS system on my own right now, but I'm still on a steep learning curve.

What do I expect from the system?

I think I'm fine with at least 12TB of usable space in the beginning. Mirrored vdevs are my favourite option and the one I feel comfortable with (also to reduce resilvering time). I really want a reliable system. I NEVER NEVER NEVER want to worry about my data.

If I can understand and trust the encryption technology FreeNAS offers, I want to use it (which is why I'm considering AES-NI support in my CPU decision). Compression should also be enabled. My data will be home-user stuff such as streaming videos, work data for design (PSD, InDesign, photos and material), and in the future also experimental things like Plex or something else.

My hardware:

Chassis
This chassis is well known and I got a good deal on eBay.
SuperMicro CSE-933T-R760B

Mainboard
A good board for my expectations and needs, I think. I can flash the built-in (IT-flashable) LSI SAS controller. I require the ability to administer everything remotely, so IPMI/iDRAC/iLO is a must. It has Intel NICs and a bunch of SATA ports.
Supermicro X10SL7-F (MBD-X10SL7-F-O)

Processor
This is the last CPU without integrated graphics and should have enough single-core power for SMB (the last/highest CPU in this line with a reasonable price for me):
Intel Xeon E3-1241v3 (8M Cache, 3.50 GHz)

Disk Drives
A price and capacity I'm okay with. Also well tested and built for 24/7 operation.
6x WD Red 4TB (WD40EFRX)

RAM
Chosen following the RAM recommendations; these modules are reported as known-good, especially for the chosen mainboard:
4x 8GB Samsung M391B1G73QH0-YK0 DDR3L

OS-Drives
I searched for some time for a reliable boot device. I don't want to give up a SATA port for a SATA DOM boot device. Should I? Some people here do well with this USB flash drive. The system will be mirrored onto both sticks.
2x SanDisk Cruzer Fit USB Flash Drive

Okay, this is my planned hardware. Now I have to think about the configuration, and after that about the testing/experimenting and verification process, to be sure that I really want to save all my data on this machine.

The very first step is to do myself a favor: BACKUP. The second, even more important step is also for myself: verify that this backup is really recoverable and consistent.

My data has been backed up and I'm personally sure that it can survive an apocalypse without losing anything for now. I think I'm ready to check the hardware.

Logical thinking about my ZFS-Storage:

2-Way Mirror with Striping
As I mentioned, I want fully reliable storage. I'm okay with a storage efficiency of only 50% due to the mirroring. Additionally, I want some performance, so the journey will end in a zpool of striped mirrored vdevs. Please confirm my thinking about this config:

VDEV1 -> A/B = mirror
VDEV2 -> C/D = mirror
VDEV3 -> E/F = mirror

ZPOOL1 -> VDEV1, VDEV2, VDEV3 -> 12TB usable space (calculation with the /1024 conversion left out)

With this configuration the data is safe with up to 3 failed HDDs if the failures are in different VDEVs. The whole pool will be destroyed if both HDDs of one VDEV fail. Right? The striping will go over HDDs A, C and E.

It will not be possible to add another HDD to each VDEV, so I cannot create a triple mirror after the first setup. But it is possible to add another VDEVn to ZPOOL1, right?

RAID-Z2
The more I read about 2-way mirrors with striping, the more unhappy I am about the fact that if both HDDs of one VDEV fail, all data will be lost.
So I'm also thinking about a RAID-Z2 configuration. Tolerating at most 2 failed HDDs, but no matter which ones, may be better. What should I consider for the decision? Please confirm that this logic is fine:

VDEV1 -> A/B/C/D/E/F

ZPOOL1 -> VDEV1 -> 16TB usable space (calculation with the /1024 conversion left out)

If I go this way, the pool can withstand 2 failed HDDs. With only one VDEV I don't have to sacrifice more than 2 HDDs, instead of creating 2 VDEVs (and "losing" 4 HDDs). The disadvantage of this concept is the increased resilvering time. But I need your knowledge at this point.
What are the most important differences between these two configurations? What am I missing here?
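To make the comparison concrete, here is a minimal Python sketch that simply enumerates every possible combination of failed disks for the two layouts above. It only counts which combinations the pool survives; it says nothing about how likely each failure is, and the function names are made up for this illustration.

```python
from itertools import combinations

DISKS = "ABCDEF"
# Layout from the post: three 2-way mirror vdevs striped into one pool.
MIRROR_VDEVS = [("A", "B"), ("C", "D"), ("E", "F")]


def mirrors_survive(failed):
    """Striped mirrors survive while every vdev keeps at least one working disk."""
    return all(any(d not in failed for d in vdev) for vdev in MIRROR_VDEVS)


def raidz2_survives(failed):
    """A single 6-disk RAID-Z2 vdev survives any one or two failed disks."""
    return len(failed) <= 2


def survival_counts(survives, n_failed):
    """Return (surviving combinations, total combinations) for n_failed disks."""
    cases = [set(c) for c in combinations(DISKS, n_failed)]
    return sum(1 for c in cases if survives(c)), len(cases)


for n in (1, 2, 3):
    m_ok, total = survival_counts(mirrors_survive, n)
    z_ok, _ = survival_counts(raidz2_survives, n)
    print(f"{n} failed: mirrors survive {m_ok}/{total}, RAID-Z2 survives {z_ok}/{total}")
```

With two failed disks the mirrors survive 12 of 15 combinations while RAID-Z2 survives all 15; with three failed disks the mirrors survive 8 of 20 and RAID-Z2 none.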

If I accidentally add a VDEV (with only one HDD in it) to my ZPOOL, I lose the redundancy completely. And since I can't remove a VDEV from the ZPOOL, I would have to migrate all data onto a new ZPOOL, right?

Could you please confirm: if I have to replace a failed HDD, the right steps (best practice) are:
  1. Set the HDD offline inside of FreeNAS
  2. Wait for confirmation from the system that the disk has been marked as offline
  3. Shut-Down the machine (just to be safe)
  4. Remove the failed HDD
  5. Insert the new HDD (at least the same size or bigger; the SAME model, firmware, etc. is suggested)
  6. Start the machine
  7. Mark the HDD as online (?)
  8. Resilvering progress
ToDo-List before installation of FreeNAS and after mounting the hardware:
  • [DONE] - Update BIOS / Firmware of Mainboard / IPMI
  • [DONE] - Configure IPMI
  • [DONE] - Update Firmware of all hard drives
  • [DONE] - Flash LSI to IT-Mode (here)
  • [DONE] - Activate AHCI in BIOS for hot plug ability
  • [DONE] - Activate power ON after power loss
  • [DONE] - Deactivate instant shutdown after pressing the power button
  • [DONE] - Start MemTest for at least 48 hours
  • Check system and especially HDD temperatures
ToDo-List after Installation of FreeNAS but before moving my data to the machine:
  • [DONE] - SMART conveyance test, don't know how long
  • [DONE] - SMART long/extended test
  • BurnIN-Phase (~1 week)
    • [DONE] Some reboots of the machine
    • [DONE] Individual sequential read and write tests.
      • "dd if=/dev/da${n} of=/dev/null bs=1048576" to do a read test, and "dd if=/dev/zero of=/dev/da${n} bs=1048576" to do a write test (the write test overwrites everything on the disk)
      • Simultaneous sequential read and write tests
    • [DONE] Running jgreco's script from ftp://ftp.sol.net/incoming/solnet-array-test-v2.sh
    • Running Iozone in a seek-heavy manner (with incompressible test data)
  • [DONE] Check system and especially HDD temperatures
  • Measure wattage / power consumption
To-Do-List after moving my data to the machine (maintenance):
  • Schedule periodic long SMART tests (every 14 days)
  • Schedule periodic short SMART tests (every 3rd day)
    (Memo to myself: a scrub and a long SMART test must NEVER run at the same time. This can cause scrubs to never end.)
  • Schedule Scrubs
  • Schedule backups
  • Schedule config database backups
  • Add machine to monitoring-server via snmp
  • Configure mail status messages
  • Implement Gnuplot scripts to visualize collected IOZone-Data
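As a quick sanity check for the memo about scrubs and long SMART tests, a small Python sketch like the following can list the days on which both would start. The 14-day interval for long SMART tests comes from the list above; the scrub interval, the start offsets and the function name are hypothetical choices for illustration only. It works at whole-day granularity, so a scrub that runs past midnight could still overlap with a test on the next day.

```python
def scheduled_days(first_day, interval, horizon):
    """Day offsets (0-based) on which a periodic job starts within the horizon."""
    return set(range(first_day, horizon, interval))


HORIZON = 84  # simulate 12 weeks

# Long SMART tests every 14 days (from the list above), starting on day 7.
long_smart = scheduled_days(first_day=7, interval=14, horizon=HORIZON)
# Hypothetical scrub schedule: every 14 days, starting on day 0.
scrubs = scheduled_days(first_day=0, interval=14, horizon=HORIZON)

clashes = sorted(long_smart & scrubs)
print("scrub / long SMART clashes:", clashes or "none")
```

Offsetting the two jobs by a week keeps them apart in this model; changing either interval means re-running the check.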
That's a record for myself and others who are planning a new system. I'm hoping for some answers and additions from you.

At least, I won't be a statistic!

Best
Globalhawk

EDITED & ADDED
  1. Do I have to modify the behavior of the drives so that they wait longer before parking the heads and turning off unnecessary electronics?
  2. It seems that we aren't able to update the firmware of WD Reds. I've talked to Western Digital technical support, both in Germany and worldwide. There is no official option to update the firmware.
  3. If you use this mainboard with this kind of chassis, you need a PCI-E 8-pin extension cable for your mainboard like THIS.
    Additionally, you need a front panel motherboard cable due to incompatibilities. But there is an official solution for this case. The header pins are the same (compared to the mainboard of the other user, as I mentioned before). CBL-0084L is the official model number of the split cable. You can also use CBL-0068L, but it can be difficult to replace the original 16-pin flat cable with the CBL-0068L split cable because the original cable is routed underneath the chassis fans.
    If you need some additional screws like me and want a plate for your HDD slots, you can use MCP-410-00005-0N.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I won't comment on your specific hardware choices - others are much better qualified.

Unless you absolutely require full pool encryption, you might want to avoid the potential pitfalls of that and just encrypt the data that needs encryption on your client system(s). If you must have encryption, practice things like drive replacement in a VM to make sure you know exactly how to handle it.

Your understanding of the tradeoffs between striped mirrors and RAIDZ2 matches mine. Unless your workload requires high IOPS performance, which you haven't indicated it does, I'd stick with RAIDZ2 for the combination of higher capacity and greater resiliency. I believe resilver time is mostly dependent on the write performance of the new disk, so it won't be faster on striped mirrors.

SMART conveyance test takes longer than short test, but nowhere near as long as extended test, and it's a one-time thing.
  1. Not with current generation WD Reds. They are expected to be always-on, and the default power saving setting for drives in FreeNAS is always on with power saving disabled.
  2. OK.
  3. I have no idea.
 

Globalhawk

Dabbler
Joined
Jun 15, 2015
Messages
10
Hey Robert,

Unless you absolutely require full pool compression, you might want to avoid the potential pitfalls of that and just encrypt the data that needs encryption on your client system(s). If you must have encryption, practice things like drive replacement in a VM to make sure you know exactly how to handle it.

You've mixed compression and encryption in one sentence. Which do you mean has the pitfalls: a fully compressed pool or a fully encrypted pool?

  1. Not with current generation WD Reds. They are expected to be always-on, and the default power saving setting for drives in FreeNAS is always on with power saving disabled.

So the internal power-saving features aren't activated, or should they just be set higher than the options defined inside FreeNAS?

---

I've added some information related to the chassis. There are some special requirements if you want to use current mainboards with it. But those problems cost me only 10€ more. You just have to know about them.

Could anyone tell me something about the whole system itself? Is my reasoning sound? Thanks in advance.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,456
Compression is enabled by default and is safe. Encryption is not enabled by default, and has some potential pitfalls. I believe Robert was discussing encryption.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
So the internal power-saving features aren't activated or just should be higher than the options defined inside of FreeNAS?
My understanding is that some early WD Reds had the 'head park when idle' setting enabled and were parking the heads after a few seconds of idle, which led to extremely high load cycle counts in the SMART attributes. This setting is not adjustable from within FreeNAS, but there is a utility you can download from wdc.com (it's also on the Ultimate Boot CD). This is no longer an issue with currently shipping WD Reds. I don't think there's any other power saving setting that you can adjust directly on the drives.

The usual recommendation is not to enable any of the drive power saving options from within FreeNAS.
 

Globalhawk

Dabbler
Joined
Jun 15, 2015
Messages
10
Yes, I know about that official tool from WDC. I interpreted your post as saying that the factory defaults now have 'head park when idle' disabled. But I think that I have to deactivate that option manually.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
Some Reds manufactured around December 2013 shipped with their idle timers set to Green standard (8 seconds instead of 300). There's a tool specifically for them and wdidle also works with them. The problem was quickly fixed and by late summer the supply chain seemed to not have significant amounts of affected drives.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I've interpreted your post that every factory defaults are defined that 'head park when idle' is disabled.
I don't know that it's disabled, but the problem of it being absurdly short seems to have been solved. Here's a WD Red that I recently installed:
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p13 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD20EFRX-68EUZN0
Serial Number:    WD-
LU WWN Device Id: 5 0014ee 26065a667
Firmware Version: 82.00A82
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Jun 20 15:18:56 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (25980) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 263) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x703d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   175   175   021    Pre-fail  Always       -       4225
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       6
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       652
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       6
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       17
194 Temperature_Celsius     0x0022   118   112   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       517         -
# 2  Short offline       Completed without error       00%       349         -
# 3  Extended offline    Completed without error       00%       187         -
# 4  Short offline       Completed without error       00%       181         -
# 5  Extended offline    Completed without error       00%         5         -
# 6  Conveyance offline  Completed without error       00%         0         -
# 7  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

You'll notice that the Load_Cycle_Count is higher than the Power_Cycle_Count, but it's not unduly high relative to the Power_On_Hours.
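For anyone who wants to put a number on "not unduly high", here is a rough Python sketch that pulls the raw values out of the attribute table and relates load cycles to power-on hours. The sample lines are copied from the smartctl output above; the parsing assumes the usual ten-column attribute layout with a plain integer in the raw value column, and the function name is made up for this example.

```python
# Attribute lines copied from the smartctl output above.
SAMPLE = """\
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       6
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       652
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       6
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       17
"""


def raw_values(text):
    """Map attribute name -> integer raw value for each parseable table line."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


attrs = raw_values(SAMPLE)
cycles_per_hour = attrs["Load_Cycle_Count"] / attrs["Power_On_Hours"]
print(f"{cycles_per_hour:.3f} load cycles per power-on hour")

# For comparison: an 8-second idle timer allows at most 3600 / 8 = 450 parks
# per hour, a 300-second timer at most 3600 / 300 = 12.
```

For this drive the result is about 0.026 load cycles per power-on hour, far below what an 8-second idle timer could produce.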
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
AFAIK you can add extra drives to mirrors. Clearly you would be adding a third drive to each for greater redundancy while still keeping the higher speed of mirrors. Your use case suggests RAIDZ2 would probably be OK for you and give a lot more space, but, as I say, you could make the mirrors 3 way in the future.
 

Globalhawk

Dabbler
Joined
Jun 15, 2015
Messages
10
Okay, so here is my status so far:

I've bought the hardware as described above. The hardware is up and running, but without any connection to the front panel, because I'm still waiting for the split cable I ordered. The system is currently in its test phase: the first memtest run took about 48 hours, and the current one has been running for 39h without any errors.
I've found out that I can use the IPMI View software (v2.10.2 (build 150203)) on Windows 8.1 only in compatibility mode for Windows Vista (in this mode there are no more problems with IPMI v1.92, but there are with older versions of IPMI (with the KVM)).

Here are some pictures of the build:

OIotsB0.jpg

tdOmYpY.jpg

5LoVdMy.jpg

16IiAdO.jpg

bHXZcyR.jpg

zSmBl9H.jpg

Here I do have a question. I'm not really comfortable with the angle of these SATA cables. What do you think?

92KcGgr.jpg


qr8KEHk.png

First memtest, second currently at 39h running

If I have time in the evening, I want to flash the controller to IT mode and install FreeNAS so I can move on to the SMART checks. Can anyone answer the questions in my initial post? Thanks a lot.
 

Globalhawk

Dabbler
Joined
Jun 15, 2015
Messages
10
One more urgent question. I know that I have to address every RAM problem. I saw this error in the BIOS when I turned on the machine for the first time. After the first 48h of memtest I checked all event logs again, but there was no further error. So what do you think? Should it be replaced?

8Ah78sf.png
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
The bend radius of the SATA cables seems ok to me ;)
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
Jumping back a few postings, let's discuss the WD timer thing... I would recommend you set all the drives to 300 seconds (5 minutes). Look for the WD Parking thread; I made some postings in it yesterday which I believe will clear up any questions you have about it.

As for the pool you plan to create, since you are concerned about drive failures taking out your pool why not consider a RAIDZ3 setup. With your current drives that would leave you with 10.9TB of usable space without compression and any 3 drives can fail and you still retain your data. Also your calculations are off, use the calculator in my signature to get something more accurate. I haven't done any mirroring of drives before so I'm not sure what the final capacity will be but if you create three pairs of mirrored drives and then make a RAIDZ1 pool out of those, you should end up with 7.3TB of usable data without compression factored in, but I might have that wrong since I've never used mirrors.
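The gap between the 12TB/16TB figures earlier in the thread and the 10.9TB quoted here is mostly the /1024 conversion: drives are sold in decimal terabytes, while the usable space is reported in binary units (TiB). A minimal Python sketch of that arithmetic follows; it ignores ZFS metadata, padding and the free space you should keep in reserve, so real numbers will be somewhat lower, and the function and layout names are made up for this illustration.

```python
TB = 10**12   # decimal terabyte, as printed on the drive label
TIB = 2**40   # binary tebibyte, as reported by the system


def usable_tib(n_disks, disk_tb, layout):
    """Rough usable space in TiB, before any ZFS overhead."""
    if layout == "mirror2":
        data_disks = n_disks // 2              # half the disks hold copies
    elif layout in ("raidz1", "raidz2", "raidz3"):
        data_disks = n_disks - int(layout[-1])  # subtract the parity disks
    else:
        raise ValueError(f"unknown layout: {layout}")
    return data_disks * disk_tb * TB / TIB


for layout in ("mirror2", "raidz2", "raidz3"):
    print(f"6 x 4TB as {layout}: {usable_tib(6, 4, layout):.1f} TiB")
```

For six 4TB drives this gives about 10.9 TiB for striped 2-way mirrors, 14.6 TiB for RAIDZ2 and 10.9 TiB for RAIDZ3.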
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
if you create three pairs of mirrored drives and then make a RAIDZ1 pool out of those
I'm pretty sure you can't do that, since it would imply child vdevs inside a parent vdev, and that doesn't match my understanding of ZFS structure.
So what do you think? Should it be replaced ?
Tough question. I would definitely RMA a DIMM that consistently threw errors, but one time only could be a tough sell.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
I'm pretty sure you can't do that, since it would imply child vdevs inside a parent vdev, and that doesn't match my understanding of ZFS structure.
Yea, I'm probably wrong, that is why I said I have no mirror experience. I should read up on it just out of curiosity.

Tough question. I would definitely RMA a DIMM that consistently threw errors, but one time only could be a tough sell.
Should be no issue RMA'ing a memory stick; they generally have a lifetime warranty. However, I'd first swap the DIMMs around and note where each one was moved to. Run MemTest for 3 days. If you are still pulling errors and they stay with the same DIMMB1 location, then it's a MB issue. If they move with the DIMM, then it's the DIMM. Due to compatibility issues you may need to underclock the RAM slightly as well to get everything to pass, which isn't ideal, but it is what it is.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
One another urgent question. I know that I have to address every ram problem. So I saw that error in BIOS when I turned on the machine for the first time. After the first 48h of memtest I've checked all event logs again but there was no another error. So what do you think? Should it be replaced ?

8Ah78sf.png
Multi-bit error? Did the server crash at that time?

In any case, try to isolate the problem and replace any repeat offenders. Definitely don't skimp on burn-in after this.

Here I do have a question. I'm feeling not really well with this angle of the S-ATA-Cables. What do you think?

Meh, I've seen worse than those bends.

Do note that the FANA header is not controlled by the CPU temperature - only FAN1-FAN4 are (for details, check out GrumpyBear's thread on the matter, linked from the Supermicro X10 FAQ).

Do you have any other specific questions? To be honest, the OP is a bit too long for me to read thoroughly at the moment. :p
 

Globalhawk

Dabbler
Joined
Jun 15, 2015
Messages
10
The server didn't crash at any time. I've just restarted the computer several times, same as now, but I saw another multi-bit ECC error in the BIOS today. Now I've swapped DIMMB1 with DIMMA1 and will let memtest run another few days. This behavior is very frustrating. (I didn't exit memtest correctly, I just restarted the computer. But I think the error log wouldn't show a different time if that were the reason: I reset the computer at 20:45, the log shows the error at 17:54.)

nJQ1R37.png

Xac7aAe.png
 

maglin

Patron
Joined
Jun 20, 2015
Messages
299
Did you buy that Samsung RAM that you mentioned in your first post? I would probably return it and get the Crucial RAM mentioned in one of the stickies. It's what everyone seems to put in their X10 machines around here with great success and is what I just purchased for my X10SLL.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
That Samsung RAM is also on the QVL, so incompatibilities should be regarded as defects.

That said, the popular Crucial also works very well, in most cases.
 