New Core build advice needed

dj423

Dabbler
Joined
Feb 25, 2023
Messages
29
Well, after trying SCALE 22.12 and running into some challenges, I have decided I should probably look at TrueNAS Core. No data lost, just some oddities with my use case. I am posting this after reading the hardware guide, since I realize this hardware may not be ideal. I am looking for constructive criticism.

So here is the hardware I am running:
Motherboard: Gigabyte B550M DS3H (Rev 1.3), firmware F15
CPU: Ryzen 5 5600G (with the stock cooler)
RAM: 32GB of DDR4 (non-ECC - yes, I know the risks, and I plan on swapping it out soon)
Storage:
- Boot: 2x 256GB M.2 SSDs by Silicon Power, in a QNAP 2.5" SATA "RAID" adapter, in a RAID1 (mirrored) config
- Pool: 4x 512GB Patriot P300 NVMe drives (2 of which are on a PCIe x8 riser board, 2 directly on the main board)
- roughly 1TB of usable pool storage in a RAIDZ2 pool
Network: Intel X520-DA2 10Gb SFP+, connected via OM3 fiber to a MikroTik CRS310 switch (onboard 1Gb Realtek for OOB, or I may disable it entirely)
Power: 300W TFX (the only one I could find to fit the SFF Lenovo case it all lives in)

Add-in cards: the NIC, and an x16 NVMe riser card for the NVMe drives. No GPU, of course.

My use case: I host around 64 LXD (sometimes referred to as "Linux containers") system containers that run various web apps, email, APIs, and hosted WebDAV storage, along with an Ansible/cloud-init lab that I use to test playbooks and configuration scripts that eventually end up in production. They run in Debian 11 VMs hosted on XCP-ng 8.2, on hosts with the same Intel X520 NICs. The VM hosts have 24GB of RAM and boot off local NVMe storage. So I don't sling a ton of data around; I just need it to be fairly performant and stable for the most part. For my shared storage today, I run an Ubuntu 20.04 LTS server hosting NFS and SMB shares on ext4 file systems, with another Debian box that I use for online backup. That box is mirrored to S3 storage for the off-site backups, over a 1G uplink to the edge router.

Is the hardware I am using going to work for TrueNAS? Or should I toss it all aside and build something more robust? I have access to other hardware, but most of it is older enterprise surplus-grade gear. I felt it would be better to start with all new hardware for a more solid system - and it was what I could find, with all the so-called "shortages" of late. But I am not opposed to stepping up; I want to build something that will last a while. I am on the hunt for ECC RAM, but so far that is the only big no-no I see in the mix.

My goal is to have a solid base for shared storage, to host my groupware collaboration platform and email, and to host the few VMs and containers that I run 24/7. This is all in a home lab, so I try to keep the total power under the 200W range, and this server pool is fairly quiet. All hosts and the NAS will be connected to the core switch with trunked ports, so everything will be L2-segmented via VLANs. If I missed any detail, feel free to ask.

Thanks in advance.
 

dj423

Dabbler
Joined
Feb 25, 2023
Messages
29
Some new developments:
Apparently the NVMe drives aren't happy?

On a fresh Core 13.0-U4 install, I am seeing alerts:

Device /dev/gptid/6082.... is causing slow I/O on pool

Here is the smartctl output for one of the drives (all 4 drives showed this alert within the first hour):
Code:
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Patriot M.2 P300 512GB
Serial Number:                      P300EDCB22122903197
Firmware Version:                   400fAA12
PCI Vendor/Subsystem ID:            0x1e4b
IEEE OUI Identifier:                0x000000
Total NVM Capacity:                 512,110,190,592 [512 GB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Fri Mar 10 20:50:35 2023 EST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     90 Celsius
Critical Comp. Temp. Threshold:     95 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.50W       -        -    0  0  0  0        0       0
 1 +     5.80W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.0500W       -        -    3  3  3  3     5000   10000
 4 -   0.0025W       -        -    4  4  4  4     8000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    1%
Data Units Read:                    1,159,357 [593 GB]
Data Units Written:                 2,893,336 [1.48 TB]
Host Read Commands:                 8,870,289
Host Write Commands:                42,819,647
Controller Busy Time:               949
Power Cycles:                       23
Power On Hours:                     305
Unsafe Shutdowns:                   7
Media and Data Integrity Errors:    0
Error Information Log Entries:      22
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               50 Celsius
Temperature Sensor 2:               51 Celsius
Temperature Sensor 3:               52 Celsius
Temperature Sensor 4:               53 Celsius
Temperature Sensor 5:               54 Celsius
Temperature Sensor 6:               55 Celsius
Temperature Sensor 7:               56 Celsius
Temperature Sensor 8:               57 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

zpool status:
root@truenas[~]# zpool status -v
  pool: Vault
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        Vault                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/6077ac9d-bf9c-11ed-8490-001b21bc5267  ONLINE       0     0     0
            gptid/60783f53-bf9c-11ed-8490-001b21bc5267  ONLINE       0     0     0
            gptid/60796011-bf9c-11ed-8490-001b21bc5267  ONLINE       0     0     0
            gptid/6082ac00-bf9c-11ed-8490-001b21bc5267  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors

All drives show PASSED in the SMART test info.

gstat -dp shows all four drives very busy, hitting 100% at times, with the load bouncing around between them.

A bad batch of NVMe drives, possibly? The drives seem a bit warm, but all four have heat sinks right behind a fan, so that shouldn't be the issue.
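
One cross-check I plan to run (just a sketch - the latency columns should be available in the OpenZFS that ships with Core 13) is to watch per-vdev wait times during a transfer, to confirm whether it is one drive or all four that lag:

Code:
# Per-vdev I/O statistics with latency columns, refreshed every 5 seconds
zpool iostat -vl Vault 5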
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Data Units Read: 1,159,357 [593 GB]
Data Units Written: 2,893,336 [1.48 TB]

This is worrying. There's some kind of write amplification going on, causing your drives to be stuck in garbage collection trying to create open blocks.

How are you actually serving storage to your containers and VMs? NFS or iSCSI? Your RAIDZ2 pool isn't the ideal topology for this use case.
 

dj423

Dabbler
Joined
Feb 25, 2023
Messages
29
This is worrying. There's some kind of write amplification going on, causing your drives to be stuck in garbage collection trying to create open blocks.

How are you actually serving storage to your containers and VMs? NFS or iSCSI? Your RAIDZ2 pool isn't the ideal topology for this use case.

Thanks for the reply.
They are served via NFS. I am running an initial data transfer to the pool after applying the initial settings. I am open to suggestions to better optimize the setup. Just shaking it down at this point; nothing is in production use.
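
If it helps narrow things down, one test I can run is to see whether the sync writes that NFS requests are what's hurting (sketch only - "Vault/shares" is just an example dataset name, and sync goes back to standard afterwards):

Code:
# Check the current sync setting on the shared dataset (example name)
zfs get sync Vault/shares
# Force async writes for a test run only
zfs set sync=disabled Vault/shares
# Revert after testing
zfs set sync=standard Vault/shares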
 

dj423

Dabbler
Joined
Feb 25, 2023
Messages
29
All settings are the defaults for the pool: LZ4 compression, sync=standard, dedup off, checksum on, exec on, recordsize=128K, and aclmode=passthrough.
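
For reference, this is roughly how I pulled those values (a sketch - adjust the pool or dataset name as needed):

Code:
zfs get compression,sync,dedup,checksum,exec,recordsize,aclmode Vault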
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Your drives are TLC, with no DRAM cache, and only a small SLC write cache. These are optimized for low power laptop applications, not NAS.
What do you see with midclt call pool.query | jq?
 

dj423

Dabbler
Joined
Feb 25, 2023
Messages
29
Code:
[
  {
    "id": 1,
    "name": "Vault",
    "guid": "1407624868572024978",
    "encrypt": 0,
    "encryptkey": "",
    "path": "/mnt/Vault",
    "status": "ONLINE",
    "scan": {
      "function": null,
      "state": null,
      "start_time": null,
      "end_time": null,
      "percentage": null,
      "bytes_to_process": null,
      "bytes_processed": null,
      "bytes_issued": null,
      "pause": null,
      "errors": null,
      "total_secs_left": null
    },
    "topology": {
      "data": [
        {
          "type": "RAIDZ2",
          "path": null,
          "guid": "12961172195764917662",
          "status": "ONLINE",
          "stats": {
            "timestamp": 11326833955701,
            "read_errors": 0,
            "write_errors": 0,
            "checksum_errors": 0,
            "ops": [
              0,
              2455,
              1969761,
              0,
              0,
              0,
              0
            ],
            "bytes": [
              0,
              12144640,
              131804782592,
              0,
              0,
              0,
              0
            ],
            "size": 2027224563712,
            "allocated": 125442994176,
            "fragmentation": 0,
            "self_healed": 0,
            "configured_ashift": 12,
            "logical_ashift": 12,
            "physical_ashift": 0
          },
          "children": [
            {
              "type": "DISK",
              "path": "/dev/gptid/6077ac9d-bf9c-11ed-8490-001b21bc5267",
              "guid": "12880993750787959263",
              "status": "ONLINE",
              "stats": {
                "timestamp": 11326833990807,
                "read_errors": 0,
                "write_errors": 0,
                "checksum_errors": 0,
                "ops": [
                  0,
                  619,
                  505240,
                  0,
                  0,
                  0,
                  0
                ],
                "bytes": [
                  0,
                  3031040,
                  32917245952,
                  0,
                  0,
                  0,
                  0
                ],
                "size": 0,
                "allocated": 0,
                "fragmentation": 0,
                "self_healed": 0,
                "configured_ashift": 12,
                "logical_ashift": 9,
                "physical_ashift": 0
              },
              "children": [],
              "device": "nvd0p2",
              "disk": "nvd0",
              "unavail_disk": null
            },
            {
              "type": "DISK",
              "path": "/dev/gptid/60783f53-bf9c-11ed-8490-001b21bc5267",
              "guid": "9156169221111403492",
              "status": "ONLINE",
              "stats": {
                "timestamp": 11326834011737,
                "read_errors": 0,
                "write_errors": 0,
                "checksum_errors": 0,
                "ops": [
                  0,
                  609,
                  485668,
                  0,
                  0,
                  0,
                  0
                ],
                "bytes": [
                  0,
                  3006464,
                  32959844352,
                  0,
                  0,
                  0,
                  0
                ],
                "size": 0,
                "allocated": 0,
                "fragmentation": 0,
                "self_healed": 0,
                "configured_ashift": 12,
                "logical_ashift": 9,
                "physical_ashift": 0
              },
              "children": [],
              "device": "nvd1p2",
              "disk": "nvd1",
              "unavail_disk": null
            },
            {
              "type": "DISK",
              "path": "/dev/gptid/60796011-bf9c-11ed-8490-001b21bc5267",
              "guid": "17153764615388459500",
              "status": "ONLINE",
              "stats": {
                "timestamp": 11326834031925,
                "read_errors": 0,
                "write_errors": 0,
                "checksum_errors": 0,
                "ops": [
                  0,
                  628,
                  470862,
                  0,
                  0,
                  0,
                  0
                ],
                "bytes": [
                  0,
                  3076096,
                  33024880640,
                  0,
                  0,
                  0,
                  0
                ],
                "size": 0,
                "allocated": 0,
                "fragmentation": 0,
                "self_healed": 0,
                "configured_ashift": 12,
                "logical_ashift": 9,
                "physical_ashift": 0
              },
              "children": [],
              "device": "nvd2p2",
              "disk": "nvd2",
              "unavail_disk": null
            },
            {
              "type": "DISK",
              "path": "/dev/gptid/6082ac00-bf9c-11ed-8490-001b21bc5267",
              "guid": "14765001596030053651",
              "status": "ONLINE",
              "stats": {
                "timestamp": 11326834051813,
                "read_errors": 0,
                "write_errors": 0,
                "checksum_errors": 0,
                "ops": [
                  0,
                  599,
                  507991,
                  0,
                  0,
                  0,
                  0
                ],
                "bytes": [
                  0,
                  3031040,
                  32902811648,
                  0,
                  0,
                  0,
                  0
                ],
                "size": 0,
                "allocated": 0,
                "fragmentation": 0,
                "self_healed": 0,
                "configured_ashift": 12,
                "logical_ashift": 9,
                "physical_ashift": 0
              },
              "children": [],
              "device": "nvd3p2",
              "disk": "nvd3",
              "unavail_disk": null
            }
          ],
          "unavail_disk": null
        }
      ],
      "log": [],
      "cache": [],
      "spare": [],
      "special": [],
      "dedup": []
    },
    "healthy": true,
    "status_detail": null,
    "autotrim": {
      "value": "off",
      "rawvalue": "off",
      "parsed": "off",
      "source": "DEFAULT"
    },
    "encryptkey_path": null,
    "is_decrypted": true
  }
]


Would WD Blue or Red drives be a better choice for this application? Or Samsung 980s?
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Is the hardware I am using going to work for TrueNAS? Or should I toss it all aside and build something more robust? I have access to other hardware, but most of it is older enterprise surplus-grade gear. I felt it would be better to start with all new hardware for a more solid system - and it was what I could find, with all the so-called "shortages" of late.
In general, used enterprise gear will give you a more solid system than new gamer gear. HDDs and PSUs deserve a special look, but motherboards, CPUs, and RAM are typically good for at least 10 years of service.

As always, it depends on the details. So what used enterprise gear do you have available?
 

dj423

Dabbler
Joined
Feb 25, 2023
Messages
29
In general, used enterprise gear will give you a more solid system than new gamer gear. HDDs and PSUs deserve a special look, but motherboards, CPUs, and RAM are typically good for at least 10 years of service.

As always, it depends on the details. So what used enterprise gear do you have available?

Thanks, Chris. It appears I went with the wrong drives for this application, as Samuel hinted at too. I am looking at swapping these out for WD Red drives. I will try turning sync off just to test (everything is on a UPS), but in the long run I think upgrading the drives is a must.
Oh, and I have a couple of Lenovo RS140 low-end servers and some Lenovo ThinkStation workstations, all 10 years old or older. They make good test rigs for lab work, but they are not ideal for a solid storage setup.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
I am looking at swapping these out for WD Red drives.

Beware that ordinary Reds are SMR, and won't work. Use Red Plus instead, which are CMR.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Sorry, I thought you were switching to Red spinners. The SN700 is a good choice for your application.
 

dj423

Dabbler
Joined
Feb 25, 2023
Messages
29
Sorry, I thought you were switching to Red spinners. The SN700 is a good choice for your application.
Thanks for confirming, Samuel. I'm hoping these will be better suited to a RAIDZ2 pool. Even with sync off, the drives I have won't cut it.
Lessons learned. Thanks for all the guidance; this has been a learning process for sure.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
I'm hoping these will be better suited to a RAIDZ2 pool

For your application, you may want to use spinners in a stripe pool of 2-way mirror VDEVs, and use smaller SN700s as special device VDEVs. This will give you the best compromise of capacity and performance.
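
Roughly along these lines, as a sketch only - the pool and device names are placeholders, and in practice you would build it through the UI:

Code:
# Two striped 2-way mirrors for data, plus a mirrored special VDEV for metadata
# and (optionally) small blocks. Placeholder device names - and note the special
# VDEV needs redundancy too, because losing it loses the pool.
zpool create tank \
  mirror /dev/da0 /dev/da1 \
  mirror /dev/da2 /dev/da3 \
  special mirror /dev/nvd0 /dev/nvd1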


 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I felt it would be better to start with all new hardware for a more solid system

Then perhaps you should consider buying hardware with more than an estimated three to five year lifespan. The manufacturers of gaming boards typically expect that their target users are likely to upgrade every year or two, and because they are in a highly competitive marketplace, there is significant pressure to use lower cost, lower quality parts to keep the profit margin a bit higher. Server boards are typically expected to last seven to ten years, often hitting fifteen, and are built with high quality components, because the cost sensitivity is not there, and a board that costs ten percent more but lasts twice as long is going to be a winner in the data center business.
 

dj423

Dabbler
Joined
Feb 25, 2023
Messages
29
Then perhaps you should consider buying hardware with more than an estimated three to five year lifespan. The manufacturers of gaming boards typically expect that their target users are likely to upgrade every year or two, and because they are in a highly competitive marketplace, there is significant pressure to use lower cost, lower quality parts to keep the profit margin a bit higher. Server boards are typically expected to last seven to ten years, often hitting fifteen, and are built with high quality components, because the cost sensitivity is not there, and a board that costs ten percent more but lasts twice as long is going to be a winner in the data center business.

Makes sense. Where would the ASRock Rack X570D4U fall? It takes ECC UDIMMs and looks to be a server-grade board.

Thanks

 

dj423

Dabbler
Joined
Feb 25, 2023
Messages
29
@jgreco

Are you still running the Silverstone SDP11 SATA adapter? I am considering one of those with 4 WD Red SA500 1TB drives, since my case is a little cramped. I may run a mirror of SN700s as well (I have 4 on order), depending on what motherboard I end up using, and put everything in mirrors rather than RAIDZ/RAIDZ2, since that seems to fit my use case better from what I am reading. This will let me ditch the NVMe riser board, which I never liked anyway. Thanks
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Makes sense. Where would the ASRock Rack X570D4U fall? It takes ECC UDIMMs and looks to be a server-grade board.

Thanks

Other than the Avoton train wreck, I haven't heard much about these boards failing, so I'm going to guess that it doesn't happen often and that ASRock Rack did a good job with their stuff.

@jgreco

Are you still running the Silverstone SDP11 SATA adapter? I am considering one of those with 4 WD Red SA500 1TB drives, since my case is a little cramped. I may run a mirror of SN700s as well (I have 4 on order), depending on what motherboard I end up using, and put everything in mirrors rather than RAIDZ/RAIDZ2, since that seems to fit my use case better from what I am reading. This will let me ditch the NVMe riser board, which I never liked anyway. Thanks

The SDP11's are pretty nice. In many of our 12-bay/2U hypervisors, I use an LSI -8i RAID controller of some sort (currently the Dell PERC H740p) for ESXi datastores, which leaves the front bays completely available for virtualized TrueNAS or whatever.

If you look at the way that a Supermicro SC826 is typically laid out, there is space in the chassis behind the mid-chassis fan bulkhead in the PCIe expansion area. Who uses full length PCIe cards anyways? We had started doing this earlier with the Addonics modules --

https://extranet.www.sol.net/files/freenas/cool-hardware/upper-deck-storage.png

This works out very well because it naturally gets cooled, has access to power and data, etc. You can see my version of "a little cramped".

However, with the Silverstones, it turns out that if you try hard, you can actually stack TWO of these using 5/8" spacers between the two SDP11 boards and get eight SATA SSD's into your chassis. I ended up retrofitting this and created a drill template for field upgrades, which allowed me to replace the two Addonics SATA+NVMe cards with two Supermicro dual NVMe cards AND the stacked SDP11's, for a total of four NVMe M.2's and eight SATA M.2's.

Our current design gets rid of the dual X520 10G cards in favor of an AOC-STG-I4s quad 10G card, so that adds room for another dual NVMe card.

And that's how you pack ten pounds of crap into a five pound bag. :smile:
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Where would the ASRock Rack X570D4U fall? It takes ECC UDIMMs and looks to be a server-grade board.
It is designed as a server-grade board. This review of a complete ASRock Rack server with the B650D4U-2L2T motherboard has this to say about ECC:
The other major feature is memory support. One can use consumer non-ECC DDR5 UDIMMs (so long as they fit in the 1U server.) The bigger feature is that these can also use ECC DDR5 UDIMMs, as we have here. We have heard that people that have bit flippers to introduce memory errors are seeing those errors logged in this platform.
 

dj423

Dabbler
Joined
Feb 25, 2023
Messages
29
@jgreco That is a really cool setup. Yes, looks very tight in there!
Thanks for the confirmation on the board.

While I wait on the new drives, motherboard, and ECC memory, I am still doing my 'homework' on ZFS. 45Drives just did a great intro to ZFS, and I have been going over the iXsystems "six metrics" white papers as well, gaining a better understanding of ZFS storage pool layout.

Another question:
As for the boot drive (for TrueNAS) and the SLOG drive - from what I am reading, these don't need to be very large. Can I just use consumer-grade (i.e. small) SSDs for these, or do I need to look at using DRAM-cached (SA500) disks? I do have 2 Silicon Power (no cache) SATA drives; I was planning to use one for the boot disk (in fact, this is what it boots off of currently) and the other for the SLOG, since these are smaller 256G drives. I didn't find any WD Red drives under 500G, and it seems like a waste to dedicate a 500G drive to SLOG duty. Not trying to cheap out; I just don't see many quality flash drives under 500G on the market.
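
My rough math on SLOG sizing, assuming the usual ~5 second transaction group flush and my 10Gb link as the ceiling:

Code:
# Back-of-the-envelope SLOG sizing (assumptions: 10 Gb/s line rate, ~5 s txg interval)
# 10 Gb/s ~ 1.25 GB/s
# 1.25 GB/s x 5 s ~ 6.25 GB of in-flight sync writes at most
# -> capacity is not the constraint; endurance and power-loss protection matter more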
 