Hi all,
I'm building my second TrueNAS system and trying to use server-grade hardware this time around. The machine I built is inspired by this [other post](https://www.truenas.com/community/threads/will-this-ryzen-build-freenas-and-should-i-go-with-scale-over-core.99493/) because I want a relatively compact NAS.
Specs:
- CPU: Ryzen 9 5900X
- Motherboard: ASRock Rack X570D4U
- RAM: 2x16GB Kingston KSM32ES8/16MF
- Case: Fractal Node 804
- PSU: EVGA G6 850W
- OS: currently TrueNAS-SCALE-22.12-BETA.2 (originally was on the "stable" version)
- Drives: 5x 4TB WD Reds (4x WD40EFZX, 1x WD40EFRX) all currently plugged into SATA ports on motherboard
I bought the drives over a period of several months as they went on sale. I have five new drives (plan is to have 6 in the array, but the other two are in a mirror in the existing NAS). I have everything built, ran memtest for a day or two, ran all the SMART tests, and I've been trying to burn in the drives using badblocks as described in the [burn-in test guide](https://www.truenas.com/community/resources/hard-drive-burn-in-testing.92/).
The problem I am running into is that when I run badblocks on all five drives at once, only one of them (always the same drive) completes a full write of the entire drive. The rest run for some amount of time and then hang.
Code:

    dmesg --level=emerg,alert,crit,err
    [375916.109136] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0xd0000 action 0x6 frozen
    [375916.116964] ata2: SError: { PHYRdyChg CommWake 10B8B }
    [375916.122522] ata2.00: failed command: WRITE DMA EXT
    [375916.127739] ata2.00: cmd 35/00:00:00:26:e1/00:02:16:00:00/e0 tag 12 dma 262144 out
             res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

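In case it helps anyone compare across drives, here is a rough Python sketch for grouping the `SError` flags and failed commands per ATA port from saved `dmesg` text. The function name `summarize_ata_errors` and the regular expressions are my own assumptions based only on the line format shown above; other kernels may print these lines differently.

```python
import re

# Sample lines copied from the dmesg output above.
SAMPLE = """\
[375916.109136] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0xd0000 action 0x6 frozen
[375916.116964] ata2: SError: { PHYRdyChg CommWake 10B8B }
[375916.122522] ata2.00: failed command: WRITE DMA EXT
"""

# Assumed patterns, written against the sample format only.
SERROR_RE = re.compile(r"\]\s+(ata\d+)(?:\.\d+)?: SError: \{\s*(.*?)\s*\}")
FAILED_RE = re.compile(r"\]\s+(ata\d+)(?:\.\d+)?: failed command: (.+)$")


def summarize_ata_errors(log_text):
    """Group SError flags and failed commands by ATA port (e.g. 'ata2')."""
    summary = {}
    for line in log_text.splitlines():
        match = SERROR_RE.search(line)
        if match:
            entry = summary.setdefault(match.group(1), {"serror": set(), "failed": []})
            # Flags are whitespace-separated inside the braces.
            entry["serror"].update(match.group(2).split())
            continue
        match = FAILED_RE.search(line)
        if match:
            entry = summary.setdefault(match.group(1), {"serror": set(), "failed": []})
            entry["failed"].append(match.group(2).strip())
    return summary


if __name__ == "__main__":
    for port, info in summarize_ata_errors(SAMPLE).items():
        print(port, sorted(info["serror"]), info["failed"])
```

Running it over the full log for each boot should show whether every hanging drive reports the same flags and the same failed command.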
Smartctl info for one of the bad drives (after running badblocks):
Code:

    === START OF INFORMATION SECTION ===
    Model Family:     Western Digital Red
    Device Model:     WDC WD40EFZX-68AWUN0
    Serial Number:    WD-WX52DB1E1REX
    LU WWN Device Id: 5 0014ee 26a3b56e3
    Firmware Version: 81.00B81
    User Capacity:    4,000,787,030,016 bytes [4.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5400 rpm
    Form Factor:      3.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ACS-3 T13/2161-D revision 5
    SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)
    Local Time is:    Wed Oct 5 22:55:22 2022 EDT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

I notice that the SATA link speed drops to 1.5 Gb/s. Disk usage drops to near zero for those drives, and the regular SMART checks that TrueNAS runs report errors because they are unable to read the SMART attributes. After a restart, everything is back to normal.
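To spot the link downgrade without reading each report by eye, something like the following Python sketch could be run over saved `smartctl` information text. The helper names `link_speeds` and `is_downgraded` are hypothetical, and the pattern assumes the `SATA Version is:` line looks like the one in my output, including the `(current: ...)` part, which may not always be printed.

```python
import re

# Assumed pattern for a line such as:
#   SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)
SATA_RE = re.compile(
    r"SATA Version is:\s+.*?,\s*([\d.]+)\s*Gb/s\s*\(current:\s*([\d.]+)\s*Gb/s\)"
)


def link_speeds(smartctl_info):
    """Return (max_gbps, current_gbps) from smartctl info text, or None if absent."""
    match = SATA_RE.search(smartctl_info)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def is_downgraded(smartctl_info):
    """True when the negotiated speed is below the drive's advertised maximum."""
    speeds = link_speeds(smartctl_info)
    return speeds is not None and speeds[1] < speeds[0]


# Line copied from the smartctl output of one of the bad drives.
SAMPLE = "SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)"

if __name__ == "__main__":
    print(link_speeds(SAMPLE), is_downgraded(SAMPLE))
```

Checking each drive before and after a badblocks run would confirm whether only the hanging drives renegotiate to the lower speed.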
Troubleshooting steps I've taken:
* Reseated all SATA cables on both sides
* Swapped the good SATA connection on the mobo with a bad one, with no effect: the good drive still works and the bad drive still does not, so it's not a bad port
* Tried Ubuntu and TrueNAS Core, and have now also updated to the latest SCALE beta to get the latest kernel
* Booted to Windows to run WD software (thought I would be able to update the firmware)
* Contacted WD support, who told me to talk to the vendor
I'm not sure if it's a red herring, but the one drive that works is the WD40EFRX, which is a WD Red from before the rebrand to "Plus". It has a newer firmware than the newer Plus drives. WD says they do not provide firmware updates for drives. In any case, I'm not really sure what else to try at this point. It seems pretty unlikely that all four of those drives are bad, but I don't know what other steps I should take to understand and resolve the problem.
Please let me know if there is other information that I should post for more context. This is my first post, though I've been lurking on and off for a while.
Thanks!