bferrell
Dabbler
- Joined
- Dec 10, 2018
- Messages
- 15
I have several servers, 4 of which are FreeNAS boxes. 3 of those perform great, but I have a new R720XD LFF that is causing me nothing but grief. I'm not a Linux or FreeNAS expert by any means, but I have some experience.
Here's the backstory. A few weeks ago I bought this new box, installed an HBA (LSI 9211-8i P20 IT Mode) and a couple of rear-mount drives, installed FreeNAS, and added twelve 10TB IronWolf Pro drives from a QNAP NAS that I was decommissioning. Those drives are a few years old, but had been performing great in the NAS. This box is being used as a Time Machine store for my network.
Once I started using the system it immediately started throwing drive faults (Pool xxxx state is UNAVAIL: One or more devices are faulted in response to persistent errors.). The Pool status page would show some number of errors, so I would take the drive offline, replace it, let it resilver, and the pool would come back online. Then several hours later it would happen again. After the 5th or 6th drive I started to suspect that the drives weren't all failing simultaneously, so I ordered a new HBA (same type), new cables, and a replacement backplane.
Over a few days I swapped all 3 out, and the new backplane seemed to help for a couple of days, but then the errors started to recur. At this point I decided that perhaps the drives really were all reaching end of life, and I started replacing them as they faulted. Until today.
Now the box is telling me that one of my brand new drives has faults. I understand that this is not impossible, but after replacing 8 of the old drives I really don't believe the faults, and I'm not sure what else to check. I've attached the dmesg output, but the errors don't tell me much.
Code:
CPU: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz (2900.07-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0x206d7 Family=0x6 Model=0x2d Stepping=7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x1fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features3=0x9c000400<MD_CLEAR,IBPB,STIBP,L1DFL,SSBD>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
nfsd: can't register svc name
(pass4:mps0:0:19:0): LOG SENSE. CDB: 4d 00 0d 00 00 00 00 00 40 00 length 64 SMID 743 Aborting command 0xfffffe000219df30
mps0: Sending reset from mpssas_send_abort for target ID 19
(da3:mps0:0:19:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 90 terminated ioc 804b loginfo 31140000 scsi 0 state c xfer 0
mps0: Unfreezing devq for target ID 19
(da3:mps0:0:19:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da3:mps0:0:19:0): CAM status: CCB request completed with an error
(da3:mps0:0:19:0): Retrying command
(da6:mps0:0:24:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1054 Aborting command 0xfffffe00021b7760
mps0: Sending reset from mpssas_send_abort for target ID 24
mps0: Unfreezing devq for target ID 24
(da6:mps0:0:24:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da6:mps0:0:24:0): CAM status: Command timeout
(da6:mps0:0:24:0): Retrying command
(da6:mps0:0:24:0): SYNCHRONIZE CACHE(10)
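
One way to check whether the errors cluster on particular drives or are spread across every target on the controller is to tally the per-device lines in a log like the one above. What follows is only a minimal Python sketch, assuming the dmesg output has been saved as text with one message per line. The function name `tally_events` and the four sample lines are illustrative (the sample lines are copied from the log above); nothing here is a FreeNAS tool.

```python
import re
from collections import Counter

# Matches CAM device prefixes such as "(da3:mps0:0:19:0): <message>".
# Only daN devices on an mps controller are considered (an assumption
# that fits this log; passN entries are ignored).
DEVICE_RE = re.compile(r"\((da\d+):mps\d+:\d+:(\d+):\d+\):\s*(.*)")


def tally_events(log_text):
    """Count CAM status errors and retries per (device, target, kind)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DEVICE_RE.search(line)
        if not match:
            continue
        dev, target, msg = match.groups()
        if msg.startswith("CAM status:"):
            kind = msg[len("CAM status:"):].strip()
            counts[(dev, int(target), kind)] += 1
        elif msg.startswith("Retrying command"):
            counts[(dev, int(target), "retry")] += 1
    return counts


# Sample lines taken from the dmesg excerpt in this post.
sample = """\
(da3:mps0:0:19:0): CAM status: CCB request completed with an error
(da3:mps0:0:19:0): Retrying command
(da6:mps0:0:24:0): CAM status: Command timeout
(da6:mps0:0:24:0): Retrying command
"""

if __name__ == "__main__":
    for (dev, target, kind), n in sorted(tally_events(sample).items()):
        print(f"{dev} (target {target}): {kind} x{n}")
```

If the counts turn out to be spread across many targets rather than concentrated on one or two disks, that would point more toward something the drives share (HBA, firmware, cabling, backplane, power) than toward the individual drives, though a tally like this cannot prove either case on its own.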