Hello!
I am still building my first NAS (which is actually not for me):
- Mainboard: Supermicro X10SDV-4C+-TLN4F bulk (MBD-X10SDV-4C+-TLN4F-B)
- RAM: Crucial DIMM 16GB, DDR4-2666, CL19, ECC (CT16G4WFD8266)
- PSU: Corsair SF750 Platinum (SFX/80+)
- Case: SilverStone Case Storage DS380, Mini-ITX (SST-DS380B)
- HBA: InLine 76617E, 4x SATA, PCIe 2.0 x1 (the Mainboard has only 6 SATA ports)
- SATA-Cables: Supermicro Spare 2FT Amphenol SATA CB
- HDDs: WD Red 4TB WD40EFRX
- SSD: Intenso High Performance internal SSD 120GB (6.3 cm (2.5 inch), SATA III, 520 MB/s), black, 120 GB, SATA 2.5
This is what I have done so far:
- Assembling everything (without SSD)
- Smoke Test
- Update IPMI
- Burn-In-Tests of CPU (OCCT and stress and firestarter and prime95... Wanted to do everything I could, right? ^^)
- Two weeks of memtest86
- SMART-Tests of the WD-HDDs (fine)
- Badblocks-Test of the WD-HDDs (fine)
- First Solnet-Test of the WD-HDDs (fine, but the HDD-Activity LED broke)
- Disassemble everything and, after lots of swearing, simply replace the HDD-Activity LED with the one from my test kit
- Assemble everything again, this time with SSD and improved cable management. In the picture, you can (not) see the SSD in the right cage.
- At the first smoke test afterwards, the SSD threw CRC errors into my mfsbsd kernel log; cross-testing showed that the cable was bad.
- Replaced the cable.
I now wanted to do another solnet test, but during the test, one of the HDDs suddenly detached!
I placed the 4 HDDs into the slots "I0" to "I3" ("I" = internal / Mainboard, "E" = extended / HBA), launched mfsbsd and started the test (for the HDDs only, I ignored the SSD).
This is the log of solnet (I enhanced it with a time-printing wrapper):
Code:
Tue Aug 18 19:29:20 CEST 2020 Starte Solnet...
sol.net disk array test v2
1) Use all disks (from camcontrol)
2) Use selected disks (from camcontrol|grep)
3) Specify disks
4) Show camcontrol list
Option: <JAJS600M128C S0222A0> at scbus3 target 0 lun 0 (ada0,pass0) // That's the SSD on the HBA
<WDC WD40EFRX-68N32N0 82.00A82> at scbus4 target 0 lun 0 (ada1,pass1) // HDD in I0
<WDC WD40EFRX-68N32N0 82.00A82> at scbus5 target 0 lun 0 (ada2,pass2) // HDD in I1
<WDC WD40EFRX-68N32N0 82.00A82> at scbus6 target 0 lun 0 (ada3,pass3) // HDD in I2
<WDC WD40EFRX-68N32N0 82.00A82> at scbus7 target 0 lun 0 (ada4,pass4) // HDD in I3
<HL-DT-ST DVDRAM GSA-E10N JE05> at scbus10 target 0 lun 0 (pass5,cd0) // The CD drive I booted mfsbsd from
<Intenso Rainbow Line 6.90> at scbus11 target 0 lun 0 (pass6,da0) // The USB stick for storing the logs
Press Return:
1) Use all disks (from camcontrol)
2) Use selected disks (from camcontrol|grep)
3) Specify disks
4) Show camcontrol list
Option:
Enter disk devices separated by spaces (e.g. da1 da2):
Selected disks: ada1 ada2 ada3 ada4
<WDC WD40EFRX-68N32N0 82.00A82> at scbus4 target 0 lun 0 (ada1,pass1)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus5 target 0 lun 0 (ada2,pass2)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus6 target 0 lun 0 (ada3,pass3)
<WDC WD40EFRX-68N32N0 82.00A82> at scbus7 target 0 lun 0 (ada4,pass4)
Is this correct? (y/N): Performing initial serial array read (baseline speeds)
Tue Aug 18 19:29:55 CEST 2020
Tue Aug 18 19:38:56 CEST 2020
Completed: initial serial array read (baseline speeds)
Array's average speed is 175.505 MB/sec per disk
Disk Disk Size MB/sec %ofAvg
------- ---------- ------ ------
ada1 3815447MB 173 98
ada2 3815447MB 169 96
ada3 3815447MB 182 104
ada4 3815447MB 179 102
Performing initial parallel array read
Tue Aug 18 19:38:56 CEST 2020
The disk ada1 appears to be 3815447 MB.
Disk is reading at about 174 MB/sec
This suggests that this pass may take around 366 minutes
Serial Parall % of
Disk Disk Size MB/sec MB/sec Serial
------- ---------- ------ ------ ------
ada1 3815447MB 173 173 101
ada2 3815447MB 169 170 101
ada3 3815447MB 182 182 100
ada4 3815447MB 179 179 100
Awaiting completion: initial parallel array read
Wed Aug 19 03:34:22 CEST 2020
Completed: initial parallel array read
Disk's average time is 28015 seconds per disk
Disk Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
ada1 4000787030016 28471 102
ada2 4000787030016 28525 102
ada3 4000787030016 27402 98
ada4 4000787030016 27660 99
Performing initial parallel seek-stress array read
Wed Aug 19 03:34:22 CEST 2020
The disk ada1 appears to be 3815447 MB.
Disk is reading at about 174 MB/sec
This suggests that this pass may take around 365 minutes
Serial Parall % of
Disk Disk Size MB/sec MB/sec Serial
------- ---------- ------ ------ ------
ada1 3815447MB 173 170 98
ada2 3815447MB 169 164 97
ada3 3815447MB 182 181 99
ada4 3815447MB 179 175 98
Awaiting completion: initial parallel seek-stress array read
Fri Aug 21 04:00:07 CEST 2020
Completed: initial parallel seek-stress array read
!!ERROR!! dd: /dev/ada3: Device not configured
Disk's average time is 90762 seconds per disk
Disk Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
ada1 4000787030016 72090 79 ++FAST++
ada2 4000787030016 82851 91 ++FAST++
ada3 2884906254336 54532 60 ++FAST++
ada4 4000787030016 153575 169 --SLOW--
Fri Aug 21 04:00:07 CEST 2020 Solnet ended
As you can see, ada3 falls out:
!!ERROR!! dd: /dev/ada3: Device not configured
The kernel says this:
Code:
Aug 19 16:43:14 mfsbsd kernel: ada3 at ahcich6 bus 0 scbus6 target 0 lun 0
Aug 19 16:43:14 mfsbsd kernel: ada3: <WDC WD40EFRX-68N32N0 82.00A82> s/n WD-WCC7K0ER3TKT detached
Aug 19 16:43:14 mfsbsd kernel: (ada3:ahcich6:0:0:0): Periph destroyed
Aug 19 16:43:17 mfsbsd kernel: (aprobe0:ahcich6:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
Aug 19 16:43:17 mfsbsd kernel: (aprobe0:ahcich6:0:0:0): CAM status: ATA Status Error
Aug 19 16:43:17 mfsbsd kernel: (aprobe0:ahcich6:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
Aug 19 16:43:17 mfsbsd kernel: (aprobe0:ahcich6:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
Aug 19 16:43:17 mfsbsd kernel: (aprobe0:ahcich6:0:0:0): Error 5, Retries exhausted
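A quick cross-check of the two logs: the solnet timestamps are in CEST, while the kernel log might be in UTC. The sketch below adds the 54532 seconds solnet reports for ada3 to the start of the seek-stress pass and shifts the result by two hours. Both the UTC offset and the idea that the Seconds column counts from the start of the pass are my assumptions, not something either log states.

```python
from datetime import datetime, timedelta

# Start of the seek-stress pass, taken from the solnet log (local time, CEST).
pass_start = datetime(2020, 8, 19, 3, 34, 22)

# Seconds solnet lists for ada3 in the seek-stress result table.
ada3_seconds = 54532

# Assumption: the Seconds column counts from the start of the pass.
dropout_local = pass_start + timedelta(seconds=ada3_seconds)

# Assumption: the kernel log is written in UTC, two hours behind CEST.
dropout_utc = dropout_local - timedelta(hours=2)

print("ada3 stopped (CEST):", dropout_local)
print("ada3 stopped (UTC): ", dropout_utc)
```

If those assumptions hold, the computed UTC time can be compared directly with the "detached" line in the kernel log, which would tell whether dd failed at the moment of the detach rather than some time later.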
What does that mean, what would be the most reasonable first steps, and what should I suspect first? The drive? The cable (maybe I rolled it too tightly? See picture)? The mainboard? Myself?
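For reference, the status and error bytes in the kernel log can be checked by hand. This is a minimal sketch that only knows the bit names the kernel itself printed next to `d1` and `04`; the bit positions are the standard ATA register positions as far as I know, and all other bits are deliberately left out. The `decode` helper is just my own illustration.

```python
# Bit names limited to the ones shown in the kernel log line
# "ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )".
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x10: "SERV", 0x01: "ERR"}
ERROR_BITS = {0x04: "ABRT"}


def decode(value, names):
    """Return the names of all known bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if value & bit]


status = decode(0xD1, STATUS_BITS)
error = decode(0x04, ERROR_BITS)

print("status 0xd1:", " ".join(status))
print("error  0x04:", " ".join(error))
```

This only confirms that the flags in parentheses match the raw bytes in the RES line. It does not say whether the drive, the cable, or the port is at fault.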