Hello!
I am still building my first NAS (which is actually not for me):
- Mainboard: Supermicro X10SDV-4C+-TLN4F bulk (MBD-X10SDV-4C+-TLN4F-B)
- RAM: Crucial DIMM 16GB, DDR4-2666, CL19, ECC (CT16G4WFD8266)
- PSU: Corsair SF750 Platinum (SFX/80+)
- Case: SilverStone Case Storage DS380, Mini-ITX (SST-DS380B)
- HBA: InLine 76617E, 4x SATA, PCIe 2.0 x1 (the Mainboard has only 6 SATA ports)
- SATA-Cables: Supermicro Spare 2FT Amphenol SATA CB
- HDDs: WD Red 4TB WD40EFRX
- SSD: Intenso High Performance internal SSD 120GB (2.5 inch, SATA III, 520 MB/s), black
- Assembling everything (without SSD)
- Smoke Test
- Update IPMI
- Burn-In-Tests of CPU (OCCT and stress and firestarter and prime95... Wanted to do everything I could, right? ^^)
- Two weeks of memtest86
- SMART-Tests of the WD-HDDs (fine)
- Badblocks-Test of the WD-HDDs (fine)
- First Solnet-Test of the WD-HDDs (fine, but the HDD-Activity LED broke)
- Disassemble everything and, after lots of swearing, simply replace the HDD-Activity LED with the one from my test kit
- Assemble everything again, this time with SSD and improved cable management. In the picture, you can (not) see the SSD in the right cage.
- At the first smoke test afterwards, the SSD threw CRC errors into my mfsbsd kernel log; cross-testing showed that the cable was bad.
- Replaced the cable.
I now wanted to do another solnet test, but during the test, one of the HDDs suddenly detached!
I placed the 4 HDDs into the slots "I0" to "I3" ("I" = internal / Mainboard, "E" = extended / HBA), launched mfsbsd and started the test (for the HDDs only, I ignored the SSD).
This is the log of solnet (I enhanced it with a time-printing wrapper):
Code:
Tue Aug 18 19:29:20 CEST 2020
Starte Solnet...
sol.net disk array test v2

1) Use all disks (from camcontrol)
2) Use selected disks (from camcontrol|grep)
3) Specify disks
4) Show camcontrol list

Option:
<JAJS600M128C S0222A0>             at scbus3 target 0 lun 0 (ada0,pass0)    // That's the SSD on the HBA
<WDC WD40EFRX-68N32N0 82.00A82>    at scbus4 target 0 lun 0 (ada1,pass1)    // HDD in I0
<WDC WD40EFRX-68N32N0 82.00A82>    at scbus5 target 0 lun 0 (ada2,pass2)    // HDD in I1
<WDC WD40EFRX-68N32N0 82.00A82>    at scbus6 target 0 lun 0 (ada3,pass3)    // HDD in I2
<WDC WD40EFRX-68N32N0 82.00A82>    at scbus7 target 0 lun 0 (ada4,pass4)    // HDD in I3
<HL-DT-ST DVDRAM GSA-E10N JE05>    at scbus10 target 0 lun 0 (pass5,cd0)    // The CD drive I booted mfsbsd from
<Intenso Rainbow Line 6.90>        at scbus11 target 0 lun 0 (pass6,da0)    // The USB stick for storing the logs
Press Return:

1) Use all disks (from camcontrol)
2) Use selected disks (from camcontrol|grep)
3) Specify disks
4) Show camcontrol list

Option:
Enter disk devices separated by spaces (e.g. da1 da2):
Selected disks: ada1 ada2 ada3 ada4
<WDC WD40EFRX-68N32N0 82.00A82>    at scbus4 target 0 lun 0 (ada1,pass1)
<WDC WD40EFRX-68N32N0 82.00A82>    at scbus5 target 0 lun 0 (ada2,pass2)
<WDC WD40EFRX-68N32N0 82.00A82>    at scbus6 target 0 lun 0 (ada3,pass3)
<WDC WD40EFRX-68N32N0 82.00A82>    at scbus7 target 0 lun 0 (ada4,pass4)
Is this correct? (y/N):
Performing initial serial array read (baseline speeds)
Tue Aug 18 19:29:55 CEST 2020
Tue Aug 18 19:38:56 CEST 2020
Completed: initial serial array read (baseline speeds)

Array's average speed is 175.505 MB/sec per disk

Disk    Disk Size  MB/sec %ofAvg
------- ---------- ------ ------
ada1     3815447MB    173     98
ada2     3815447MB    169     96
ada3     3815447MB    182    104
ada4     3815447MB    179    102

Performing initial parallel array read
Tue Aug 18 19:38:56 CEST 2020
The disk ada1 appears to be 3815447 MB.
Disk is reading at about 174 MB/sec
This suggests that this pass may take around 366 minutes

                   Serial Parall % of
Disk    Disk Size  MB/sec MB/sec Serial
------- ---------- ------ ------ ------
ada1     3815447MB    173    173    101
ada2     3815447MB    169    170    101
ada3     3815447MB    182    182    100
ada4     3815447MB    179    179    100

Awaiting completion: initial parallel array read
Wed Aug 19 03:34:22 CEST 2020
Completed: initial parallel array read

Disk's average time is 28015 seconds per disk

Disk    Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
ada1        4000787030016   28471    102
ada2        4000787030016   28525    102
ada3        4000787030016   27402     98
ada4        4000787030016   27660     99

Performing initial parallel seek-stress array read
Wed Aug 19 03:34:22 CEST 2020
The disk ada1 appears to be 3815447 MB.
Disk is reading at about 174 MB/sec
This suggests that this pass may take around 365 minutes

                   Serial Parall % of
Disk    Disk Size  MB/sec MB/sec Serial
------- ---------- ------ ------ ------
ada1     3815447MB    173    170     98
ada2     3815447MB    169    164     97
ada3     3815447MB    182    181     99
ada4     3815447MB    179    175     98

Awaiting completion: initial parallel seek-stress array read
Fri Aug 21 04:00:07 CEST 2020
Completed: initial parallel seek-stress array read
!!ERROR!! dd: /dev/ada3: Device not configured

Disk's average time is 90762 seconds per disk

Disk    Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
ada1        4000787030016   72090     79 ++FAST++
ada2        4000787030016   82851     91 ++FAST++
ada3        2884906254336   54532     60 ++FAST++
ada4        4000787030016  153575    169 --SLOW--

Fri Aug 21 04:00:07 CEST 2020
Solnet ended
As you can see, ada3 falls out:
!!ERROR!! dd: /dev/ada3: Device not configured
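To put a number on how far ada3 got before it dropped out, here is a rough Python sketch. It uses only figures from the last result table in the solnet log; the variable names are my own, and nothing here comes from solnet itself.

```python
# Figures copied from the seek-stress result table in the solnet log.
TOTAL_BYTES = 4000787030016   # bytes each of the other three disks transferred (whole disk)
ADA3_BYTES = 2884906254336    # bytes ada3 transferred before dd failed
ADA3_SECONDS = 54532          # seconds dd ran on ada3 before failing

# Share of the disk that was read before the drive disappeared.
percent_read = 100 * ADA3_BYTES / TOTAL_BYTES

# Mean throughput until the failure, in decimal megabytes per second.
mean_rate_mb_s = ADA3_BYTES / ADA3_SECONDS / 1_000_000

print(f"ada3 read {percent_read:.1f}% of the disk before detaching")
print(f"mean rate until then: {mean_rate_mb_s:.1f} MB/s")
```

So the drive was roughly 72% of the way through the seek-stress pass when it vanished.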
The kernel says this:
Code:
Aug 19 16:43:14 mfsbsd kernel: ada3 at ahcich6 bus 0 scbus6 target 0 lun 0
Aug 19 16:43:14 mfsbsd kernel: ada3: <WDC WD40EFRX-68N32N0 82.00A82> s/n WD-WCC7K0ER3TKT detached
Aug 19 16:43:14 mfsbsd kernel: (ada3:ahcich6:0:0:0): Periph destroyed
Aug 19 16:43:17 mfsbsd kernel: (aprobe0:ahcich6:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
Aug 19 16:43:17 mfsbsd kernel: (aprobe0:ahcich6:0:0:0): CAM status: ATA Status Error
Aug 19 16:43:17 mfsbsd kernel: (aprobe0:ahcich6:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
Aug 19 16:43:17 mfsbsd kernel: (aprobe0:ahcich6:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
Aug 19 16:43:17 mfsbsd kernel: (aprobe0:ahcich6:0:0:0): Error 5, Retries exhausted
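To make sure I am at least reading the status line correctly, I tried decoding the two register bytes by hand. Below is a minimal Python sketch of that. The bit-to-name tables are my own reading of the ATA status and error registers, so treat them as an assumption rather than an authoritative reference, and the `decode` helper is something I made up for this post. It does reproduce the flags the kernel printed.

```python
# Bit names for the ATA status register (ASSUMPTION: typed from my own notes,
# chosen to match the abbreviations the kernel prints).
STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "SERV"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]

# Bit names for the ATA error register (same caveat as above).
ERROR_BITS = [
    (0x80, "ICRC"), (0x40, "UNC"), (0x20, "MC"), (0x10, "IDNF"),
    (0x08, "MCR"), (0x04, "ABRT"), (0x02, "NM"), (0x01, "ILI"),
]


def decode(value, table):
    """Return the names of all bits in `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


# The two bytes from the "ATA status: d1 ..., error: 04 ..." line.
status = int("d1", 16)
error = int("04", 16)

print("status:", decode(status, STATUS_BITS))
print("error: ", decode(error, ERROR_BITS))
```

If I understand it right, the drive (or whatever answered on that port) was reporting busy and aborting the probe command, but I cannot tell from this alone whether the disk, the cable or the port is to blame.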
What does that mean? What would be the most reasonable first steps, and what should I suspect first? The drive? The cable (maybe I rolled it too tightly? See picture)? The mainboard? Myself?
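One thing I checked myself: whether the kernel message and the solnet failure are really the same event. Here is a small Python sketch of that check. It rests on two assumptions of mine: that dd for ada3 started right at the timestamp my wrapper printed for the seek-stress pass, and that the kernel log is written in UTC while my wrapper printed CEST.

```python
from datetime import datetime, timedelta, timezone

# Central European Summer Time is UTC+2.
CEST = timezone(timedelta(hours=2))

# Start of the seek-stress pass, from the wrapper timestamp in the solnet log.
pass_start = datetime(2020, 8, 19, 3, 34, 22, tzinfo=CEST)

# Seconds dd ran on ada3 before failing, from the solnet result table.
ada3_runtime = timedelta(seconds=54532)

# Moment at which dd on ada3 should have failed.
dd_failed_at = pass_start + ada3_runtime

# Timestamp of the kernel "detached" message.
# ASSUMPTION: the kernel log timestamps are in UTC.
kernel_detach = datetime(2020, 8, 19, 16, 43, 14, tzinfo=timezone.utc)

print("dd failed at     :", dd_failed_at.astimezone(timezone.utc))
print("kernel detach at :", kernel_detach)
print("difference       :", abs(dd_failed_at - kernel_detach))
```

Under those assumptions the two timestamps coincide, so dd seems to have died at the moment the kernel detached the drive, and not some time before or after.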