Hi all, thanks for your continued support. I think I'm getting somewhere with this, but it's complicated, so bear with me...
So, this is the Crystal Disk output from a drive I thought had failed. It is the drive I had been running for quite some time before the recent bout of failures, and it was the first one I thought had failed:
I tried doing a full format in Windows, copying some large files over with TeraCopy, and verifying the copied data. Looks OK to me:
Back on the problem server/drive, I installed Windows 10,
ran updates, ran chkdsk, and left it a while. Seems fine.
Unfortunately my BIOS is already the most recent version, and I cannot see any explicit firmware update for the storage controller in question. The storage array firmware that is available appears to be for a RAID add-in card, not the onboard chip.
I decided to pull the existing drive. Here is its SMART data:
This led me to my current rabbit hole.
When googling SATA CRC errors on seemingly non-broken kit, I came across posts talking about firmware incompatibilities between SSDs and the controller firmware. I also saw suggestions of running the drive in IDE mode.
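For anyone who wants to track that CRC counter over time rather than comparing screenshots, here is a rough Python sketch that pulls the raw value out of a SMART attribute table. It assumes the text looks like the table printed by `smartctl -A` from smartmontools and that the counter is reported as attribute ID 199 (often named UDMA_CRC_Error_Count, though naming and IDs can vary by vendor). The sample text and the function name `crc_error_count` are made up for illustration and are not output from my drive.

```python
import re

# Illustrative sample only (NOT real output from my drive), shaped like
# the attribute table printed by `smartctl -A /dev/sdX`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1234
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       17
"""


def crc_error_count(smart_text):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; 199 is assumed to be
        # the interface CRC error counter.
        if len(fields) >= 10 and fields[0] == "199":
            # The raw value is the 10th column; keep only the leading digits.
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


print(crc_error_count(SAMPLE))
```

The idea would be to note the number, change one thing (cable, port, SATA mode), and check again later. As far as I understand it, a count that keeps climbing points at the link between drive and controller rather than at the flash itself, but treat that as a rule of thumb.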
When I checked my BIOS I saw that the SATA mode was already set to Legacy, so I changed it to AHCI.
Upon reboot I get a totally different firmware splash/readout, which is encouraging. Weirdly, however, in this mode all 4 data drives and the boot drive now appear under a single controller, not 2 separate ones as before.
Then I remembered why it was set that way. The HP BIOS does not allow you to select which drive to boot from and will always just pick the first drive on the selected controller. The standalone SATA port is actually intended for an optical drive, despite everyone using it for an SSD, so they only expect a single boot pool.
So for the time being I have managed to get back online by attaching the SSD over USB via a cheap caddy backplane and choosing USB boot. In theory, as the server has an internal USB port, I could live with this and get a slightly more robust adapter or a purpose-built USB drive.
I plan to stay online in this configuration for a day or 2 and resilver before any more major upsets.
However, I'm looking at other options, but it's difficult to predict system behaviour.
I want to avoid HP RAID cards. I don't want to use hardware RAID on my data drives.
Any other form of PCIe storage controller would likely be non-bootable.
Option 1:
I noticed that the data drives are actually connected via a mini-SAS to 4-port SATA breakout cable. I'm thinking that if I stay in AHCI mode I could, in theory, find the first drive on the controller by trial and error and swap its data cable with that of the boot drive. My understanding is that ZFS won't care about the order the drives are connected in and should adapt itself?
Option 2: Look into a simple non-bootable PCIe SATA card that TrueNAS supports, connect all my data drives to that, and leave only the boot drives on the onboard controller. I could even mirror the boot drives if that worked.
Option 3:
Other devices... As a slightly more outlandish but fun idea, I have an older Gen 7 server waiting in the wings that I intend to use as a CCTV recorder, and I also have one of those AliExpress mini router platforms on its way.
It has a fair bit more CPU than the MicroServer. If there are fun things we could play with involving iSCSI, network boot, clustering, anything like that, I'm open to suggestions :)
If you read this far, thank you for your interest and support. All opinions and advice greatly appreciated :)