SATA DOM port failure? Supermicro X10DRL-i

jolness1

Dabbler
Joined
May 21, 2020
Messages
29
First off, Happy New Year (or almost New Year, depending on when and where you read this).

This morning I rebooted my MacBook to install an update. When it loaded into the desktop I got an error about being unable to connect to the NFS share on my TrueNAS box. I tried to connect manually and no dice. So I decided to load the Web GUI and see what was going on, maybe reboot the system. The Web GUI wasn't loading so I tried SSH, no luck there either. IPMI loaded and was working fine, so I rebooted from there, figuring it was just some weird edge-case bug of the kind I have a knack for finding.
Upon reboot I got "failed command: WRITE FPDMA QUEUED" errors on ATA10 (one of the 2 SATA DOM ports on my board). I rebooted again, the drive was not detected, and the board tried to boot from LAN (which of course didn't work). I rebooted once more, manually loaded the boot menu, and the drive was there again. After several cycles of this I decided to boot using the debug mode, and got an "error: checksum verification failed" while loading the initial ramdisk.
At this point I suspected the SATA DOM had failed. That would be surprising as it's only a few years old but, again, I have a knack for having uncommon issues. No worries, I have a spare that I had been meaning to set up as a mirror (I have a hunch a mirror wouldn't have helped here, but I still should have done it). When I tried to install TrueNAS SCALE to this new SATA DOM (in ATA10, or port I-SATA 5 as it's called in Supermicro's manual) I got a "COMRESET failed (errno=-16)", which I had seen earlier as well. Now I began to suspect the port was dead. No big deal, there is another port for DOMs (I-SATA 4, or ATA9), so I tried that. It yielded the same issue.
Out of curiosity I tried to install TrueNAS anyway on the new SATA DOM on both ATA9 and ATA10, with the same result. To be safe I had unhooked all other drives in the system (the data pool and an SSD I use for running VMs and applications). The commands appear to go through sometimes, as the install will eventually begin and then throw errors about failed commands again.
At this point I am thinking the mainboard has an issue. The Supermicro documentation says these SATA ports run off the chipset, so I suspect the failure is there: I know both ports worked when I built this server 3.5 years ago, and it seems unlikely (albeit possible) that they both failed. Visually there are no issues I can see with the ports or anything else on the board.

Apologies for the long-winded explanation; I have been at this for a long time and am trying to give as much information as possible. I would like to be able to conclusively (or somewhat conclusively) diagnose the issue before firing the parts cannon at the problem. A replacement board is going to cost more than it did a few years ago, but an entirely new system with a new CPU and RAM looks like it'll be substantially more, and with the layoff (part of being in software right now, it seems) I am extra conscious of fixing this as cheaply as possible. I also want to actually fix it, not "jury-rig this until it blows up again". The attached photos are in chronological order; some are somewhat redundant, but this way any further information that might help someone who knows more than I do point me in the right direction is there.

I have the following questions:
1) Both SATA DOM modules I have come with external power cables (one uses USB2 header power, the other the port Supermicro puts on the board specifically for the SATA DOM). Would I be able to try them in another port? It seems the answer is yes, and since power in is all the DOM is looking for there shouldn't be any risk, but I don't want to totally toast the board.
2) Any tips for diagnosing the chipset issue more definitively?
3) Should I just get a boot SSD and run it off the HBA, or, if there is an issue with the chipset, should I move on from this motherboard before I risk it causing an issue with the data? I have the data backed up on-site and off-site, but if I can avoid some sort of issue I would prefer that; there are enough things that can go wrong even when observing best practices.
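
For question 2, one low-effort step is to count which ATA ports the kernel actually reports errors on. Below is a minimal Python sketch, not a definitive tool: it assumes the usual libata log layout ("ataN: ..." or "ataN.00: ..."), the function name `tally_ata_errors` is made up for this example, and the sample lines are illustrative rather than copied from the attached screenshots. Only the two error strings come from the post above.

```python
import re
from collections import Counter, defaultdict

# Matches libata-style kernel messages such as "ata10.00: failed command: ..."
# or "ata10: COMRESET failed (errno=-16)". The exact layout is an assumption.
ATA_LINE = re.compile(r"\bata(\d+)(?:\.\d+)?: (.*)")

# Error substrings quoted in the post; extend the tuple as needed.
PATTERNS = ("failed command", "COMRESET failed")


def tally_ata_errors(log_text):
    """Return {port_number: {pattern: count}} for lines matching PATTERNS."""
    tally = defaultdict(Counter)
    for line in log_text.splitlines():
        match = ATA_LINE.search(line)
        if not match:
            continue
        port, message = int(match.group(1)), match.group(2)
        for pattern in PATTERNS:
            if pattern in message:
                tally[port][pattern] += 1
    return {port: dict(counts) for port, counts in tally.items()}


# Illustrative sample lines only (hypothetical timestamps and ordering).
SAMPLE = """\
[   12.3] ata10.00: failed command: WRITE FPDMA QUEUED
[   13.1] ata10: COMRESET failed (errno=-16)
[   15.8] ata9: COMRESET failed (errno=-16)
[   16.0] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
"""

if __name__ == "__main__":
    for port, counts in sorted(tally_ata_errors(SAMPLE).items()):
        print(f"ata{port}: {counts}")
```

To use it on a real system, save the kernel log to a file (for example the output of `dmesg`) and pass the file contents to `tally_ata_errors`. Errors spread across every port point away from a single bad connector.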

Thanks in advance, please let me know if I can clarify anything, apologies again for the ramble, combination of exhaustion and panic are not helping me at all at the moment.

System Specs (I realize some may not be relevant but just want to make sure I am giving any info that might help):
TrueNAS SCALE 23.10.1 (*edit* forgot to put this on)
Supermicro X10DRL-i
Xeon E5-2660 v3
64GB (4x16GB) ECC 2133 MHz Samsung DDR4 (M393A2G40DB0)
LSI SAS3008 HBA card (data pool hooked up to this)
5 HGST He12 Drives
 

Attachments

  • Screenshot 2023-12-31 at 10.29.45 AM.png (312.5 KB)
  • Screenshot 2023-12-31 at 11.34.18 AM.png (20.1 KB)
  • Screenshot 2023-12-31 at 11.02.08 AM.png (223.4 KB)
  • Screenshot 2023-12-31 at 11.01.00 AM.png (308.8 KB)
  • Screenshot 2023-12-31 at 10.47.06 AM.png (240.2 KB)
  • image.png (73.2 KB)

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
I would try your SATA DOM on another port on your motherboard if you have some available. The SuperDOM ports exist to provide power to a SATA DOM on most X10 series and newer boards. Since your DOM has an external power cord, you can install it in any SATA port you choose, provided external power is connected to the DOM.
 

jolness1

I am gonna give that a go today. I was exhausted yesterday and that's what I grokked from reading but was in that space mentally where I was concerned I would screw something up trying to force it to work. Appreciate the confirmation. I feel less worried about borking something now.
 

jolness1

Update: I have tried both SATA DOMs on multiple ports and all of them do the same thing. I was thinking maybe a trace was broken, but the "I-SATA" ports off the PCH and the "S-SATA" ports off the SCU both have the same issue, and I would guess they do not share traces.
I'm not sure if I should grab an NVMe to PCIe adaptor (I think this board can boot from PCIe storage) and run the Samsung 960 SSD I have, or if I risk creating further issues somehow. If the PCH has some sort of fault, I worry that I could have a more severe issue and somehow corrupt data. I wouldn't think so, but I also wouldn't have expected a failure like this.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
There is a good chance that one of them is still alive: leave only one of the SATA DOMs connected and try reinstalling; if it doesn't work, do the same with the other.
 

jolness1

I am in the process of trying 1 DOM at a time in each of the 10 ports. Have 4 more to go for each and sadly not looking good. Gonna try a SATA SSD next but I am not optimistic. Only one of the DOMs was in use (have been meaning to set up a mirror but just one of those things that I haven't gotten time to do yet) but at this point I am looking for any way to keep this board running for a bit.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Do you have a copy of the config?
If so - try plugging a normal SSD into a SATA port and installing TN on that. Might help determine if it's a SATA issue or a DOM issue.

Hell, on a temporary basis you could use a USB thumb drive to get things running - just not on a long-term basis.
 

jolness1

I did give a normal SSD a go as well. Same issue, unfortunately. The issue seems to be intermittent, which is strange. I tried all 10 ports with 3 different drives and every combination eventually threw some sort of error like this. The first drive was my installed SATA DOM, which would hang on boot. With the other 2 I tried installing, and it would typically start to write and then throw an error. Sometimes the boot drive wouldn't even be picked up and the server would go straight to PXE booting. I did grab the config using another machine, so I think I may go the USB route for the short term. I posted on Reddit asking for feedback (I am confused about what could be causing this and concerned that it could be something that might wreak havoc elsewhere), and someone kindly reached out to say they have a spare board like this one, bought years ago, that they would sell me for about half of what I could find one for anywhere else. I hate to junk something and I really hate not knowing the cause of the issue, but I am out of my depth here.
 

Davvo

If you were using a single DOM, it's very likely it just died. Reinstall and try it; if it doesn't work you will have to find another drive.
I would try installing on a USB drive as well.
 

jolness1

I tried it in another machine (it can take power via a USB 2 header) and it booted fine. I'm not saying for sure that all is well with the drive, but after trying 3 different drives (2 DOMs, 1 standard SSD) in all 10 SATA ports, it seems certain the failure is on the board (the drive might have issues too, but the board definitely does). I tried a USB install this morning and it works fine. The data is still there too. So I am not sure what the hell happened. For the board to spontaneously fail like that, with all ports going at once, is bizarre. And it's intermittent: sometimes the drive won't show up at all, sometimes it's fine for a few minutes of writing during the install before erroring out. Definitely a weird issue that I wish I could diagnose further, but this is beyond my equipment and skills.
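
The swap-testing logic used in this thread (every drive tried in every port) can be written down as a small Python sketch. This is only an illustration under assumptions: the function name `likely_culprit`, the drive labels and the port numbers 0 to 9 are all hypothetical, and a single pass can mislead when the fault is intermittent, as it was here.

```python
from itertools import product


def likely_culprit(results):
    """Guess the faulty part from {(drive, port): succeeded} test results."""
    drives = sorted({drive for drive, _ in results})
    ports = sorted({port for _, port in results})

    # A drive (or port) counts as "dead" if every test involving it failed.
    dead_drives = [d for d in drives
                   if not any(ok for (drive, _), ok in results.items() if drive == d)]
    dead_ports = [p for p in ports
                  if not any(ok for (_, port), ok in results.items() if port == p)]

    # Several drives all failing everywhere points at something they share.
    if len(drives) > 1 and dead_drives == drives:
        return "shared component (board, controller or power)"
    if dead_drives:
        return "drive(s): " + ", ".join(dead_drives)
    if dead_ports:
        return "port(s): " + ", ".join(str(p) for p in dead_ports)
    return "no consistent pattern"


# Hypothetical labels: two DOMs and one SSD, tried in ten SATA ports.
DRIVES = ["dom_a", "dom_b", "ssd"]
PORTS = range(10)

# The outcome described above: every drive fails in every port.
all_failed = {(d, p): False for d, p in product(DRIVES, PORTS)}
print(likely_culprit(all_failed))  # shared component (board, controller or power)
```

A drive that also works in a different machine, as the DOM did here, strengthens the "shared component" conclusion.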
 