Hello there,
Yesterday one of my TrueNAS 12 (U5.1) systems had a strange hiccup. I noticed an outage starting at some point in the morning and did not think too much about it, as the internet connection to that site is not very reliable. But after the system had been unreachable all day, and my brother, who is on site, confirmed a problem with the system itself, I started to debug. Until he performed a hard reset of the machine, neither the services hosted by the system nor its web interface were reachable at all. Afterwards (when it was up again) we got a warning about one of the two NVMe drives causing slow I/O, with a timestamp close to when the system became unreachable. I checked the reporting graphs: they kept recording for a while after the outage became noticeable, then stopped, resumed in the afternoon, and stopped again in the evening some time before the hard reset. They showed that this SSD did indeed have latencies as high as 8 seconds.
For reference, this is the latency graph of the other NVMe drive during the same time period.
As the system continued to run fine after the hard reset, I ran long SMART tests on both NVMe drives, which showed no signs of failure, so I suspect the controller is the culprit. I'm unsure how to proceed now, though, besides (probably) buying a replacement in advance in case this was a "warning" of an imminent drive failure.
NOTE: I'll update my signature with technical details about my systems. In case you've read this thread and there is no information visible yet, please just wait a minute before demanding the obvious. ;)
EDIT: System information is now available in my signature; this post refers to the "Small NAS".