TidalWave (Explorer; joined Mar 6, 2019; 51 messages)
Hey Guys,
We are in a bit of a pickle. We have had TrueNAS 12.0u7 running just fine for months. Recently, however, the server keeps crashing when we access data or run a scrub. We have a 90-drive Supermicro JBOD chassis connected via SAS cable to a Supermicro head unit server.
Three weeks ago we had a drive fail, but when I tried to replace the failed drive with a new one, the web GUI wouldn't accept the disk. I also tried to wipe the disk using the GUI, and it wouldn't wipe either. So I ran a command like this:
zpool replace tank2 /dev/gptid/00e45156-c66d-11eb-acb4-3cecef1011b7 /dev/multipath/disk56
That actually worked and the pool began resilvering. After the resilver I ran a scrub, which completed without problems. People used the server for days and it was fine.
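For anyone following along, the manual replacement above can be sketched as a small dry-run script. The pool and device names are the ones from the command above; the `run` and `replace_and_check` helpers and the `RUN` variable are hypothetical names used only for illustration, and the `zpool status -v` check afterwards is an assumption about how one might watch the resilver, not something taken from the original steps.

```shell
#!/bin/sh
# Sketch only: by default the commands are printed, not executed.
# Set RUN=1 to actually run them (assumes zpool is available on the host).
POOL="tank2"
OLD_DEV="/dev/gptid/00e45156-c66d-11eb-acb4-3cecef1011b7"
NEW_DEV="/dev/multipath/disk56"

# run: echo the command unless RUN=1 is set, in which case execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# replace_and_check: replace the failed device, then show pool status
# so the resilver progress can be watched.
replace_and_check() {
    run zpool replace "$POOL" "$OLD_DEV" "$NEW_DEV"
    run zpool status -v "$POOL"
}

replace_and_check
```

Printing the commands first makes it easy to double-check the device paths before touching the pool.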
However, a week later the server started to crash randomly. We are not sure why; there are no errors on the JBOD controller. The TrueNAS web GUI would just stop working, and the middleware service crashed.
Sometimes we get an error message saying 30 drives dropped offline; other times we get Python error messages when the server crashes.
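When drives drop offline like this, it may help to record which devices ZFS considers unhealthy before power cycling. Below is a minimal sketch, assuming `zpool status` prints its device table in the usual NAME / STATE / READ / WRITE / CKSUM column layout. The `list_unhealthy` helper is a hypothetical name, and the set of states it matches is an assumption about what counts as unhealthy.

```shell
#!/bin/sh
# list_unhealthy: read `zpool status` output on stdin and print the name and
# state of every table row whose state is not ONLINE.
# Rows in the device table have at least 5 columns, which skips the
# two-column "state: DEGRADED" summary line.
list_unhealthy() {
    awk 'NF >= 5 && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ { print $1, $2 }'
}

# Usage on the server (not run here, requires zpool):
#   zpool status tank2 | list_unhealthy
```

Saving this output with a timestamp on each crash would show whether the same group of drives disappears every time, which could point at a particular cable, expander, or backplane section.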
In either case I have to power cycle both the JBOD and the head unit. Upon reboot, the TrueNAS server sees the pool, imports it, and then runs some txg reallocations. After about 30 minutes the server finishes booting and the pool returns.
However, when people start accessing the data, the server crashes again.
So I thought that maybe adding the drive from the command line had somehow corrupted the pool. We offlined disk56 and inserted a brand new HDD (these are all 16TB SAS drives, by the way). We wiped the disk and then resilvered the new disk using the web GUI, and that worked. The resilver finished, and the pool shows healthy.
However, when I run a scrub, the server crashes again at about 5%, giving the Python error messages shown in the attached picture.
We are going to try swapping the JBOD chassis tomorrow, but I'm curious whether anyone has other ideas about why our server keeps crashing and giving Python errors about the middleware.
-Tidal