TN Core locked up hard

jcizzo

Explorer
Joined
Jan 20, 2023
Messages
79
Hey all, an oddball thing happened. I just installed TNC 13.xx (latest stable) on my NAS.. fresh rebuild.
Hardware consists of the following:

i3-7100T
Supermicro X11SSH-F motherboard (latest firmware, both BIOS and BMC)
32GB ECC UDIMM (Supermicro compatible)
LSI 9211-8i HBA
NICs are Intel (i210 and X710-DA2).
Boot drives and data drives are Samsung (2 of each, mirrored of course); the data drives are 870 EVOs.

I was migrating some of my data over to the data drives via the 10Gb LAN, and after about 8 gigs had transferred everything stopped.. The only way out of it was to IPMI into it and do a power reset. When I logged back in, I received an alert telling me that one of my 870 EVOs had a hard error that couldn't be recovered, and TNC kicked it out of the pool...

OK, so the drive died..

But why would that have brought down the whole system? I understand that if there's a non-correctable memory error it would lock up the whole system to prevent errors being written to the filesystem, but I don't think this should've happened..

right?

thanks!
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Well, your kit looks good, so that shouldn't have happened.
Losing one of the boot disks may have prevented you from booting, but it shouldn't have stopped the OS from keeping going.

What are the boot disks, BTW? You say Samsung, and that the data drives are EVOs, but you don't say which model the boot disks are. Also, where is your system dataset?
 

jcizzo

Explorer
Joined
Jan 20, 2023
Messages
79
Thanks for getting back to me.

It wasn't a boot disk that died; it was one of the drives in my data pool. I was just doing a file copy, around ~160 gigs in total. It locked up around the 8-gig mark.
The SSDs for the boot and data pools are connected directly to the SATA ports on the motherboard, so old firmware on the HBA shouldn't be a factor, and the motherboard has the latest stable firmware from Supermicro. And although CPU utilization during a file copy of that size will spike, heat is not an issue.. the CPU at most gets up to 40°C in those circumstances (otherwise it sits between 26-29°C at idle and light use).

I can't imagine the firmware on the X710-DA2 NICs being a factor.. I was copying movies back and forth when testing the 10Gig NICs without any issue..
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Not a clue, I'm afraid.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I was migrating some of my data over to the data drives via the 10Gb LAN, and after about 8 gigs had transferred everything stopped.. The only way out of it was to IPMI into it and do a power reset. When I logged back in, I received an alert telling me that one of my 870 EVOs had a hard error that couldn't be recovered, and TNC kicked it out of the pool...

OK, so the drive died..

But why would that have brought down the whole system?

Hi @jcizzo

A failing drive shouldn't have hung the system as zpool failmode is set to continue on data pools for this reason. Normally a hard lock like this that requires IPMI indicates a hardware failure, but you seem to have been pretty tightly sticking to Supermicro approved components.

My only point of suspicion is the LSI HBA - even though it isn't used, it's still drawing power, and might have decided to warm itself to the point of going unresponsive on the PCI bus if this setup is in a conventional tower case or tuned for low noise. If the HBA isn't being used at the moment, can you remove it and see if the issue happens again under a sustained transfer?
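If you want to double-check what the pool is actually set to once it's back up, here's a quick look from the shell (substitute your real pool name for "tank" - that's just a placeholder):

```sh
# How the pool reacts to catastrophic I/O failure:
#   wait = block all I/O, continue = return errors for new writes but keep
#   serving reads, panic = crash the box
zpool get failmode tank

# Pool health plus per-disk read/write/checksum error counters
zpool status -v tank
```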
 
Joined
Oct 22, 2019
Messages
3,641
I was migrating some of my data over to the data drives via the 10Gb LAN
What was this process? TrueNAS to TrueNAS? Client to TrueNAS? ZFS to ZFS? Other filesystem to ZFS?

Are you using native ZFS encryption, by any chance?
 
Joined
Oct 22, 2019
Messages
3,641
A failing drive shouldn't have hung the system as zpool failmode is set to continue on data pools for this reason.
To be more nit-picky, a single failed drive in a two-way mirror does not invoke "failmode". The zpool "failmode" property only deals with "catastrophic" pool failure. A degraded pool is still usable and does not invoke "failmode".
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
To be more nit-picky, a single failed drive in a two-way mirror does not invoke "failmode". The zpool "failmode" property only deals with "catastrophic" pool failure. A degraded pool is still usable and does not invoke "failmode".

Very true; a potential culprit, though, could have been the system-wide vfs.zfs.deadman timer, which is set to "wait" - a bad disk that was still responsive on the SATA bus might have caused this to trigger.
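
If you want to see what those tunables are set to on your build (the sysctl names below are what CORE 13 / OpenZFS 2.x should expose - worth double-checking on your exact version):

```sh
# Deadman ("hung I/O") tunables; failmode=wait means a stalled I/O simply
# blocks rather than being faulted, which can look like a full system hang
sysctl vfs.zfs.deadman
```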
 

jcizzo

Explorer
Joined
Jan 20, 2023
Messages
79
Hi @jcizzo

A failing drive shouldn't have hung the system as zpool failmode is set to continue on data pools for this reason. Normally a hard lock like this that requires IPMI indicates a hardware failure, but you seem to have been pretty tightly sticking to Supermicro approved components.

My only point of suspicion is the LSI HBA - even though it isn't used, it's still drawing power, and might have decided to warm itself to the point of going unresponsive on the PCI bus if this setup is in a conventional tower case or tuned for low noise. If the HBA isn't being used at the moment, can you remove it and see if the issue happens again under a sustained transfer?
Heat isn't an issue with the HBA, because I upgraded it with a heatsink that's twice the size of the original and a fan attached directly to it. I did this after discovering how hot it became. Now, even after heavy use, if you touch it the heat is barely noticeable.
 

jcizzo

Explorer
Joined
Jan 20, 2023
Messages
79
The only thing I can think of - and this is because I still don't know the ins and outs of ZFS - is that I DO have a 500GB NVMe drive intended to serve as a SLOG or write cache (apparently they're different.. still trying to figure it out, though), which I was going to devote as a ZIL/SLOG for the 5 spinners when I add them. In the advanced tab, under the write cache/log setting (or however it's labeled), it says "over-provisioning an SSD can increase performance and reliability blah blah..". I had set that to 256GB, however the drive wasn't made part of any pool yet.. could that have done it? I'm not sure how that whole thing works. I know you're probably all rolling your eyes in dismay over my lack of knowledge, so please feel free to enlighten me!

Thanks!
 
Last edited:
Joined
Oct 22, 2019
Messages
3,641
Before it gets lost in the weeds, what about this:

What was this process? TrueNAS to TrueNAS? Client to TrueNAS? ZFS to ZFS? Other filesystem to ZFS?

Are you using native ZFS encryption, by any chance?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
The only thing I can think of - and this is because I still don't know the ins and outs of ZFS - is that I DO have a 500GB NVMe drive intended to serve as a SLOG or write cache (apparently they're different.. still trying to figure it out, though), which I was going to devote as a ZIL/SLOG for the 5 spinners when I add them. In the advanced tab, under the write cache/log setting (or however it's labeled), it says "over-provisioning an SSD can increase performance and reliability blah blah..". I had set that to 256GB, however the drive wasn't made part of any pool yet.. could that have done it? I'm not sure how that whole thing works. I know you're probably all rolling your eyes in dismay over my lack of knowledge, so please feel free to enlighten me!

Thanks!
1. A SLOG is not a write cache. It never caches writes and doesn't help with reads either. Note that ZFS does not have a write cache. It does have a ZIL that holds 5 seconds of writes in memory and then flushes these to disk whilst building another ZIL. This is not a cache.
2. A SLOG has specific hardware requirements that a random 500GB NVMe probably won't support. Hint: if it's a consumer drive, then it almost certainly won't be the right hardware (I am only saying this because if I say it 100% won't be good enough, someone will pop up saying "what about this drive?").
3. In most use cases a SLOG does nothing, because most people are not using sync writes. Sync writes are generally NFS, VMware iSCSI and, I believe, Apple Mac based writes (maybe).
4. A SLOG is only really needed for sync writes to virtual disks & databases. For file-based transfers, change sync to disabled (see the example below this list) and get much faster transfers than sync with a SLOG.
5. A SLOG only needs a limited size. Actual required sizes are (and I would add a bit extra just in case): 1Gb NIC = 1.25GB, 10Gb NIC = 12.5GB (simplistically).
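
As a concrete example for point 4: sync is a per-dataset property, so it's something along these lines (with "tank/media" standing in for whichever dataset you're copying into):

```sh
# See what the dataset is currently doing (standard = honour sync requests
# from the client, always = force sync, disabled = treat everything as async)
zfs get sync tank/media

# For bulk file copies, disabling sync skips the ZIL entirely; the trade-off
# is that up to ~5 seconds of acknowledged writes can be lost on power failure
zfs set sync=disabled tank/media
```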
 

jcizzo

Explorer
Joined
Jan 20, 2023
Messages
79
1. A SLOG is not a write cache. It never caches writes and doesn't help with reads either. Note that ZFS does not have a write cache. It does have a ZIL that holds 5 seconds of writes in memory and then flushes these to disk whilst building another ZIL. This is not a cache.
2. A SLOG has specific hardware requirements that a random 500GB NVMe probably won't support. Hint: if it's a consumer drive, then it almost certainly won't be the right hardware (I am only saying this because if I say it 100% won't be good enough, someone will pop up saying "what about this drive?").
3. In most use cases a SLOG does nothing, because most people are not using sync writes. Sync writes are generally NFS, VMware iSCSI and, I believe, Apple Mac based writes (maybe).
4. A SLOG is only really needed for sync writes to virtual disks & databases. For file-based transfers, change sync to disabled (see the example below this list) and get much faster transfers than sync with a SLOG.
5. A SLOG only needs a limited size. Actual required sizes are (and I would add a bit extra just in case): 1Gb NIC = 1.25GB, 10Gb NIC = 12.5GB (simplistically).
Knowing that I have this 500GB NVMe, is there a way I can tune the system to take advantage of it? When I was testing things out to see what this thing could do for network throughput, transferring large files (30 gigs worth of movies) resulted in the following:
2 SSDs (TrueNAS) to the media station (Win10) = 6-10Gb/s (if I recall correctly... it fluctuated a good bit.. still a reasonable rate).
NVMe (TrueNAS) to the media station (Win10) resulted in 10Gb/s without issue.
5 spinners (TrueNAS RAIDZ1) to the media station (Win10) yielded some 4-5Gb/s, or thereabouts? I don't quite recall.. again, acceptable for 5 spinners.

The same 3 tests going the other way (Win10 media station to the TrueNAS CORE NAS) resulted in the following:
Same files to the 2 SSDs started off at around 10Gb/s, then after several seconds settled down to 5-6Gb/s (I think).
Same file to the NVMe ran the full 10Gb/s the whole time.
Same file to the spinners started off at 10Gb/s, then after 5 seconds or so collapsed to 200Mb/s or thereabouts.. you could see the flushing going on as it would float from 200-350, then to 150, then back up.. the ZFS cache donut on the dashboard would go completely full, and the CPU would spike to 70%, sometimes more..

I used to think the way it worked was that the RAM cache would fill first, and as it started to fill it would write its contents to a fast SSD or NVMe so that A) it could keep up with the throughput of the incoming data stream and B) if there was a power outage during the transfer, the contents of RAM would have already been written to a drive.. then upon startup, ZFS would see the contents and move them to the spinners (in my case).. Apparently that's not the case..

Anyone know any tricks to make it so? Makes me kick myself for spending the money on those Intel X710s.
 
Last edited:

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Knowing that I have this 500GB NVMe, is there a way I can tune the system to take advantage of it?
Search for L2ARC.

A double-digit bitflip will freeze the system (unless it's using ECC RDIMM?), but it should write an error somewhere.
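
If it did, it should show up in the BMC event log and/or the kernel log. Assuming ipmitool is installed (it normally is on CORE), something like:

```sh
# BMC / IPMI system event log - ECC and other hardware events end up here
ipmitool sel elist

# Kernel log - look for machine-check (MCA) or memory-related entries
dmesg | grep -iE 'mca|ecc|memory'
```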
 
Last edited:

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
@jcizzo - you need to fix the freeze first. Have you run memtest for several days?

Based on the figures you mentioned - not really. Writes are limited by the speed of the underlying pool.
With async writes it's as fast as it can go - you cannot improve this other than by changing the underlying pool. Also, if the power goes out you will lose up to 5 seconds of data. The ZIL (as I mentioned previously) is built in RAM, and only in RAM, and then flushed to disk.

With sync writes the ZIL is built in RAM and also written to temporary permanent media (that's why it's slow) before acknowledgment is sent to the writing OS. This ZIL is then flushed to permanent storage from RAM. A SLOG is an ultra-fast device used to store the ZIL before it gets written to permanent storage, and thus speeds up sync writes only. Note that in terms of speed, [Sync < Sync+SLOG < Async]. This is why a SLOG has very specific hardware requirements.

With the 500GB NVMe you may be able to improve read speeds - note may - see L2ARC. Do not use the NVMe to start with; run for a week or so of stable operation, then check your ARC hit rate. If it's > 90% then an L2ARC won't help (rule of thumb). Also, you will probably want more memory before using a 500GB L2ARC (64GB would be better), as an L2ARC uses some ARC for its own headers. An alternative is a metadata-only L2ARC, which might help improve response times for things like folder browsing but would not use much of the 500GB.
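
A rough way to check the hit rate and, if it turns out to be low, try the NVMe as a cache device (the pool name "tank" and device "nvd0" are placeholders - use your own):

```sh
# ARC hit/miss counters since boot - hit rate = hits / (hits + misses)
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

# If the hit rate is well under ~90%, add the NVMe as a cache (L2ARC) vdev;
# a cache device can be removed again at any time without risk to the pool
zpool add tank cache nvd0

# Or restrict it to metadata only, for snappier directory listings
zfs set secondarycache=metadata tank
```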
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Knowing that I have this 500GB NVMe, is there a way I can tune the system to take advantage of it?
Looking for ways to throw in whatever spare hardware lies in the drawer is the wrong way to design a server.
One should rather define requirements and then identify suitable hardware to meet these requirements. If some of this hardware happens to be already owned, it's a freebie.
 

jcizzo

Explorer
Joined
Jan 20, 2023
Messages
79
Looking for ways to throw in whatever spare hardware lies in the drawer is the wrong way to design a server.
One should rather define requirements and then identify suitable hardware to meet these requirements. If some of this hardware happens to be already owned, it's a freebie.
Well, I definitely wasn't looking to throw in spare hardware.. the NVMe is a Samsung 980 Pro.. definitely overkill.. This is my first NAS with 10Gb Ethernet, and I made the mistake of thinking a RAIDZ1 would be able to keep up with the transfers. It's a humble little system with the focus on reliability, data protection (because my other NAS uses btrfs as a filesystem.. BLEH!), simplicity, and, if I can meet all the other requirements, low energy usage.. It seems to do all of that; I just mucked it up with the 10Gb part..

I did run a memtest64 on it.. not for several days, though. I thought just running through the test once was good enough. The memory is brand new. Yes, I'm aware that even new RAM can be defective; I'm just stating that because used RAM might be more prone to errors if it's old.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
A purist would say run the RAM test for a week.
I like 24 hours as a minimum, though.

A Z1 of HDDs will not keep up with 10Gb, not without a lot of vdevs striped together.

Remove the 10Gb card?
 

jcizzo

Explorer
Joined
Jan 20, 2023
Messages
79
A purist would say run the RAM test for a week.
I like 24 hours as a minimum, though.

A Z1 of HDDs will not keep up with 10Gb, not without a lot of vdevs striped together.

Remove the 10Gb card?
Removing the card may happen..

This whole thing is a pain..

I've been wrestling with it for the past 5 hours. The other day I got an error from a relatively new SSD saying that there were problems and it was damaged, so I ordered another.. I ran the same transfer tests today, and after a few gigs had transferred - POOF! Another error.. The drives seem to keep dying; I dunno what to make of it. That was going across the 10G line.. when I sent the files across the 1G NIC, everything went through fine..

Then I got another error saying the pool was degraded and that TrueNAS kicked a drive out.. ridiculous.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Looks to me like you have hardware issues - and not on the drives (assuming they are new).
Pull the HBA and the 10Gb NIC (i.e. back to minimum basics), connect the drives to the motherboard, run memtest for 24 hours and report back.
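
It would also be worth pulling the SMART data off the disks that keep getting kicked out before assuming they're really dead - CRC errors, for instance, usually point at cabling rather than the drive itself. Something like this, with /dev/ada1 as a placeholder for whichever disk got faulted:

```sh
# Full SMART report - check the reallocated sector and CRC error counters
smartctl -a /dev/ada1

# Kick off a long self-test, then re-check with 'smartctl -a' once it finishes
smartctl -t long /dev/ada1
```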
 