dxun
Explorer
- Joined
- Jan 24, 2016
- Messages
- 52
For the last several days I've been trying (in vain) to successfully replicate a 6 TB encrypted pool to a target pool.
The source and target pools reside in separate machines, which:
- both run the same version of TrueNAS Core (12.0-U6.1),
- both use the same motherboard (Supermicro X10SLM+-F),
- both use an identical LSI SAS 2008 controller card (and identical BIOSes in IT mode),
- run almost the same BIOS version (the "source" system is on `latest-1`, whereas the "target" is on `latest`),
- have slightly different CPUs (Xeon E3-1231v3 vs E3-1220v3),
- both use ECC RAM (32 GB vs 24 GB).
The source pool resides on a 4x4 TB RAID-Z1 fusion pool and is being replicated to a 4x10 TB fusion pool (striped mirror configuration).
The metadata vdevs on both ends run a mirrored SSD configuration.
I am using the GUI exclusively to drive this replication, as I am not yet proficient in the arts of the ZFS CLI. My understanding (implied by the U6.1 version) is that this release is production-ready.
The result so far is a (complicated) smorgasbord of crashes (kernel panics) on both ends.
My first thought was faulty RAM (somewhere), but these machines have been working flawlessly for at least several years, so I am (for now) discounting this. A faulty RAM stick might happen in one machine... but in both? At the same time? I am not sure I am that (un)lucky. I will be running memtest86 on both machines really soon, but these consistent crashes are really frustrating: I keep getting about 80-90% of the way through, only to experience a crash and ruin a pool. I am currently doing a final pool-to-pool replication run, and if that one crashes, I am definitely doing a manual copy to at least have one decent backup somewhere.
I was able to manually copy the pool's contents from one end to the other. I was also able to complete scrubs on both pools with no errors found. Spot-check playback tests of media files (though inconclusive) showed no problems playing even the largest files (some media files are > 50 GB).
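Since playback spot checks are inconclusive, a more thorough check of the manual copy would be to compare checksums of every file on both ends. Below is a minimal sketch of that idea, assuming both copies are reachable as ordinary directory trees from one machine (for example, with the other side mounted over a share). The function names and paths are hypothetical and not part of TrueNAS or ZFS.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so very large media files never sit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_checksums(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    checksums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            checksums[os.path.relpath(full, root)] = file_sha256(full)
    return checksums


def compare_trees(source_root, target_root):
    """Return (missing_on_target, extra_on_target, mismatched) as sorted lists."""
    src = tree_checksums(source_root)
    dst = tree_checksums(target_root)
    missing = sorted(set(src) - set(dst))
    extra = sorted(set(dst) - set(src))
    mismatched = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return missing, extra, mismatched
```

Calling `compare_trees("/mnt/source/media", "/mnt/target/media")` (hypothetical paths) and getting three empty lists back would mean every file exists on both ends with identical content. Reading 6 TB twice will take a while, so this is something to leave running overnight.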
I've logged two tickets on the TrueNAS JIRA (https://jira.ixsystems.com/browse/NAS-113477 and https://jira.ixsystems.com/browse/NAS-113491). These tickets go into a much deeper characterisation of what is being tried, what is crashing and when, so I'd rather not rehash all of that here. Instead, I thought I'd collect some thoughts from experienced people here. I'd really appreciate anyone spending the time to look into these. I tried to be as exhaustive/precise as possible, but the amount of detail might be a bit daunting.
If anyone is interested further, please don't hesitate to suggest probable causes or avenues to investigate.