I'm not sure where to begin.
Right now I can't even log into my TrueNAS (TN) server to get version information, but I'm on TN Core 12.xx; I'm not sure which revision.
This is a live production system and can't just be rebooted or wiped without major inconvenience to management. They're already mad enough that we are having the problem I outline below.
HP DL385 Gen10 plus chassis
32-core Epyc
256GB ECC RAM
4x 10Gb NICs (all for iSCSI)
4x 1Gb NICs (1 used for management)
2 pools:
Tier1: 6x 1TB NVMe in striped mirrors
Tier2: 6x 8TB SAS12 HDD in striped mirrors, with a 1TB NVMe as SLOG and a 1TB NVMe as cache (L2ARC)
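For reference, here is a rough sketch of the nominal usable capacity of the two pools above. It assumes two-way mirrors (three mirror vdevs of two disks each per pool), and it ignores ZFS overhead and the TB/TiB difference. The function name is just for illustration.

```python
def striped_mirror_usable_tb(disks, disk_tb, mirror_width=2):
    """Nominal usable capacity of a pool built from striped mirror vdevs.

    Assumption: every vdev is a mirror of `mirror_width` identical disks.
    Each mirror vdev contributes the capacity of one disk.
    """
    if disks % mirror_width != 0:
        raise ValueError("disk count must be a multiple of the mirror width")
    vdevs = disks // mirror_width
    return vdevs * disk_tb


# Tier1: 6x 1TB NVMe, Tier2: 6x 8TB HDD (SLOG and cache devices add no capacity)
tier1_tb = striped_mirror_usable_tb(6, 1)
tier2_tb = striped_mirror_usable_tb(6, 8)
print(f"Tier1 nominal usable: {tier1_tb} TB")
print(f"Tier2 nominal usable: {tier2_tb} TB")
```

So under that assumption the pools are roughly 3 TB and 24 TB nominal before any overhead.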
I can't tell you pool utilization at the moment because the server won't respond and I can't get the web GUI up. I'll get more details as soon as I can, but suffice it to say that both pools are WAY past 50% full.
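Once I can get a shell on the box, something like the sketch below should give exact utilization without the GUI. It parses the tab-separated output of `zpool list -Hp -o name,size,allocated`. The helper names are my own, and the sample numbers at the bottom are made up purely to show the output format; they are not readings from this server.

```python
import subprocess


def parse_zpool_list(text):
    """Turn `zpool list -Hp -o name,size,allocated` output into {pool: percent used}."""
    usage = {}
    for line in text.strip().splitlines():
        # -H gives tab-separated fields with no header; -p gives exact byte counts
        name, size, allocated = line.split("\t")
        usage[name] = 100.0 * int(allocated) / int(size)
    return usage


def read_pool_usage():
    """Run zpool on the storage server itself and return percent used per pool."""
    result = subprocess.run(
        ["zpool", "list", "-Hp", "-o", "name,size,allocated"],
        capture_output=True, text=True, check=True,
    )
    return parse_zpool_list(result.stdout)


if __name__ == "__main__":
    # Made-up sample input, NOT real data from this system
    sample = "Tier1\t3000\t1800\nTier2\t24000\t18000\n"
    for pool, pct in parse_zpool_list(sample).items():
        print(f"{pool}: {pct:.1f}% used")
```

On the server itself, `read_pool_usage()` would replace the sample string.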
The Problem:
Any time we write a large chunk of data to Tier2, we wind up in an oscillating condition where iSCSI drops all the connections to the VMs. After a couple of minutes it reconnects, and a couple of minutes later it disconnects again. This goes on and on until TN is done writing (or whatever it is doing) with the data.
After our backup started and took its snapshot (Veeam), I wrote 400GB to the array. Now VMware is trying to consolidate the snapshot and the system has entered the oscillation. Since VMware depends on iSCSI to consolidate the snapshot, the disconnect/reconnect cycle means that it is literally taking ALL DAY to complete the consolidation.
This is causing my main file server VM and my Exchange VM (among others) to hang for minutes at a time, which is a major disruption to business continuity. We are afraid that if we force a reboot to kill the oscillation we may lose data, so we wait.
The management is not very happy with us right now, and understandably so.
What information can I provide (as soon as I can get the GUI up) to help you help me:
1) get the oscillation to stop to restore operations
2) repair the misconfiguration that is allowing this to happen in the first place
Any help is greatly appreciated.