Dear community,
This is how my zpool looks right now (see the zpool status -v output further below):

What this volume does:
- This volume hosts two zvols for ESXi
- ESXi uses iSCSI to mount both zvols
- The zpool is named san; the zvols are san/zvol1 and san/zvol2 (see the creation sketch just below)
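For context, a rough sketch of how two such zvols are created on the CLI (the sizes here are placeholders, not my real ones):
Code:
zfs create -V 500G san/zvol1
zfs create -V 500G san/zvol2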
What I did:
- The zpool was under heavy load from lots of VM activity when I entered the following commands:
Code:
zfs set sync=always san/zvol1
zfs set sync=always san/zvol2
I monitored the VMs closely. I couldn't notice any loss in write speed (or maybe it was only a tiny one).
Therefore, I ran an additional command:
Code:
zfs set sync=always san
(I don't know if this was a wise decision :/)
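For reference, this is how the resulting property values can be checked (standard zfs syntax; I didn't save my actual output):
Code:
zfs get sync san san/zvol1 san/zvol2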
All commands were issued within about five minutes in total. Shortly after the last command, FreeNAS began showing lots and lots of tracebacks on the screen (I was connected via IPMI). The errors flooded my monitor for about 2-3 minutes until FreeNAS initiated a hard reset itself.
Thereafter, the entire vdev mirror-1 was degraded.
Since there was still iSCSI activity, it only took a few moments after the reboot for FreeNAS to initiate a new traceback flood on my monitor.
After the second reboot, the entire zpool was degraded.
What now?
- ESXi is still running. A few VMs have corrupted file systems.
When I run the command
Code:
zpool status -v
I receive the following output:
Code:
  pool: san
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 17h41m with 0 errors on Sun Oct 8 17:41:56 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        san                                             DEGRADED     0     0 8.17K
          mirror-0                                      DEGRADED     0     0 2.69K
            gptid/f1b8a859-4c84-11e7-b22a-0007433aed30  DEGRADED     0     0 2.69K  too many errors
            gptid/f23bcfe7-4c84-11e7-b22a-0007433aed30  DEGRADED     0     0 2.69K  too many errors
          mirror-1                                      DEGRADED     0     0 13.7K
            gptid/76c630fa-9d43-11e7-b75c-0007433aed30  DEGRADED     0     0 13.7K  too many errors
            gptid/77562f36-9d43-11e7-b75c-0007433aed30  DEGRADED     0     0 13.7K  too many errors

errors: Permanent errors have been detected in the following files:

        san/tractorunit-data:<0x1>
My assumption
My first impression is that it was a bad idea to run set sync=always during heavy load.
However, I had a chat with a colleague who has lots of experience with FreeNAS.
He told me that simply running this command should never result in a corrupted zpool. According to him, the only thing that might happen is reduced performance, but definitely no data corruption.
He pointed towards the SAS HBA instead. His assumption is that the two mirrored vdevs were no longer running in sync, which led to this degradation.
I would like to ask whether this assumption is actually plausible.
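If it helps with the diagnosis, the kernel log can be checked for messages from the HBA driver (assuming the mpr(4) driver, which handles SAS3 chips like the 3008 on FreeBSD):
Code:
dmesg | grep -i mpr
grep -i mpr /var/log/messages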
Mainboard: Supermicro X10SRH-CLN4F
SAS HBA: Onboard Broadcom 3008
Firmware of SAS HBA: I remember flashing the firmware to IT mode, but I can't find out which command shows me the firmware version. I think the firmware version was one number below the driver version (e.g. driver 18, firmware 17), but I don't remember the exact versions.
FreeNAS version: FreeNAS-9.10.2-U4 (27ae72978)
EDIT: I just found the command sas2flash -listall, so I used it!
Code:
[root@storageunit ~]# sas2flash -listall
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

        No LSI SAS adapters found! Limited Command Set Available!
        ERROR: Command Not allowed without an adapter!
        ERROR: Couldn't Create Command -listall
        Exiting Program.
[root@storageunit ~]#
Should I be worried about this result?
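(A guess on my part: since the 3008 is a SAS3 chip, maybe the SAS3 variant of the tool is needed instead? I haven't verified this yet.)
Code:
sas3flash -listall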
What else did I do?
- A few weeks ago, I added vdev mirror-1 to this zpool. Originally, the zpool consisted only of mirror-0. So now it is (it was? ;)) a striped mirror zpool (a CLI sketch of this addition follows below).
- I added mirror-1 while the zpool was in use by ESXi (iSCSI).
- After adding this mirror, the following message appeared on my screen:
Code:
May 22 18:53:45 freenas savecore: error reading last dump header at offset 17179865088 in /dev/dumpdev: Invalid argument
(I have to admit that the number 17179865088 could have been a different one; I received this error every time I did something with the volume manager.)
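For reference, the CLI equivalent of what the volume manager did should be roughly this (using the gptids of mirror-1 from the zpool status output above):
Code:
zpool add san mirror gptid/76c630fa-9d43-11e7-b75c-0007433aed30 gptid/77562f36-9d43-11e7-b75c-0007433aed30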
I filed a bug report: https://bugs.freenas.org/issues/24099
In the bug report, these errors were classified as "Filter out useless messages", so I wasn't worried about them.
Questions:
- Is it possible that this error was some sort of silent error that led to this data corruption?
- Is there anything wrong with SAS HBA?
- How is it possible that setting sync=always leads to the degradation of an entire zpool?