jamiejunk
Contributor
- Joined: Jan 13, 2013
- Messages: 134
Has anyone ever had problems with ESX guest OSes being able to write to their drives because they had too many ZFS snapshots?
I remember we had this problem about two years ago when we were hosting ESX guests on 7200 RPM SAS drives. The issue ended up being that with too many ZFS snapshots it would take too long for writes to complete. Eventually the Linux ESX guests would time out and throw their file systems into read-only mode.
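For context, this is roughly what it looks like from the guest side, assuming Linux guests using ext3/ext4 with the common errors=remount-ro mount option; the device names, messages, and timeout values below are just examples, not our exact setup:
Code:
# Inside the Linux guest: when a write stalls past the SCSI timeout, the kernel
# logs I/O errors and ext3/ext4 flips the filesystem to read-only.
dmesg | grep -i 'remounting filesystem read-only'

# Check the per-disk SCSI timeout (VMware tools usually bumps this to 180s;
# the stock default is often 30s), and raise it temporarily if needed.
cat /sys/block/sda/device/timeout
echo 180 > /sys/block/sda/device/timeout

# Once the storage recovers, the guest filesystem has to be remounted
# read-write (or the VM rebooted) to clear the read-only state.
mount -o remount,rw /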
I thought that with our new setup, being an all-flash array, we wouldn't have to worry about that so much. But we started having problems two days ago, and today it got much worse.
We run about 50 VMs. They are pretty low usage. Looking at the network traffic to the NFS share, it's only pushing 100 Mbit on average.
I was taking a snapshot once an hour and keeping them for about 10 days. Today is day 7 and it's acting up, with about 160 snapshots total. Each snapshot is about 500 MB worth of changes; below is an example:
Code:
tank1/esx@auto-20140406.0732-10d   561M   -   1.13T   -
tank1/esx@auto-20140406.0832-10d   541M   -   1.13T   -
tank1/esx@auto-20140406.0932-10d   541M   -   1.13T   -
tank1/esx@auto-20140406.1032-10d   533M   -   1.13T   -
tank1/esx@auto-20140406.1132-10d   546M   -   1.13T   -
tank1/esx@auto-20140406.1232-10d   586M   -   1.13T   -
tank1/esx@auto-20140406.1332-10d   605M   -   1.13T   -
tank1/esx@auto-20140406.1432-10d   606M   -   1.13T   -
tank1/esx@auto-20140406.1532-10d   555M   -   1.13T   -
tank1/esx@auto-20140406.1632-10d   587M   -   1.13T   -
tank1/esx@auto-20140406.1732-10d   564M   -   1.13T   -
tank1/esx@auto-20140406.1832-10d   574M   -   1.13T   -
tank1/esx@auto-20140406.1932-10d   563M   -   1.13T   -
tank1/esx@auto-20140406.2032-10d   575M   -   1.13T   -
tank1/esx@auto-20140406.2132-10d   538M   -   1.13T   -
tank1/esx@auto-20140406.2232-10d   512M   -   1.13T   -
tank1/esx@auto-20140406.2332-10d   511M   -   1.13T   -
tank1/esx@auto-20140407.0032-10d   514M   -   1.13T   -
tank1/esx@auto-20140407.0132-10d   578M   -   1.13T   -
tank1/esx@auto-20140407.0232-10d   536M   -   1.13T   -
tank1/esx@auto-20140407.0332-10d   491M   -   1.13T   -
tank1/esx@auto-20140407.0432-10d   575M   -   1.13T   -
tank1/esx@auto-20140407.0532-10d   671M   -   1.13T   -
tank1/esx@auto-20140407.0632-10d   699M   -   1.13T   -
tank1/esx@auto-20140407.0732-10d   809M   -   1.13T   -
tank1/esx@auto-20140407.0832-10d   1.04G  -   1.15T   -
tank1/esx@auto-20140407.0932-10d   1012M  -   1.15T   -
tank1/esx@auto-20140407.1032-10d   940M   -   1.15T   -
tank1/esx@auto-20140407.1132-10d   961M   -   1.15T   -
tank1/esx@auto-20140407.1232-10d   999M   -   1.15T   -
tank1/esx@auto-20140407.1332-10d   1.01G  -   1.15T   -
tank1/esx@auto-20140407.1432-10d   1.14G  -   1.15T   -
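(For reference, a listing like the one above can be pulled with something like the following; the dataset name matches our setup, the rest is just the stock zfs CLI:)
Code:
# Snapshots on the VM dataset, oldest first
zfs list -r -t snapshot -o name,used,refer -s creation tank1/esx

# How many there are
zfs list -r -t snapshot -H -o name tank1/esx | wc -l

# Total space currently held by snapshots on that dataset
zfs get usedbysnapshots tank1/esx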
So I guess I'm going to ramp down the number of snapshots. Just wondering if anyone else has had this problem.
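In case it's useful to anyone, this is roughly how I'd thin out the old ones from the shell; the grep pattern and dates are only examples, so double-check the dry-run list before destroying anything:
Code:
# Dry run: show the hourly snapshots that would be removed (pattern is an example)
zfs list -r -t snapshot -H -o name tank1/esx | grep '@auto-2014040[0-5]'

# Destroy them once the list looks right
for snap in $(zfs list -r -t snapshot -H -o name tank1/esx | grep '@auto-2014040[0-5]'); do
    zfs destroy "$snap"
done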
System Info:
FreeNAS-9.2.1.3-RELEASE-x64
Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
Memory: 262088 MB (~256 GB)
JBOD connected via external SAS.
24 - SAMSUNG 840 Pro Series MZ-7PD512BW 2.5" 512GB SATA III MLC SSDs
Set up as 6 RAIDZ2 vdevs of 4 drives each.
Mirrored ZIL (SLOG) devices are HGST s840Z 2.5″ SAS SSDs.
Cache (L2ARC) drive is a 200 GB HGST s840 2.5″ SAS SSD.
NFS export to the VMware ESX servers via a 1 Gb NIC.
Code:
NAME          STATE     READ WRITE CKSUM
tank1         ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    da0       ONLINE       0     0     0
    da1       ONLINE       0     0     0
    da2       ONLINE       0     0     0
    da3       ONLINE       0     0     0
  raidz2-1    ONLINE       0     0     0
    da4       ONLINE       0     0     0
    da5       ONLINE       0     0     0
    da6       ONLINE       0     0     0
    da7       ONLINE       0     0     0
  raidz2-2    ONLINE       0     0     0
    da8       ONLINE       0     0     0
    da9       ONLINE       0     0     0
    da10      ONLINE       0     0     0
    da11      ONLINE       0     0     0
  raidz2-3    ONLINE       0     0     0
    da12      ONLINE       0     0     0
    da14      ONLINE       0     0     0
    da15      ONLINE       0     0     0
    da16      ONLINE       0     0     0
  raidz2-4    ONLINE       0     0     0
    da17      ONLINE       0     0     0
    da18      ONLINE       0     0     0
    da19      ONLINE       0     0     0
    da13      ONLINE       0     0     0
  raidz2-5    ONLINE       0     0     0
    da20      ONLINE       0     0     0
    da21      ONLINE       0     0     0
    da22      ONLINE       0     0     0
    da23      ONLINE       0     0     0
logs
  mfid0       ONLINE       0     0     0
  mfid2       ONLINE       0     0     0
cache
  mfid1       ONLINE       0     0     0