Hi group,
We have a Supermicro bare-metal server with an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, 64 GB of RAM, and ZFS storage with three pools. In detail:
Code:
nas1# lspci
...
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
03:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
04:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
05:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
Code:
nas1# camcontrol devlist
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 0 lun 0 (da0,pass0)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 1 lun 0 (da1,pass1)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 2 lun 0 (da2,pass2)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 3 lun 0 (da3,pass3)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 4 lun 0 (da4,pass4)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 5 lun 0 (da5,pass5)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 6 lun 0 (da6,pass6)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 7 lun 0 (da7,pass7)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 8 lun 0 (da8,pass8)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 9 lun 0 (da9,pass9)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 10 lun 0 (da10,pass10)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 11 lun 0 (da11,pass11)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 12 lun 0 (da12,pass12)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 13 lun 0 (da13,pass13)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 14 lun 0 (da14,pass14)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 15 lun 0 (da15,pass15)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 16 lun 0 (da16,pass16)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 17 lun 0 (da17,pass17)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 18 lun 0 (da18,pass18)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 19 lun 0 (da19,pass19)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 20 lun 0 (da20,pass20)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 21 lun 0 (da21,pass21)
<AMCC 9650SE-24M DISK 4.10>              at scbus0 target 22 lun 0 (da22,pass22)
<SAMSUNG MZ7WD120HAFV-00003 DXM87W3Q>    at scbus1 target 0 lun 0 (ada0,pass23)
<SAMSUNG MZ7WD120HAFV-00003 DXM87W3Q>    at scbus2 target 0 lun 0 (ada1,pass24)
Code:
nas1# zpool list
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
Volume0       5.44T   658G  4.79T         -     2%    11%  1.00x  ONLINE  /mnt
Volume1       2.72T   916K  2.72T         -     0%     0%  1.00x  ONLINE  /mnt
Volume2       13.6T  7.88T  5.71T         -    21%    57%  1.00x  ONLINE  /mnt
freenas-boot   111G  1.50G   109G         -      -     1%  1.00x  ONLINE  -
Code:
nas1# zpool status Volume0
  pool: Volume0
 state: ONLINE
  scan: scrub repaired 0 in 1h40m with 0 errors on Sun Feb 8 01:40:48 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        Volume0                                         ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/7a8e031c-336c-11e4-91df-0025902eda9c  ONLINE       0     0     0
            gptid/7c57b51f-336c-11e4-91df-0025902eda9c  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/7d89cdcf-336c-11e4-91df-0025902eda9c  ONLINE       0     0     0
            gptid/7eba40e0-336c-11e4-91df-0025902eda9c  ONLINE       0     0     0

errors: No known data errors
We are sometimes having problems with the NFS clients, which are an Elasticsearch cluster of three nodes. The problem happens during backups: the backup is triggered with a curl call against any node of the cluster, and as soon as it starts, the clients (the Elasticsearch nodes) lose access to the NFS server, as if the server were blocked. This is what the clients are logging:
Code:
Feb 27 07:01:22 kernel: [7409875.902785] nfs: server not responding, still trying
Feb 27 07:01:23 kernel: [7409876.715406] nfs: server OK
Feb 27 07:01:24 kernel: [7409877.014304] nfs: server not responding, still trying
Feb 27 07:01:24 kernel: [7409877.452476] nfs: server OK
Feb 27 07:01:24 kernel: [7409877.650035] nfs: server not responding, still trying
Feb 27 07:01:24 kernel: [7409877.756992] nfs: server OK
Feb 27 07:01:27 kernel: [7409880.480818] nfs: server not responding, still trying
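Since those messages come from the Linux clients' kernel log, our plan for the next backup window is to watch both sides with stock tooling. Assuming Volume2 is the pool that holds the export (the commands below are the standard ones, nothing FreeNAS-specific):
Code:
# On the Linux clients: negotiated mount options (check rsize/wsize,
# timeo and retrans for the export) and RPC retransmission counters
nfsstat -m
nfsstat -r

# On the FreeNAS server, while the backup runs:
nfsstat -s                   # NFS server RPC counters
gstat -p                     # per-disk latency and queue depth, live
zpool iostat -v Volume2 1    # per-vdev throughput, refreshed every second
If the disks show high busy percentages in gstat while the clients stall, the bottleneck would be the pool rather than the network.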
From what I have read, this can happen when the clients cannot keep their writes in sync with the server. I could understand that if we were writing hundreds of GB, but here it is only a dozen GB or so, because Elasticsearch's backups are incremental.
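As far as I understand (a working theory, not something we have confirmed), NFS writes are synchronous by default, so each one must be committed to the ZFS intent log before the server acknowledges it; if the pool cannot keep up, the nfsd threads block and the clients see exactly these timeouts. This is how we would check how the exported dataset handles sync writes; Volume2/es-backups is just a placeholder for our real dataset name:
Code:
# Does the pool have a dedicated log (SLOG) device?
zpool status Volume2

# How are synchronous writes handled on the exported dataset?
# sync=standard means every NFS sync write waits on the ZIL
zfs get sync,logbias Volume2/es-backups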
We can control the bytes per second that the clients send to the NFS server and the chunk size of the pieces they send, and we have already lowered both twice. So I would like to know if anyone has run into this situation, whether it is normal, and how best to dig into it. The curl we use to set those limits is sketched below.
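For reference, this is roughly the curl we use to apply those limits; the repository name and path are ours, the values are examples, and the settings (chunk_size, max_snapshot_bytes_per_sec) are the standard ones for Elasticsearch's shared-filesystem ("fs") repository type:
Code:
# Register (or update) the fs snapshot repository with throttling;
# lowering max_snapshot_bytes_per_sec slows each node's writes to the NFS mount
curl -XPUT 'http://localhost:9200/_snapshot/nfs_backup' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/es-backups",
    "chunk_size": "100mb",
    "max_snapshot_bytes_per_sec": "20mb",
    "max_restore_bytes_per_sec": "20mb"
  }
}'

# Kick off the (incremental) snapshot from any node
curl -XPUT 'http://localhost:9200/_snapshot/nfs_backup/snapshot_1?wait_for_completion=false'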
Cheers.