Tank reaching 97%

Status
Not open for further replies.

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
As suspected, the condition of that pool is very serious, bordering on terminal. You have one dead drive, so you have no redundancy. You have a second drive that's throwing lots of errors--when that one dies, your entire pool dies. Replace the failed disk immediately. That won't fix the data that's already corrupt, but it will bring back a little bit of redundancy. Once that's done, you can worry about the other disk that's throwing errors--expect you'll need to replace that one as well.

What is this system being used for? If it stores important data, you really need to rebuild the pool into a more robust configuration--you have too little redundancy, and too wide a vdev.

But we're getting distracted from the subject of your thread. I still think your most likely issue is snapshots. Post, in code tags, the output of zfs list and zfs list -t snapshot.
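
For a quick read on where the space is actually going, the space-accounting view can also help (just a sketch; pool name tank taken from the posts above):

Code:
[root@nas1 ~]# zfs list -o space -r tank    # USEDSNAP vs. USEDDS shows whether snapshots or live data hold the space
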
 

dnet

Dabbler
Joined
Mar 27, 2014
Messages
23
I see..

zfs list output ..

Code:
[root@nas1 ~]# zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
tank                 19.2T   398G  19.2T  /mnt/tank
tank/.system         3.12M   398G  56.7K  /mnt/tank/.system
tank/.system/cores   51.2K   398G  51.2K  /mnt/tank/.system/cores
tank/.system/samba4  2.97M   398G  2.97M  /mnt/tank/.system/samba4
tank/.system/syslog  51.2K   398G  51.2K  /mnt/tank/.system/syslog
tank/iscsi            104K   398G  53.0K  /mnt/tank/iscsi
tank/iscsi/backupa   51.2K   398G  51.2K  /mnt/tank/iscsi/backupa


zfs list -t snapshot output..
Code:
[root@nas1 ~]# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
tank@auto-20171109.0900-2w  14.8G      -  19.2T  -
tank@auto-20171109.1000-2w  1.65M      -  19.2T  -


TQVM
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
OK, doesn't look like it's snapshots after all--there weren't any there at all before today (assuming it's 9 Nov wherever you are; it's still 8 Nov here). How about the output (again, in code tags) of du -sh /mnt/tank/*?
 

dnet

Dabbler
Joined
Mar 27, 2014
Messages
23
Output..

Code:
[root@nas1 ~]# du -sh /mnt/tank/*											   
19T	/mnt/tank/data1														 
5.0k	/mnt/tank/iscsi   


TQVM
 

dnet

Dabbler
Joined
Mar 27, 2014
Messages
23
OK, ls -lh /mnt/tank.
Code:
[root@nas1 ~]# ls -lh /mnt/tank
total 41126218213
drwxr-xr-x  2 www     www       3B May 24  2014 .freenas
drwxr-xr-x  5 root    wheel     5B Jun  4  2014 .system
-rw-r--r--  1 root    wheel    19T Nov  9 22:14 data1
drwxr-xr-x  3 nobody  nobody    3B May 31  2013 iscsi

TQVM
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
OK, you have a single file called data1, that's 19TB in size, and owned by root. That file is consuming almost all the space on your pool. How are you using that file?
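
If it helps, comparing the file's on-disk usage with its apparent size will show whether that file is sparse or fully allocated (sketch only; assumes the du on this FreeBSD build supports -A for apparent size):

Code:
[root@nas1 ~]# du -sh  /mnt/tank/data1     # space actually allocated on the pool
[root@nas1 ~]# du -Ash /mnt/tank/data1     # apparent (logical) size of the file
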
 

dnet

Dabbler
Joined
Mar 27, 2014
Messages
23
Actually, that file holds backup data from some of our servers. The data is important and is backed up every day. We use backup software and one server to manage the backup process, and the backups are then sent to the NAS via iSCSI.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
So data1 is a file-based iSCSI extent that's consuming pretty much all your pool space? That's just all kinds of wrong, and it's probably why deleting data on the other end isn't freeing space on your pool. I'm not sure how you'd go about shrinking this; maybe one of the experts can chime in. @Ericloewe? @Arwen?

If this data is important, you need to get the pool fixed--replace the failed drive, and probably replace the second drive showing errors. That's the first priority. The second priority is to reduce the size of data1 to give yourself some breathing room.

The third priority is to rebuild the server with a sane configuration. If you need 20 TB of block storage (i.e., iSCSI), you need 40 TB of pool capacity, net of redundancy. About the smallest way to do that, as far as I can see, would be to put 8 x 8 TB disks in RAIDZ2.
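
Rough numbers behind that suggestion, as a back-of-the-envelope only (ignores ZFS metadata and slop space):

Code:
# 8 x 8 TB drives in a single RAIDZ2 vdev
#   data disks   = 8 - 2 parity         = 6
#   raw capacity = 6 x 8 TB             = 48 TB  (~43.7 TiB)
#   keep block storage at or below ~50% => roughly 20-21 TiB usable for iSCSI
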
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The filesystem that's using the extent is presumably not going to react well to sudden shrinkage of the underlying disk, so that would require one or more of the following:
  • Nuke it, restore from backup to a new share that is properly configured to not allow for the pool to get so full.
  • Add more storage.
  • Get rid of snapshots (a removal sketch follows this list).
  • Move other data elsewhere.
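
On the snapshot option above: removal is just a zfs destroy per snapshot (names taken from the zfs list -t snapshot output up-thread; in this case they only hold about 15 GB, so it would not buy much):

Code:
[root@nas1 ~]# zfs destroy tank@auto-20171109.0900-2w
[root@nas1 ~]# zfs destroy tank@auto-20171109.1000-2w
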
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,079
@dnet just be sure you only replace one drive at a time. You must replace the totally failed drive first, let the resilver complete, then replace the drive that is giving errors, which looks to be da6 from the graphic you posted earlier.
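
Not a full procedure, just a quick sanity-check sketch before and between replacements (da6 taken from the post above):

Code:
[root@nas1 ~]# glabel status | grep da6    # map da6 to its gptid label so the right disk gets replaced
[root@nas1 ~]# smartctl -a /dev/da6        # confirm it really is the drive logging errors
[root@nas1 ~]# zpool status tank           # make sure the first resilver has finished before touching the second disk
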
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Or functioning TRIM/UNMAP on the iSCSI initiator, if the virtual disk can afford to delete stuff. But this is a bit of a stopgap.

Forgot that one.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
From the output of zfs list -t snapshot posted up-thread, doesn't look like those are the issue.
I figured as much. It's not often that insanely full pools are insanely full due to snapshots.
 

dnet

Dabbler
Joined
Mar 27, 2014
Messages
23
OK, I will replace the failed drive immediately and reduce the size of the data.
 

dnet

Dabbler
Joined
Mar 27, 2014
Messages
23
In our environment, the NAS is mapped to a Windows server using iSCSI. After reducing the size of the data on the Windows server, the tank on the NAS still shows 97%. Why? Or do I also need to adjust something on the NAS?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
iSCSI provides raw block storage. Unless the client is issuing UNMAP, the server has no way of knowing if something was deleted.
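
If someone wanted to check the Windows side, something along these lines would do it (drive letter E is only an example, and whether the FreeNAS end honors UNMAP for a file extent depends on the version and extent settings):

Code:
C:\> rem A result of 0 means Windows sends delete (TRIM/UNMAP) notifications
C:\> fsutil behavior query DisableDeleteNotify

C:\> rem Re-send UNMAP for space already freed on the iSCSI volume (PowerShell)
C:\> powershell -Command "Optimize-Volume -DriveLetter E -ReTrim -Verbose"
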
 

dnet

Dabbler
Joined
Mar 27, 2014
Messages
23
I have replaced the hard disk and the resilvering process has started, but the progress has not changed after several hours. Why? Or do I need to restart the machine?

Code:
[root@nas1 ~]# zpool status
  pool: tank
 state: UNAVAIL
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Nov 20 11:34:53 2017
        148M scanned out of 21.4T at 4.37K/s, (scan is slow, no estimated time)
        12.4M resilvered, 0.00% done
config:

        NAME                                              STATE     READ WRITE CKSUM
        tank                                              UNAVAIL   1.16K     0     0
          raidz1-0                                        UNAVAIL   1.16K     0     0
            gptid/314214ac-c8a9-11e2-8927-002590c1fcf4    ONLINE        0     0     0
            gptid/31d202ce-c8a9-11e2-8927-002590c1fcf4    ONLINE        0     0     0
            gptid/3264371b-c8a9-11e2-8927-002590c1fcf4    ONLINE        0     0     0
            gptid/32f0c656-c8a9-11e2-8927-002590c1fcf4    ONLINE        0     0     0
            replacing-4                                   UNAVAIL       0     0     0
              9773083294005733761                         UNAVAIL       0     0     0  was /dev/gptid/3380f7fb-c8a9-11e2-8927-002590c1fcf4
              gptid/b60ebc44-cda3-11e7-916b-002590c1fcf4  ONLINE        0     0     0  (resilvering)
            gptid/3418fcca-c8a9-11e2-8927-002590c1fcf4    ONLINE        0     0     0
            1338157980908881363                           REMOVED       0     0     0  was /dev/gptid/34b1725d-c8a9-11e2-8927-002590c1fcf4
            gptid/35421063-c8a9-11e2-8927-002590c1fcf4    ONLINE        0     0     0
            gptid/35dbbdfb-c8a9-11e2-8927-002590c1fcf4    DEGRADED  1.16K     0     0  too many errors
            gptid/36690f03-c8a9-11e2-8927-002590c1fcf4    ONLINE        0     0     0
            gptid/36fcd7d8-c8a9-11e2-8927-002590c1fcf4    ONLINE        0     0     0
            gptid/378c07c4-c8a9-11e2-8927-002590c1fcf4    ONLINE        0     0     0
        logs
          gptid/37d94c7f-c8a9-11e2-8927-002590c1fcf4      ONLINE        0     0     0

errors: 6 data errors, use '-v' for a list

TQVM
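
One way to tell whether the resilver is creeping forward rather than stuck is to compare the scanned counter some minutes apart (sketch only; pool name from the output above):

Code:
[root@nas1 ~]# zpool status tank | grep -E 'scanned|resilvered'
[root@nas1 ~]# sleep 600; zpool status tank | grep -E 'scanned|resilvered'    # run again ~10 minutes later and compare
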
 