Tossed into the deep end with FreeNAS, full pool, AD issues

SuperNoobAdmin

Dabbler
Joined
Sep 30, 2021
Messages
10
Hello all!
So, a quick TL;DR sob story for context - was on help desk, sysadmin quit, got thrown into role. I do not have enough knowledge at all so pre-emptive sorry for my ignorance.

We have a FreeNAS (FreeNAS-11.2-U7) box that hosts our backups, which run via Veeam. I don't know why (again I'm too dumb for this role) but suddenly our FreeNAS was full, and I mean 100% full. 0 bytes free. One of the analysts on the team tried rolling back to a snapshot from when the pool still had some space free. It freed up SOME space, but now, from my limited Googling, the box seems to have lost its connection to our domain. I assumed this to begin with because Veeam is throwing errors about incorrect username/password, but nothing has changed on the service account it runs with.


So far I've tried:
Restarted the SMB service, giving it a few minutes before turning it back on.
Checked all the settings in the AD portion of the UI; they all look correct for my domain.
Looked under network settings; everything in there is correct as well.
Ran wbinfo -u and wbinfo -g; both turn up "Error looking up domain users".

I just need to be able to get it to talk to my domain so my backups can run again, and would appreciate any help more than I can describe.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Do you have any snapshots that you could remove?
For the domain, you could always remove the box from the domain and then re-add it.

You need to fix the space issue as well as the domain issue - and the chances are they are relatively unrelated.

For space - we need to know the hardware spec, how the pools are built and how many spare drive slots you have.
 

SuperNoobAdmin

Dabbler
Joined
Sep 30, 2021
Messages
10
Uh, again I am so sorry for my ignorance.
How do I determine how a pool is built? Like, it's listed as ZFS, is it that simple or more complex?
For removing and re-adding it to the domain, just delete the info in the AD menu, save, and re-enter it? Do I need to take it out of AD as well?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
mod note: retitled thread to something a bit more descriptive.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Man I can't even finish a thought correctly.

No worries.

SuperNoobAdmin said:
I do not have enough knowledge at all so pre-emptive sorry for my ignorance.

Everyone's in that boat, it's just a matter of degree. I spent an hour this morning trying to figure out if there was a way to force ESXi to release a local VMDK lock for a file which had no open file descriptors, but was nevertheless locked with a ro lock. I ended up having to handwave and suggest a reboot of the hypervisor.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
SuperNoobAdmin said:
Uh, again I am so sorry for my ignorance.
How do I determine how a pool is built? Like, it's listed as ZFS, is it that simple or more complex?
For removing and re-adding it to the domain, just delete the info in the AD menu, save, and re-enter it? Do I need to take it out of AD as well?

For removing and re-adding it to the domain, just delete the info in the AD menu, save, and re-enter it

In the shell can you type zpool status "PoolName" and post the output?
 

SuperNoobAdmin

Dabbler
Joined
Sep 30, 2021
Messages
10
Oh man, so when I dump the info out of the AD menu it won't allow me to save it as a blank.

zpool returns:
  pool: veeam_pool01
 state: ONLINE
  scan: scrub repaired 0 in 1 days 11:04:39 with 0 errors on Mon Sep 13 11:04:46 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        veeam_pool01                                    ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/d6b00505-ead5-11e9-a6cc-b4969146d528  ONLINE       0     0     0
            gptid/d738f0a6-ead5-11e9-a6cc-b4969146d528  ONLINE       0     0     0
            gptid/d7bd0992-ead5-11e9-a6cc-b4969146d528  ONLINE       0     0     0
            gptid/d848b727-ead5-11e9-a6cc-b4969146d528  ONLINE       0     0     0
            gptid/d8d08d32-ead5-11e9-a6cc-b4969146d528  ONLINE       0     0     0
            gptid/d96d606d-ead5-11e9-a6cc-b4969146d528  ONLINE       0     0     0
            gptid/d9f62d2b-ead5-11e9-a6cc-b4969146d528  ONLINE       0     0     0
            gptid/da857305-ead5-11e9-a6cc-b4969146d528  ONLINE       0     0     0
            gptid/db115f3f-ead5-11e9-a6cc-b4969146d528  ONLINE       0     0     0
            gptid/dba30038-ead5-11e9-a6cc-b4969146d528  ONLINE       0     0     0
            gptid/dc2e676e-ead5-11e9-a6cc-b4969146d528  ONLINE       0     0     0
        logs
          gptid/4565a9e2-0004-11ea-b1c3-b4969146d528    ONLINE       0     0     0
        cache
          gptid/97e6c7ca-0004-11ea-b1c3-b4969146d528    ONLINE       0     0     0
        spares
          gptid/dcca0f4b-ead5-11e9-a6cc-b4969146d528    AVAIL

errors: No known data errors
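
For anyone following along, here is a rough Python sketch of how that output answers "how is the pool built". It only understands the sections that appear in this thread (raidz/mirror vdevs, logs, cache, spares), and the device IDs in the sample are shortened for readability, so treat it as an illustration and not a general zpool parser.

```python
from collections import defaultdict

# Abbreviated copy of the config section posted above (IDs truncated).
SAMPLE = """\
NAME STATE READ WRITE CKSUM
veeam_pool01 ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
gptid/d6b00505 ONLINE 0 0 0
gptid/d738f0a6 ONLINE 0 0 0
gptid/d7bd0992 ONLINE 0 0 0
gptid/d848b727 ONLINE 0 0 0
gptid/d8d08d32 ONLINE 0 0 0
gptid/d96d606d ONLINE 0 0 0
gptid/d9f62d2b ONLINE 0 0 0
gptid/da857305 ONLINE 0 0 0
gptid/db115f3f ONLINE 0 0 0
gptid/dba30038 ONLINE 0 0 0
gptid/dc2e676e ONLINE 0 0 0
logs
gptid/4565a9e2 ONLINE 0 0 0
cache
gptid/97e6c7ca ONLINE 0 0 0
spares
gptid/dcca0f4b AVAIL
"""

AUX_SECTIONS = {"logs", "cache", "spares"}
VDEV_PREFIXES = ("raidz", "mirror")


def summarise_layout(status_text):
    """Return {group name: number of devices} for each vdev or auxiliary section."""
    counts = defaultdict(int)
    group = None
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        if name in AUX_SECTIONS or name.startswith(VDEV_PREFIXES):
            # A new data vdev or a logs/cache/spares section starts here.
            group = name
            counts[group] = 0
        elif name.startswith("gptid/") and group is not None:
            # A member device of the current group.
            counts[group] += 1
    return dict(counts)


print(summarise_layout(SAMPLE))
# {'raidz1-0': 11, 'logs': 1, 'cache': 1, 'spares': 1}
```

In other words: one 11-disk RAIDZ1 vdev holding the data, plus one log device, one cache device and one hot spare.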
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Also look under storage/snapshots - are there a lot of snapshots?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
How old are the snapshots? If they are more than a week or so old then perhaps they ought to be deleted - they are NOT backups.
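
If the snapshot list in the UI is long, the ages can also be checked from a saved listing. A minimal sketch, assuming the listing was produced with zfs list -H -p -t snapshot -o name,creation,used (tab-separated, creation as a Unix timestamp, used in bytes). The dataset and snapshot names below are made up for illustration, and nothing here deletes anything.

```python
SECONDS_PER_DAY = 86400

# Hypothetical listing: name <TAB> creation (epoch seconds) <TAB> used (bytes).
NOW = 1633046400
LISTING = (
    "veeam_pool01/backups@auto-old\t1630454400\t5368709120\n"
    "veeam_pool01/backups@auto-recent\t1632873600\t1073741824\n"
)


def snapshots_older_than(listing, now, max_age_days=7):
    """Return (name, used_bytes) for snapshots older than max_age_days, largest first."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    old = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, creation, used = line.split("\t")
        if int(creation) < cutoff:
            old.append((name, int(used)))
    # Largest space consumers first, since those matter most on a full pool.
    return sorted(old, key=lambda item: item[1], reverse=True)


for name, used in snapshots_older_than(LISTING, NOW):
    print(f"{name}: {used / 2**30:.1f} GiB held")
```

With the made-up listing, only the 30-day-old snapshot is reported; the 2-day-old one is left alone.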

Also - I notice you have a cache drive. Is this a metadata-only cache or a normal cache drive? If it's a normal one, I doubt it's doing you any good, and it may even be slowing things down depending on your hardware spec - which brings me to one more thing.

Please post your hardware spec - have a look at what's in my signature for an idea.
 

SuperNoobAdmin

Dabbler
Joined
Sep 30, 2021
Messages
10
All the snapshots are within a week old.
I will have to reach out on the cache drive - I was taught nothing at all of the system before the old sysadmin left. In fact I only found out it was a thing because Veeam broke.

Is there a place within the interface to dump the specs as you have shown, or will I just have to try to get it from old employees?
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
Regarding AD, 11.2 is quite old. Probably the best place to start is by running the command sh /etc/directoryservice/ActiveDirectory/ctl start. This is the pre-11.3 CLI command to start the AD service. It might give some clues about why it's failing.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
SuperNoobAdmin said:
Is there a place within the interface to dump the specs as you have shown, or will I just have to try to get it from old employees?

No - I am afraid not - at least not as one list (that I know of).
The Dashboard will give a lot of info - possibly Platform info, CPU and memory. Other stuff can be found under networks and storage.
You ought to know what you are working with, as that will define what you can do to fix the lack of disk space. Given you have an 11-disk RAIDZ1, that is not necessarily going to be easy.
How much space do you have on your pool? [See Storage/Pools, but it's also on the dashboard]
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
SuperNoobAdmin said:
It is? Can I easily update or is that a process?

That command returns:
False
True
Join is OK
False
True

Generally you plan out upgrades ahead of time. Plan a maintenance window. Write down / think out steps you will go through in upgrade and steps you will go through to revert if you need to. Experiment with a VM to get a good feel for it.

I'd take care of things in the following order:
1) fix your pool usage issue (free up space) / stop filling to 100%
2) fix your AD join
3) other things that need to be taken care of urgently (didn't read this close enough to tell for sure)
4) plan upgrade
 

SuperNoobAdmin

Dabbler
Joined
Sep 30, 2021
Messages
10
anodos said:
Generally you plan out upgrades ahead of time. Plan a maintenance window. Write down / think out steps you will go through in upgrade and steps you will go through to revert if you need to. Experiment with a VM to get a good feel for it.

I'd take care of things in the following order:
1) fix your pool usage issue (free up space) / stop filling to 100%
2) fix your AD join
3) other things that need to be taken care of urgently (didn't read this close enough to tell for sure)
4) plan upgrade

So, nothing is actually hosted on it other than backups, and the backups on it are busted and thus not being used anyway.
Thank you for the detailed plan, I appreciate it.
The process of upgrading itself though, is it an actual process, or is it like ESA where I click "update" and then play the hurry up and wait game for a while?

NugentS said:
No - I am afraid not - at least not as one list (that I know of)
The Dashboard will give a lot of info - possibly Platform info, CPU and memory. Other stuff from networks and storage
You ought to know what you are working with as that will define what you can do to fix the lack of disk space issue which given you have an 11 disk RAIDz1 is not going to be easy to do necessarily.
How much space do you have on your pool? [See Storage/Pools, but its also on the dashboard]

So what I can get from the dashboard:
  • Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz (6 cores)
  • 32 Gigs memory
  • 16 disks listed
    • Two 1TiB - Listed as my backup pool name (Veeam_pool01)
    • Two 32 GiB - Listed as Boot Pool
    • Twelve 10 TiB - Also Veeam_pool01
  • Powered by a UPS
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
And how much space is left in the pool?
Also do you have any slots for extra drives in the chassis?

2 * 1TB - probably SLOG & cache as separate drives
12*10TB - going by your zpool status, an 11-disk RAIDZ1 vdev plus one hot spare
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I am thinking that @SuperNoobAdmin needs to hire some help to get this sorted out. He doesn't know enough about what he's got in order to make any decisions about what to do going forward. And he seems unfamiliar enough with this kit to get himself into trouble by doing the wrong thing.

Where are you based - country and rough location? Maybe someone on here could help
 

SuperNoobAdmin

Dabbler
Joined
Sep 30, 2021
Messages
10
Oh I am 100% not qualified for it. In fact, I will probably make things worse when my idiot-drool just gets all over and makes it sticky. I live in the USA, Montana, Flathead Valley.
255 GiB left in the pool.
No free bays.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
255 GiB left from 90TB is waaaaaayyyyyy too little. But then you know that.
And no bays - this will be fun.
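
To put numbers on that, a back-of-envelope calculation. It assumes the drives are 10 TB (decimal) and ignores ZFS metadata and padding overhead, so the real usable figure will be somewhat lower.

```python
TB = 10**12   # decimal terabyte, as printed on drive labels (assumption)
GIB = 2**30
TIB = 2**40


def raidz_usable_bytes(disks, parity, disk_bytes):
    """Naive usable capacity of one RAIDZ vdev: data disks times disk size."""
    return (disks - parity) * disk_bytes


# 11-disk RAIDZ1 as shown by zpool status; the hot spare adds no capacity.
usable = raidz_usable_bytes(disks=11, parity=1, disk_bytes=10 * TB)
free = 255 * GIB

print(f"{usable / TIB:.1f} TiB usable")      # 90.9 TiB usable
print(f"{100 * free / usable:.2f}% free")    # 0.27% free
```

So the pool is sitting at well under 1% free.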

You have 5 options:
1. Get the Veeam administrator (if he can still use the storage - and that's a big if) to shrink the backups to something more manageable by removing some of the incrementals. He is unlikely to be able to do this, though, as it probably needs a lot more working space than you have. You might start by deleting all snapshots AND preventing any more from being created, at least until you know how much storage you will release. Note that this will not release space immediately - it takes a bit of time.
2. You could spend the better part of a month swapping disks out one by one - but that's not a good idea for several reasons (risk to pool and non-availability of Veeam storage space until you have finished).
3. Trash the backups, trash the pool and start again - this is not a terribly serious suggestion.
4. Build a new server with new disks (200TiB or so) and set that up as a new repository. Use RAIDZ2 as a minimum, preferably Z3, and consider very carefully how to arrange this pool into vdevs (one single vdev or several vdevs). This would be the subject of a whole new discussion, and Veeam may not work until it gets proper access back to the old repository - which means more space there.
5. Expand the existing server
  1. Buy a suitable LSI card - with external ports - flashed to IT mode
  2. Buy a new or second-hand drive shelf with a built-in SAS expander - also flashed to IT mode (err, I am not 100% sure on this - hopefully someone else can confirm). 24 drive bays would be good, as you will see later.
  3. Buy (and this is just a suggestion) enough disks to make a RAIDZ2 or Z3 vdev of larger size than you have already. Say 200TB of usable space. Get the best bang for the buck that you can, e.g. 12 data + 3 parity = 15 * 16TB discs in RAIDZ3. Create a new pool from the new discs and copy the data over (you could just add the vdev to the existing pool and, whilst it would work, it's bad practice on several grounds). This would let Veeam work. Then, when Veeam is working again, trash the existing (old) pool and turn it into a second repository with a RAIDZ2 or Z3 vdev. You might need an extra couple of SSDs for SLOG and cache, depending on how the pool is accessed by Veeam.
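
On option 2, a rough way to see where "the better part of a month" comes from. It assumes (and this is only an assumption) that each disk replacement resilvers in about the time the last scrub took, as reported in the zpool status scan line. On a nearly full pool that is still taking backup traffic it could easily be slower.

```python
import re
from datetime import timedelta

# Scan line as posted earlier in the thread.
SCAN_LINE = "scan: scrub repaired 0 in 1 days 11:04:39 with 0 errors on Mon Sep 13 11:04:46 2021"


def scrub_duration(scan_line):
    """Extract the scrub duration from a scan line in the 'N days HH:MM:SS' format."""
    match = re.search(r"in (?:(\d+) days )?(\d+):(\d+):(\d+)", scan_line)
    if match is None:
        raise ValueError("no scrub duration found")
    days, hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def serial_replacement_time(per_disk, disks):
    """Total time if disks are replaced strictly one after another."""
    return per_disk * disks


per_disk = scrub_duration(SCAN_LINE)
total = serial_replacement_time(per_disk, disks=11)  # 11 disks in raidz1-0
print(f"per disk: {per_disk}, all 11 disks: about {total.days} days of resilvering")
```

That is a little over 16 days of pure resilver time for the 11 disks in the vdev, before counting the physical swaps, any failures along the way, or the extra load on a pool this full.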
Caveat 1: The new vdev of 15 discs is wider than is considered entirely sensible - some people reckon a vdev should be no wider than 12 disks. So an alternative would be 2*100TiB Z2 vdevs striped to make 200TiB. Each vdev might be 6 or 7 data discs + 2 for Z2 = 16 or 18 discs total (+log, cache and spare)
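
To compare those layouts, a quick sketch of the raw arithmetic. It assumes 16TB (decimal) discs, as in the example under option 5, and ignores ZFS overhead as well as log, cache and spare devices.

```python
from dataclasses import dataclass

TB = 10**12
TIB = 2**40


@dataclass(frozen=True)
class Layout:
    name: str
    vdevs: int
    disks_per_vdev: int
    parity: int
    disk_tb: int = 16  # assumed disc size in decimal TB

    @property
    def total_disks(self):
        return self.vdevs * self.disks_per_vdev

    @property
    def usable_tib(self):
        """Naive usable space: data discs per vdev, times vdev count, times disc size."""
        data_disks = self.disks_per_vdev - self.parity
        return self.vdevs * data_disks * self.disk_tb * TB / TIB


LAYOUTS = [
    Layout("1 x 15-wide RAIDZ3", vdevs=1, disks_per_vdev=15, parity=3),
    Layout("2 x 8-wide RAIDZ2", vdevs=2, disks_per_vdev=8, parity=2),
    Layout("2 x 9-wide RAIDZ2", vdevs=2, disks_per_vdev=9, parity=2),
]

for layout in LAYOUTS:
    print(f"{layout.name}: {layout.total_disks} discs, ~{layout.usable_tib:.1f} TiB usable")
```

On these assumptions the single 15-wide Z3 and the two 8-wide Z2 vdevs both come out at roughly 175 TiB, and two 9-wide Z2 vdevs at roughly 204 TiB, so the 18-disc option is the one that actually clears 200TiB.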

Caveat 2: I doubt your cache drive is doing anything other than slowing things down - you don't have enough memory to sensibly run an L2ARC of 1TB. If it's a metadata-only cache then objection withdrawn (and it's probably a good idea).

Caveat 3: Your log drive may or may not be helping (but is unlikely to be hindering). This should be a decent Optane rather than a typical SSD (or even a cheap Optane) - do you know what kind of drive it is? Also, its effectiveness is very dependent on pool settings and how Veeam accesses the storage.

That's my 2p worth - hopefully others will chime in with different / better / worse ideas. Just please don't blame TrueNAS for this - it's done its best keeping your data safe - it's just run out of room, and there are ZFS limitations on how you can expand a pool (which might be fixed late next year....... but I suspect you can't wait that long).
 