Thank you for taking the time to examine my debug. That was so much more than I expected... I was only trying to help get to the bottom of the issues raised by the OP. With that in mind, I do not want to hijack this thread by going off topic, but I will respond briefly to many of the issues you identified...
- You have 4 mirrors of your slog device (why!?)
Are you saying this is a configuration error, or simply not required? What you don't know is that I am in the process of building my home ESX lab, so NFS shares on this NAS will be hit hard soon. I obtained 4 small mSATA SSDs (and a PCIe SATA card to install them onto) at a reasonable price to improve NFS write performance, so I'm simply trying to wear level the SSDs over as broad an area as possible. If the configuration is incorrect, I'm happy to be pointed in the right direction or tackle this issue in a dedicated thread.
- You have errors that need serious investigation...
Example:
(ada6:ahcich10:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 80 80 1b e8 40 73 00 00 00 00 00
(ada6:ahcich10:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada6:ahcich10:0:0:0): Retrying command
I was not aware of this. This error has not shown up in the web GUI, nor has it been sent to me as an email alert. I will investigate.
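Since the GUI and email alerts stayed silent on this, anyone wanting to check their own logs could use something like the rough Python sketch below. It is my own illustration, not a FreeNAS tool: the function name `count_cam_errors` and the regular expression are assumptions based only on the three log lines quoted above, and real logs may contain variations it does not match.

```python
import re
from collections import Counter

# Matches kernel lines of the form quoted above, e.g.
#   (ada6:ahcich10:0:0:0): CAM status: Uncorrectable parity/CRC error
# The pattern is an assumption derived from that sample only.
CAM_STATUS = re.compile(
    r"\((?P<dev>[a-z]+\d+):[^)]*\):\s+CAM status:\s+(?P<status>.+)"
)

def count_cam_errors(lines):
    """Count 'CAM status' messages per (device, status) pair."""
    counts = Counter()
    for line in lines:
        match = CAM_STATUS.search(line)
        if match:
            counts[(match.group("dev"), match.group("status").strip())] += 1
    return counts

# The three lines from the debug, used here as sample input.
sample = [
    "(ada6:ahcich10:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 80 80 1b e8 40 73 00 00 00 00 00",
    "(ada6:ahcich10:0:0:0): CAM status: Uncorrectable parity/CRC error",
    "(ada6:ahcich10:0:0:0): Retrying command",
]

for (dev, status), n in count_cam_errors(sample).items():
    print(f"{dev}: {status} x{n}")
```

To run it against a real system, the `sample` list would be replaced with the lines of whichever log file holds the kernel messages on that box. A count that keeps climbing on a single device would at least show where to start looking.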
- You have kinit errors on bootup that seem very abnormal (likely a symptom of your problems, but may be the cause).
If this is on the right track, I hope others can confirm similar behaviour in order to help identify a pattern. That said, I have not noticed this error before, but I shall flex my Google-fu and see what I can learn about it.
- You have mount requests to locations that don't exist
Example: Jul 14 19:49:08 nas1 mountd[2349]: mount request from 192.168.0.20 for non existent path /Raw
Eh? Yes it does... when AD is working, it mounts without issue. Perhaps there is a timing issue... this client was not affected by the power outage (on UPS) and so would be continuously requesting reconnection to /Raw before NAS1 had finished starting up. Since it works when AD is working, I am not concerned about this.
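If anyone wants to test the timing theory on their own box, a sketch along these lines would list which clients were refused and for which paths, so the timestamps can then be compared with when the NAS came back up. Again, this is only my own illustration: `refused_mounts` is a made-up name and the pattern is taken from the single mountd line quoted above.

```python
import re
from collections import Counter

# Matches the mountd message quoted above; the pattern is an assumption
# based on that one example line.
MOUNT_REFUSED = re.compile(
    r"mountd\[\d+\]: mount request from (?P<client>\S+) "
    r"for non existent path (?P<path>\S+)"
)

def refused_mounts(lines):
    """Count refused NFS mount requests per (client, path) pair."""
    counts = Counter()
    for line in lines:
        match = MOUNT_REFUSED.search(line)
        if match:
            counts[(match.group("client"), match.group("path"))] += 1
    return counts

# The line from the debug, used here as sample input.
sample = [
    "Jul 14 19:49:08 nas1 mountd[2349]: mount request from 192.168.0.20 "
    "for non existent path /Raw",
]

for (client, path), n in refused_mounts(sample).items():
    print(f"{client} asked for {path} and was refused {n} time(s)")
```

If the refusals only appear in the minutes after a reboot and then stop, that would fit the idea that the client was simply retrying before the pool and shares were ready.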
- There's evidence that at least one time the system crashed, rebooted spontaneously, or was powered off without a proper shutdown.
Yes. As mentioned in my previous post, the power went out (twice). I've just discovered it's not connected to the UPS. DOH!
- You've got a lot of NT_STATUS_ACCESS_DENIED entries. That's not someone failing to authenticate (which is a different error), that's someone that is authenticating properly and being denied permission.
The string of errors between 9am and 2pm is probably my TV server trying to reconnect to the SMB share to which it is recording/reading. With AD not working, no wonder. Since it works when AD is working, I am not concerned about this.
- The boot device is reporting errors on ZFS that are apparently undiagnosed and uncorrected.
I am aware of this one. Got it via an email alert and in the web GUI. I have taken note of the details, reset the error counters (after generating the debug) and will monitor over time.
- You are using dedup with just 32GB of RAM (gulp!)
32GB is all I can afford at the moment... and 32GB ECC is better than the 16GB non-ECC I used to have in it. I don't intend to abuse this feature and am storing only a small amount of data for testing purposes. Anyway, this is off topic.
I don't know you, but when I see all of these problems it does make me question how thorough the admin is at finding problems, identifying they are truly a problem, and then fixing them appropriately.
No, you don't know me. I am an IT Administrator with over 20 years' experience maintaining enterprise systems. Virtualisation (VMware), networking, and storage networks are my daily bread and butter. Running a 12TB NAS with 2 ESX hosts at home is small fry compared to what I work with... but at the same time it is different. SMB on non-Windows and NFS on non-EMC (or non fibre channel/LUN type storage) is new for me.
So my gut feeling is that something has been tweaked, not set properly, or not setup at all that is responsible for your problems.
Possible, but I have only ever used the web GUI and have not tinkered with any CLI settings (except to investigate that boot drive/CRC error). I very carefully followed the FreeNAS Docs and read extensively about FreeNAS, Nexenta and ZFS in general, for many months before deploying my first "test" FreeNAS VM. I believe I educated myself more than the average noob before beginning this journey.
As for what that problem is, I have no clue. There's lots to read in the logs and such, but no errors that I saw that would tell me what is wrong. It does look like your system isn't even trying to do any kind of AD connection on bootup. But I have no idea why it would behave that way if it was previously enabled.
I agree, this problem does sound strange. But others have described this same problem, so I am happy that I am not alone in this regard. If only everyone would post logs/debugs etc. Simply saying "me too" doesn't help solve it.
It kind of takes me back to the "what was tweaked, not setup properly, or not setup at all" thing.
As mentioned, nothing was tweaked outside the web GUI. Or, if the web GUI allowed me to make a configuration faux pas, I can't take full responsibility for it. I'm getting to the point where I'm thinking about re-installing from scratch and importing the existing zpools or moving to another platform (which I don't really want to do). Either way, I can't risk the declining WAF for much longer, just because things break after rebooting.
But quite a few of those bullets are major no-nos and should have been investigated thoroughly when they first happened. Others seem to be the result of doing things "just because" and not for any reason except to add complexity for no value.
Thank you for your time and effort examining my debug. I was only expecting it to be used in the research of the AD-disabled-after-reboot problem. You have gone way above and beyond what I expected. I have taken on board the factual issues raised and, sorry to say, have ignored the editorial commentary.