Gianni_Pitto
Cadet
- Joined
- Mar 5, 2021
- Messages
- 7
Hello ,
I have this kind of big, fresh, truenas installation, on a hp dl380 server, xeon 4208 @2.10 ghz, 128 gb ram
.
I have the o.s. installed on a raid1 (hw), then 10 disks on raid 6 (no raid hw, raid managed by truens) for an available space of 40,25tb
So, the problem here is that this truenas server worked well for 10 days circa, then suddently stopped working. We could ping it, but nothing else, samba shares , gui or ssh where unresponsive.
So the strange behaviour is that when I reboot it, it will work for 5 / 10 minutes circa, then it will do the same. I have another truenas on the same site (same hw & config, but never ever used) and now at most i can give simple command via remote ssh, ,like 'time' 'date', but no 'top' or 'reboot' because in this way the console will be freezed.
In the 5-minutes-time-frame when the server was availabe I was able to donwload logs, but I can't figure out what is happening from log side...
for example every time the system freeze (you can notice a big time gaps in the logs) I can see only messages like those you can see at the end of the code, just before the time gap 16.20-10.49
I have the entire var/logs, I can post others log if needed.
Please help
I have this kind of big, fresh, truenas installation, on a hp dl380 server, xeon 4208 @2.10 ghz, 128 gb ram
.
I have the o.s. installed on a raid1 (hw), then 10 disks on raid 6 (no raid hw, raid managed by truens) for an available space of 40,25tb
So, the problem here is that this truenas server worked well for 10 days circa, then suddently stopped working. We could ping it, but nothing else, samba shares , gui or ssh where unresponsive.
So the strange behaviour is that when I reboot it, it will work for 5 / 10 minutes circa, then it will do the same. I have another truenas on the same site (same hw & config, but never ever used) and now at most i can give simple command via remote ssh, ,like 'time' 'date', but no 'top' or 'reboot' because in this way the console will be freezed.
In the 5-minutes-time-frame when the server was availabe I was able to donwload logs, but I can't figure out what is happening from log side...
for example every time the system freeze (you can notice a big time gaps in the logs) I can see only messages like those you can see at the end of the code, just before the time gap 16.20-10.49
I have the entire var/logs, I can post others log if needed.
Please help
Code:
Mar 15 16:19:44 nas /etc/rc: WARNING: failed to start rpcbind Mar 15 16:19:44 nas syslog-ng[3015]: Error processing log message: <(0x103c) EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1 Mar 15 16:19:44 nas ugen0.1: <0x8086 XHCI root HUB> at usbus0 Mar 15 16:19:44 nas uhub1: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0 Mar 15 16:19:44 nas da1 at smartpqi0 bus 0 scbus2 target 66 lun 0 Mar 15 16:19:44 nas da1: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device Mar 15 16:19:44 nas da1: Serial Number 78I8K0NXF4FE Mar 15 16:19:44 nas da1: 1200.000MB/s transfers Mar 15 16:19:44 nas da1: Command Queueing enabled Mar 15 16:19:44 nas da1: 5723166MB (11721045168 512 byte sectors) Mar 15 16:19:44 nas da2 at smartpqi0 bus 0 scbus2 target 67 lun 0 Mar 15 16:19:44 nas da2: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device Mar 15 16:19:44 nas da2: Serial Number 78I7K0W6F4FE Mar 15 16:19:44 nas da2: 1200.000MB/s transfers Mar 15 16:19:44 nas da2: Command Queueing enabled Mar 15 16:19:44 nas da2: 5723166MB (11721045168 512 byte sectors) Mar 15 16:19:44 nas da3 at smartpqi0 bus 0 scbus2 target 68 lun 0 Mar 15 16:19:44 nas da3: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device Mar 15 16:19:44 nas da3: Serial Number 78I8K0NZF4FE Mar 15 16:19:44 nas da3: 1200.000MB/s transfers Mar 15 16:19:44 nas da3: Command Queueing enabled Mar 15 16:19:44 nas da3: 5723166MB (11721045168 512 byte sectors) Mar 15 16:19:44 nas da4 at smartpqi0 bus 0 scbus2 target 69 lun 0 Mar 15 16:19:44 nas da4: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device Mar 15 16:19:44 nas da4: Serial Number 78I8K0NVF4FE Mar 15 16:19:44 nas da4: 1200.000MB/s transfers Mar 15 16:19:44 nas da4: Command Queueing enabled Mar 15 16:19:44 nas da4: 5723166MB (11721045168 512 byte sectors) Mar 15 16:19:44 nas da6 at smartpqi0 bus 0 scbus2 target 71 lun 0 Mar 15 16:19:44 nas da6: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device Mar 15 16:19:44 nas da6: Serial Number 78I8K0NYF4FE Mar 15 16:19:44 nas da6: 1200.000MB/s transfers Mar 15 16:19:44 nas da6: Command Queueing enabled Mar 15 16:19:44 nas da6: 5723166MB (11721045168 512 byte sectors) Mar 15 16:19:44 nas da8 at smartpqi0 bus 0 scbus2 target 73 lun 0 Mar 15 16:19:44 nas da8: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device Mar 15 16:19:44 nas da8: Serial Number 78I8K0NPF4FE Mar 15 16:19:44 nas da8: 1200.000MB/s transfers Mar 15 16:19:44 nas da8: Command Queueing enabled Mar 15 16:19:44 nas da8: 5723166MB (11721045168 512 byte sectors) Mar 15 16:19:44 nas da7 at smartpqi0 bus 0 scbus2 target 72 lun 0 Mar 15 16:19:44 nas da7: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device Mar 15 16:19:44 nas da7: Serial Number 78I8K0NUF4FE Mar 15 16:19:44 nas da7: 1200.000MB/s transfers Mar 15 16:19:44 nas da7: Command Queueing enabled Mar 15 16:19:44 nas da7: 5723166MB (11721045168 512 byte sectors) Mar 15 16:19:44 nas da10 at smartpqi0 bus 0 sc Mar 15 16:19:44 nas K0NUF4FE Mar 15 16:19:44 nas da7: 1200.000MB/s transfers Mar 15 16:19:44 nas da7: Command Queueing enabled Mar 15 16:19:44 nas da7: 5723166MB (11721045168 512 byte sectors) Mar 15 16:19:44 nas da10 at smartpqi0 bus 0 sc Mar 15 16:19:44 nas Updating CPU Microcode... Mar 15 16:19:45 nas Done. Mar 15 16:19:45 nas Starting watchdogd. Mar 15 16:19:45 nas Starting mountd. Mar 15 16:19:45 nas mountd: mountd already running, pid: 2786. Mar 15 16:19:45 nas /etc/rc: WARNING: failed to start mountd Mar 15 16:19:46 nas NFSv4 is disabled Mar 15 16:19:46 nas Starting nfsd. Mar 15 16:19:45 nas 1 2021-03-15T16:19:45.902805+01:00 nas.domain.local nfsd 3234 - - Can't read stable storage file: Operation not permitted Mar 15 16:19:46 nas Starting statd. Mar 15 16:19:46 nas Starting lockd. Mar 15 16:19:46 nas Updating motd:. Mar 15 16:19:46 nas Mounting late filesystems:. Mar 15 16:19:46 nas Starting ntpd. Mar 15 16:19:46 nas kernel: NLM: can't start server - it appears to be running already Mar 15 16:19:46 nas kernel[3015]: Last message 'NLM: can't start ser' repeated 1 times, suppressed by syslog-ng on nas.domain.local Mar 15 16:19:46 nas 1 2021-03-15T16:19:46.109431+01:00 nas.domain.local su 3283 - - pam_winbind(su): PAM_ESTABLISH_CRED not implemented Mar 15 16:19:46 nas Segmentation fault Mar 15 16:19:46 nas /etc/rc: WARNING: failed to start ntpd Mar 15 16:19:46 nas 1 2021-03-15T16:19:46.157300+01:00 nas.domain.local su 3283 - - pam_winbind(su): request wbcLogoffUser failed: WBC_ERR_WINBIND_NOT_AVAILABLE, PAM error: PAM_AUTHINFO_UNAVAIL (12)! Mar 15 16:19:46 nas 1[3015]: Last message '2021-03-15T16:19:46.' repeated 1 times, suppressed by syslog-ng on nas.domain.local Mar 15 16:19:46 nas 1 2021-03-15T16:19:46.157313+01:00 nas.domain.local su 3283 - - pam_winbind(su): internal module error (retval = PAM_AUTHINFO_UNAVAIL(12), user = 'ntpd') Mar 15 16:19:46 nas 1[3015]: Last message '2021-03-15T16:19:46.' repeated 1 times, suppressed by syslog-ng on nas.domain.local Mar 15 16:19:46 nas 1 2021-03-15T16:19:46.157340+01:00 nas.domain.local su 3283 - - pam_winbind(su): request wbcLogoffUser failed: WBC_ERR_WINBIND_NOT_AVAILABLE, PAM error: PAM_AUTHINFO_UNAVAIL (12)! Mar 15 16:19:46 nas 1[3015]: Last message '2021-03-15T16:19:46.' repeated 1 times, suppressed by syslog-ng on nas.domain.local Mar 15 16:19:46 nas 1 2021-03-15T16:19:46.157351+01:00 nas.domain.local su 3283 - - pam_winbind(su): internal module error (retval = PAM_AUTHINFO_UNAVAIL(12), user = 'ntpd') Mar 15 16:19:46 nas Updating /var/run/os-release done. Mar 15 16:19:46 nas Cannot 'start' nmbd. Set nmbd_enable to YES in /etc/rc.conf or use 'onestart' instead of 'start'. Mar 15 16:19:46 nas Starting smbd. Mar 15 16:19:46 nas Starting winbindd. Mar 15 16:19:46 nas Starting[3015]: Last message 'winbindd.' repeated 1 times, suppressed by syslog-ng on nas.domain.local Mar 15 16:19:46 nas Starting smbd. Mar 15 16:19:47 nas starting wsdd Mar 15 16:19:47 nas Starting snmpd. Mar 15 16:19:47 nas Starting snmp_agent. Mar 15 16:19:47 nas Starting smartd_daemon. Mar 15 16:19:47 nas Starting rrdcached. Mar 15 16:19:47 nas Performing sanity check on openssh configuration. Mar 15 16:19:47 nas Starting openssh. Mar 15 16:19:47 nas Performing sanity check on nginx configuration: Mar 15 16:19:47 nas nginx: the configuration file /usr/local/etc/nginx/nginx.conf syntax is ok Mar 15 16:19:47 nas nginx: configuration file /usr/local/etc/nginx/nginx.conf test is successful Mar 15 16:19:47 nas Starting nginx. Mar 15 16:19:47 nas Starting dbus. Mar 15 16:19:47 nas Starting collectd_daemon. Mar 15 16:19:47 nas Starting avahi-daemon. Mar 15 16:19:48 nas Syncing multipaths... Mar 15 16:19:54 nas [EFAULT] kinit for domain [domain.local] with principal [nas$@domain.local] failed: kinit: krb5_get_init_creds: unable to reach any KDC in realm domain.local Mar 15 16:19:54 nas Mar 15 16:19:54 nas Traceback (most recent call last): Mar 15 16:19:54 nas File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 137, in call_method Mar 15 16:19:54 nas result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self, Mar 15 16:19:54 nas File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1195, in _call Mar 15 16:19:54 nas return await methodobj(*prepared_call.args) Mar 15 16:19:54 nas File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/kerberos.py", line 577, in start Mar 15 16:19:54 nas await asyncio.wait_for(self._kinit(), timeout=kinit_timeout) Mar 15 16:19:54 nas File "/usr/local/lib/python3.8/asyncio/tasks.py", line 483, in wait_for Mar 15 16:19:54 nas return fut.result() Mar 15 16:19:54 nas File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/kerberos.py", line 318, in _kinit Mar 15 16:19:54 nas await self.do_kinit(ad) Mar 15 16:19:54 nas File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/kerberos.py", line 269, in do_kinit Mar 15 16:19:54 nas raise CallError(f"kinit for domain [{data['domainname']}] " Mar 15 16:19:54 nas middlewared.service_exception.CallError: [EFAULT] kinit for domain [domain.local] with principal [nas$@domain.local] failed: kinit: krb5_get_init_creds: unable to reach any KDC in realm domain.local Mar 15 16:19:54 nas Mar 15 16:19:54 nas Mar 15 16:19:54 nas Configuring vt: blanktime. Mar 15 16:19:54 nas Starting cron. Mar 15 16:19:55 nas net.inet.carp.allow: 0 -> 1 Mar 15 16:19:55 nas Mar 15 16:19:55 nas Mon Mar 15 16:19:55 CET 2021 Mar 15 16:20:50 nas 1 2021-03-15T16:20:50.857972+01:00 nas.domain.local login 3659 - - pam_winbind(login): PAM_REINITIALIZE_CRED not implemented Mar 15 16:21:13 nas 1 2021-03-15T16:21:13.778960+01:00 nas.domain.local login 3659 - - pam_winbind(login): user 'root' OK Mar 15 16:22:30 nas 1 2021-03-15T16:22:30.731482+01:00 nas.domain.local login 3827 - - pam_winbind(login): PAM_REINITIALIZE_CRED not implemented Mar 15 16:23:36 nas 1 2021-03-15T16:23:36.383317+01:00 nas.domain.local login 3870 - - pam_winbind(login): PAM_REINITIALIZE_CRED not implemented Mar 15 16:23:52 nas 1 2021-03-15T16:23:52.668306+01:00 nas.domain.local login 3870 - - pam_winbind(login): user 'root' OK Mar 15 16:27:05 nas 1 2021-03-15T16:27:05.773788+01:00 nas.domain.local login 4002 - - pam_winbind(login): PAM_REINITIALIZE_CRED not implemented Mar 15 16:27:18 nas 1 2021-03-15T16:27:18.298800+01:00 nas.domain.local login 4002 - - pam_winbind(login): user 'root' OK Mar 15 16:27:18 nas 1 2021-03-15T16:27:18.444081+01:00 nas.domain.local login 4011 - - pam_winbind(login): PAM_REINITIALIZE_CRED not implemented Mar 15 16:29:58 nas 1 2021-03-15T16:29:58.173970+01:00 nas.domain.local login 4094 - - pam_winbind(login): PAM_REINITIALIZE_CRED not implemented Mar 16 10:49:45 nas Starting devd. Mar 16 10:49:45 nas Starting ums0 moused. Mar 16 10:49:45 nas Starting zfsd.