FreeNAS-11.3-RELEASE Randomly Rebooting

Reece Johnson

Dabbler
Joined
Dec 2, 2015
Messages
13
Hey Everyone,

I have a FreeNAS system (details in signature) that has been acting up on me. I had 11.2 installed on it and it was very stable, went months without a reboot. But after upgrading to 11.3 I've been having issues.

I host a combination of VMs (running on a separate ESXi box) and other storage (security camera footage, PC rsync backups, etc) all going to this FreeNAS system. It's been configured that way for years, so no changes there. But after the upgrade to 11.3, every 2-3 weeks or I'll find that all my VMs have locked up. I check the uptime on my FreeNAS box and sure enough, it'll be a couple hours. The first time I thought it was a fluke, but now it's happened 3 times. I check another box I have on the same UPS and that one will have an uptime of months, so I don't think it's a power issue. This most recently happened today. I started trying to go through the logs to get an idea of what happened. Here are some snippets:

/var/log/messages:

Code:
Feb 25 00:00:00 freenas syslog-ng[1161]: Configuration reload request received, reloading configuration;
Feb 25 00:00:00 freenas syslog-ng[1161]: Configuration reload finished;
Feb 26 00:00:00 freenas syslog-ng[1161]: Configuration reload request received, reloading configuration;
Feb 26 00:00:00 freenas syslog-ng[1161]: Configuration reload finished;
Feb 27 00:00:00 freenas syslog-ng[1161]: Configuration reload request received, reloading configuration;
Feb 27 00:00:00 freenas syslog-ng[1161]: Configuration reload finished;
Feb 27 11:13:27 freenas     (pass2:mpr0:0:4:0): LOG SENSE. CDB: 4d 00 0d 00 00 00 00 00 40 00 length 64 SMID 511 Aborting command 0xfffffe0001258e90
Feb 27 11:13:27 freenas mpr0: Sending reset from mprsas_send_abort for target ID 4
Feb 27 11:13:27 freenas     (pass0:mpr0:0:2:0): LOG SENSE. CDB: 4d 00 0d 00 00 00 00 00 40 00 length 64 SMID 288 Aborting command 0xfffffe0001244e00
Feb 27 11:13:27 freenas mpr0: Sending reset from mprsas_send_abort for target ID 2
Feb 27 11:13:27 freenas     (pass3:mpr0:0:5:0): LOG SENSE. CDB: 4d 00 0d 00 00 00 00 00 40 00 length 64 SMID 922 Aborting command 0xfffffe000127dd60
Feb 27 11:13:27 freenas mpr0: Sending reset from mprsas_send_abort for target ID 5
Feb 27 11:13:27 freenas     (pass1:mpr0:0:3:0): LOG SENSE. CDB: 4d 00 2f 00 00 00 00 00 40 00 length 64 SMID 662 Aborting command 0xfffffe00012667a0
Feb 27 11:13:27 freenas mpr0: Sending reset from mprsas_send_abort for target ID 3
Feb 27 11:13:28 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 3
Feb 27 11:13:28 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 5
Feb 27 11:13:28 freenas (da1:mpr0:0:3:0): READ(10). CDB: 28 00 18 60 66 38 00 00 30 00
Feb 27 11:13:28 freenas (da3:mpr0:0:5:0): READ(10). CDB: 28 00 18 60 66 30 00 00 30 00
Feb 27 11:13:28 freenas (da1:mpr0:0:3:0): CAM status: CAM subsystem is busy
Feb 27 11:13:28 freenas (da3:mpr0:0:5:0): CAM status: CAM subsystem is busy
Feb 27 11:13:28 freenas (da1:mpr0:0:3:0): Retrying command
Feb 27 11:13:28 freenas (da3:mpr0:0:5:0): Retrying command
Feb 27 11:13:28 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 4
Feb 27 11:13:28 freenas (da2:mpr0:0:4:0): READ(10). CDB: 28 00 6b 31 e4 30 00 00 08 00
Feb 27 11:13:28 freenas (da2:mpr0:0:4:0): CAM status: CAM subsystem is busy
Feb 27 11:13:28 freenas (da2:mpr0:0:4:0): Retrying command
Feb 27 11:13:29 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 5
Feb 27 11:13:29 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 3
Feb 27 11:13:29 freenas (da1:mpr0:0:3:0): READ(10). CDB: 28 00 18 60 66 38 00 00 30 00
Feb 27 11:13:29 freenas (da3:mpr0:0:5:0): READ(10). CDB: 28 00 18 60 66 30 00 00 30 00
Feb 27 11:13:29 freenas (da1:mpr0:0:3:0): CAM status: CAM subsystem is busy
Feb 27 11:13:29 freenas (da1:mpr0:0:3:0): Retrying command
Feb 27 11:13:29 freenas mpr0: (da3:mpr0:0:5:0): CAM status: CAM subsystem is busy
Feb 27 11:13:29 freenas (da3:mpr0:0:5:0): Retrying command
Feb 27 11:13:29 freenas mprsas_action_scsiio: Freezing devq for target ID 4
Feb 27 11:13:29 freenas (da2:mpr0:0:4:0): READ(10). CDB: 28 00 6b 31 e4 30 00 00 08 00
Feb 27 11:13:29 freenas (da2:mpr0:0:4:0): CAM status: CAM subsystem is busy
Feb 27 11:13:29 freenas (da2:mpr0:0:4:0): Retrying command
Feb 27 11:13:30 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 4
Feb 27 11:13:30 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 5
Feb 27 11:13:30 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 3
Feb 27 11:13:30 freenas (da2:mpr0:0:4:0): READ(10). CDB: 28 00 6b 31 e4 30 00 00 08 00
Feb 27 11:13:30 freenas (da1:mpr0:0:3:0): READ(10). CDB: 28 00 18 60 66 38 00 00 30 00
Feb 27 11:13:30 freenas (da2:mpr0:0:4:0): CAM status: CAM subsystem is busy
Feb 27 11:13:30 freenas (da1:mpr0:0:3:0): CAM status: CAM subsystem is busy
Feb 27 11:13:30 freenas (da3:mpr0:0:5:0): READ(10). CDB: 28 00 18 60 66 30 00 00 30 00
Feb 27 11:13:30 freenas (da2:(da3:mpr0:0:5:0): CAM status: CAM subsystem is busy
Feb 27 11:13:30 freenas mpr0:0:(da3:mpr0:0:5:0): Retrying command
Feb 27 11:13:30 freenas 4:0): Retrying command
Feb 27 11:13:30 freenas (da1:mpr0:0:3:0): Retrying command
Feb 27 11:13:30 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 3
Feb 27 11:13:30 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 4
Feb 27 11:13:30 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 5
Feb 27 11:13:30 freenas (da1:mpr0:0:3:0): READ(10). CDB: 28 00 18 60 66 38 00 00 30 00
Feb 27 11:13:30 freenas (da2:mpr0:0:4:0): READ(10). CDB: 28 00 6b 31 e4 30 00 00 08 00
Feb 27 11:13:30 freenas (da1:mpr0:0:3:0): CAM status: CAM subsystem is busy
Feb 27 11:13:30 freenas (da2:mpr0:0:4:0): CAM status: CAM subsystem is busy
Feb 27 11:13:30 freenas (da3:mpr0:0:5:0): READ(10). CDB: 28 00 18 60 66 30 00 00 30 00
Feb 27 11:13:30 freenas (da1:(da3:mpr0:0:5:0): CAM status: CAM subsystem is busy
Feb 27 11:13:30 freenas mpr0:0:(da3:mpr0:0:5:0): Retrying command
Feb 27 11:13:30 freenas 3:0): Retrying command
Feb 27 11:13:30 freenas (da2:mpr0:0:4:0): Retrying command
Feb 27 11:13:31 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 4
Feb 27 11:13:31 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 3
Feb 27 11:13:31 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 5
Feb 27 11:13:31 freenas (da1:mpr0:0:3:0): READ(10). CDB: 28 00 18 60 66 38 00 00 30 00
Feb 27 11:13:31 freenas (da2:mpr0:0:4:0): READ(10). CDB: 28 00 6b 31 e4 30 00 00 08 00
Feb 27 11:13:31 freenas (da3:mpr0:0:5:0): READ(10). CDB: 28 00 18 60 66 30 00 00 30 00
Feb 27 11:13:31 freenas (da1:mpr0:0:3:0): CAM status: CAM subsystem is busy
Feb 27 11:13:31 freenas (da3:mpr0:0:5:0): CAM status: CAM subsystem is busy
Feb 27 11:13:31 freenas (da3:mpr0:0:5:0): Error 5, Retries exhausted
Feb 27 11:13:31 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 5
Feb 27 11:13:31 freenas (da2:mpr0:0:4:0): CAM status: CAM subsystem is busy
Feb 27 11:13:31 freenas (da3:mpr0:0:5:0): READ(10). CDB: 28 00 6b 31 e4 30 00 00 08 00
Feb 27 11:13:31 freenas (da2:mpr0:0:4:0): Error 5, Retries exhausted
Feb 27 11:13:31 freenas (da1:mpr0:0:3:0): Error 5, Retries exhausted
Feb 27 11:13:31 freenas (da3:mpr0:0:5:0): CAM status: CAM subsystem is busy
Feb 27 11:13:31 freenas mpr0: (da3:mprsas_action_scsiio: Freezing devq for target ID 4
Feb 27 11:13:31 freenas mpr0:0:5:0): Retrying command
Feb 27 11:13:31 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 3
Feb 27 11:13:31 freenas (da2:mpr0:0:4:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00
Feb 27 11:13:31 freenas (da2:mpr0:0:4:0): CAM status: CAM subsystem is busy
Feb 27 11:13:31 freenas (da2:mpr0:0:4:0): Retrying command
Feb 27 11:13:31 freenas (da1:mpr0:0:3:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00
Feb 27 11:13:31 freenas (da1:mpr0:0:3:0): CAM status: CAM subsystem is busy
Feb 27 11:13:31 freenas (da1:mpr0:0:3:0): Retrying command
Feb 27 11:13:31 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 3
Feb 27 11:13:31 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 4
Feb 27 11:13:31 freenas mpr0: mprsas_action_scsiio: Freezing devq for target ID 5
Feb 27 11:13:31 freenas (da2:mpr0:0:4:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00
Feb 27 11:13:31 freenas (da1:mpr0:0:3:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00
Feb 27 11:13:31 freenas (da2:mpr0:0:4:0): CAM status: CAM subsystem is busy
Feb 27 11:13:31 freenas (da2:mpr0:0:4:0): Retrying command
Feb 27 11:13:31 freenas (da3:mpr0:0:5:0): READ(10). CDB: 28 00 6b 31 e4 30 00 00 08 00
Feb 27 11:13:31 freenas (da3:mpr0:0:5:0): CAM status: CAM subsystem is busy
Feb 27 11:13:31 freenas (da3:mpr0:0:5:0): Retrying command
Feb 27 11:13:31 freenas (da1:mpr0:0:3:0): CAM status: CAM subsystem is busy
Feb 27 11:13:31 freenas (da1:mpr0:0:3:0): Retrying command
Feb 27 11:21:43 freenas syslog-ng[1161]: syslog-ng starting up; version='3.23.1'
Feb 27 11:21:43 freenas Copyright (c) 1992-2019 The FreeBSD Project.
Feb 27 11:21:43 freenas Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Feb 27 11:21:43 freenas     The Regents of the University of California. All rights reserved.
Feb 27 11:21:43 freenas FreeBSD is a registered trademark of The FreeBSD Foundation.
Feb 27 11:21:43 freenas FreeBSD 11.3-RELEASE-p5 #0 r325575+8ed1cd24b60(HEAD): Mon Jan 27 18:07:23 UTC 2020


I haven't been able to find too much more in the logs besides that. I'm struggling to figure out what's going on, and could use some help.

The main thing I'm doing differently on this, compared to 11.2, is that I added my VM environment to be backed up to Backblaze B2. I've been using B2 with FreeNAS, but mainly with small files and non-VM data. I was waiting for 11.3 to back up my VMs so I could snapshot them, then exclude the delta VMDK's to prevent it from copying forever as the disk images updated.

As part of that, I did notice that some of them seemed to keep copying. I managed to save this log before it rebooted:

Code:
root@freenas:/var/tmp/middlewared/jobs # cat 4544.log | tail
Errors:                 0
Checks:               221 / 221, 100%
Transferred:           95 / 114, 83%
Elapsed time:  228h54m59.8s
Transferring:
 *                             FMC/FMC-flat.vmdk:110% /250G, 372.582k/s, -
 *                       Apache/Apache-flat.vmdk:137% /200G, 348.867k/s, -
 *                       VIRL_1/VIRL_1-flat.vmdk:108% /250G, 370.296k/s, -
 *     Windows 10 Pro/Windows 10 Pro_0-flat.vmdk: 75% /250G, 343.794k/s, 52h13m47s


I looked this up, and apparently it's been observed with rclone on other environments, so I don't think this is a FreeNAS issue. I don't know if this could be contributing, but figured I would mention it if somehow I'm running into some issue related to that.

I appreciate any help. Not too sure what other information to provide, so I could definitely use some guidance.
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
What HBA are you using?
 

Reece Johnson

Dabbler
Joined
Dec 2, 2015
Messages
13
I'm using the Broadcom 3008 built into the Supermicro X10SRH-CLN4F. No additional HBA.
 

Reece Johnson

Dabbler
Joined
Dec 2, 2015
Messages
13
That looks to be very similar to what I'm seeing, including my logged messages before the reboot. Thanks for the reference! I'm going to keep an eye on https://jira.ixsystems.com/browse/NAS-105053 to see how that progresses.

I was thinking that somehow me trying to rclone 200+ GB files may have been causing the issue. I really appreciate the information!
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
On the last page towards the end of that thread you can see we've been experimenting with different SMART settings as a workaround for now. Might want to try that too.
 

Reece Johnson

Dabbler
Joined
Dec 2, 2015
Messages
13
I saw that! Thank you for the information. As a test I've disabled SMART on my drives. I'll keep an eye on that thread, and combine my additional troubleshooting steps into that thread to keep all the information together. Thank you for your quick help!
 
Top