WRITE_DMA48 + CAM status: Command timeout (drives unavailable)

Status
Not open for further replies.

Stason

Cadet
Joined
Nov 3, 2016
Messages
6
Hello everyone.
I installed FreeNAS a couple of days ago and am experiencing an issue that I can't resolve.
I did search the forums, but was unable to find an answer to the problem.

First off, the system is:
Code:
FreeNAS-9.10.1-U2
Motherboard EVGA nVidia nForce 680i SLI
Intel Core2 Duo E6750 @ 2.66GHz
8GB RAM
8GB USB Stick
3 x WD Red 3TB drives (connected to SATA0, SATA1, SATA2)
Network Card - integrated NVIDIA nForce MCP55
EVGA 650W PSU


I installed FreeNAS from a USB stick to another USB stick, booted and formatted the drives.
I then set up a user, set up a Windows share and copied some data to FreeNAS. Everything worked so far.
I also assigned a static IP to FreeNAS, disabled DHCP and set up SMTP.

I have some experience with Linux, but would really appreciate it if someone could chime in.

When the box boots, I see no errors, the IP is assigned and the FreeNAS console menu becomes visible.
As soon as the menu is on the screen, these messages start popping up:
Nov 10 01:51:20 nas (ada0:ata2:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:51:20 nas (ada0:ata2:0:0:0): CAM status: Command timeout
Nov 10 01:51:20 nas (ada0:ata2:0:0:0): Retrying command
Nov 10 01:51:20 nas (ada1:ata3:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:51:20 nas (ada1:ata3:0:0:0): CAM status: Command timeout
Nov 10 01:51:20 nas (ada1:ata3:0:0:0): Retrying command
Nov 10 01:51:20 nas (ada2:ata4:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:51:20 nas (ada2:ata4:0:0:0): CAM status: Command timeout
Nov 10 01:51:20 nas (ada2:ata4:0:0:0): Retrying command
Nov 10 01:51:50 nas (ada0:ata2:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:51:50 nas (ada0:ata2:0:0:0): CAM status: Command timeout
Nov 10 01:51:50 nas (ada0:ata2:0:0:0): Retrying command
Nov 10 01:51:50 nas (ada1:ata3:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:51:50 nas (ada1:ata3:0:0:0): CAM status: Command timeout
Nov 10 01:51:50 nas (ada1:ata3:0:0:0): Retrying command
Nov 10 01:51:50 nas (ada2:ata4:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:51:50 nas (ada2:ata4:0:0:0): CAM status: Command timeout
Nov 10 01:51:50 nas (ada2:ata4:0:0:0): Retrying command
Nov 10 01:52:20 nas (ada0:ata2:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:52:20 nas (ada0:ata2:0:0:0): CAM status: Command timeout
Nov 10 01:52:20 nas (ada0:ata2:0:0:0): Retrying command
Nov 10 01:52:20 nas (ada1:ata3:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:52:20 nas (ada1:ata3:0:0:0): CAM status: Command timeout
Nov 10 01:52:20 nas (ada1:ata3:0:0:0): Retrying command
Nov 10 01:52:20 nas (ada2:ata4:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:52:20 nas (ada2:ata4:0:0:0): CAM status: Command timeout
Nov 10 01:52:20 nas (ada2:ata4:0:0:0): Retrying command
Nov 10 01:52:50 nas (ada0:ata2:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:52:50 nas (ada0:ata2:0:0:0): CAM status: Command timeout
Nov 10 01:52:50 nas (ada0:ata2:0:0:0): Retrying command
Nov 10 01:52:50 nas (ada1:ata3:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:52:50 nas (ada1:ata3:0:0:0): CAM status: Command timeout
Nov 10 01:52:50 nas (ada1:ata3:0:0:0): Retrying command
Nov 10 01:52:50 nas (ada2:ata4:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:52:50 nas (ada2:ata4:0:0:0): CAM status: Command timeout
Nov 10 01:52:50 nas (ada2:ata4:0:0:0): Retrying command
Nov 10 01:53:20 nas (ada0:ata2:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:53:20 nas (ada0:ata2:0:0:0): CAM status: Command timeout
Nov 10 01:53:20 nas (ada0:ata2:0:0:0): Error 5, Retries exhausted
Nov 10 01:53:21 nas (ada2:ata4:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
Nov 10 01:53:21 nas (ada2:ata4:0:0:0): CAM status: Command timeout
Nov 10 01:53:21 nas (ada2:ata4:0:0:0): Error 5, Retries exhausted
Nov 10 01:53:51 nas (ada2:ata4:0:0:0): WRITE_DMA48. ACB: 35 00 e8 20 40 40 58 00 00 00 10 00
Nov 10 01:53:51 nas (ada2:ata4:0:0:0): CAM status: Command timeout
Nov 10 01:53:51 nas (ada2:ata4:0:0:0): Retrying command
Nov 10 01:54:21 nas (ada2:ata4:0:0:0): WRITE_DMA48. ACB: 35 00 e8 20 40 40 58 00 00 00 10 00
Nov 10 01:54:21 nas (ada2:ata4:0:0:0): CAM status: Command timeout
Nov 10 01:54:21 nas (ada2:ata4:0:0:0): Retrying command
Nov 10 01:54:51 nas (ada2:ata4:0:0:0): WRITE_DMA48. ACB: 35 00 e8 20 40 40 58 00 00 00 10 00
Nov 10 01:54:51 nas (ada2:ata4:0:0:0): CAM status: Command timeout
Nov 10 01:54:51 nas (ada2:ata4:0:0:0): Retrying command
Nov 10 01:55:22 nas (ada2:ata4:0:0:0): WRITE_DMA48. ACB: 35 00 e8 20 40 40 58 00 00 00 10 00
Nov 10 01:55:22 nas (ada2:ata4:0:0:0): CAM status: Command timeout
Nov 10 01:55:22 nas (ada2:ata4:0:0:0): Retrying command
Nov 10 01:55:52 nas (ada2:ata4:0:0:0): WRITE_DMA48. ACB: 35 00 e8 20 40 40 58 00 00 00 10 00
Nov 10 01:55:52 nas (ada2:ata4:0:0:0): CAM status: Command timeout
Nov 10 01:55:52 nas (ada2:ata4:0:0:0): Error 5, Retries exhausted


Now, until I see the third "Error 5, Retries exhausted" message, the WebGUI and shared storage are both unavailable (I can still connect to the box through SSH though).
When the third "Error 5, Retries exhausted" message pops up, everything starts working like a charm.
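For anyone wanting to tally these messages per drive instead of reading them by eye, here is a minimal Python sketch. It assumes the syslog layout shown above (timestamp, host, then `(device:bus:...)` and the message); the function name `summarize` and the regular expression are illustrative, not part of FreeNAS.

```python
import re
from collections import Counter

# Matches kernel lines in the layout seen in this thread, e.g.
# "Nov 10 01:53:20 nas (ada0:ata2:0:0:0): Error 5, Retries exhausted"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ "
    r"\((?P<dev>\w+):(?P<bus>\w+):[\d:]+\): (?P<msg>.*)$"
)

def summarize(lines):
    """Return (timeouts, exhausted): per-device counts of each message type."""
    timeouts = Counter()
    exhausted = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a CAM message in the expected layout
        if m["msg"].startswith("CAM status: Command timeout"):
            timeouts[m["dev"]] += 1
        elif "Retries exhausted" in m["msg"]:
            exhausted[m["dev"]] += 1
    return timeouts, exhausted

# A few lines copied from the log above.
sample = [
    "Nov 10 01:53:20 nas (ada0:ata2:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00",
    "Nov 10 01:53:20 nas (ada0:ata2:0:0:0): CAM status: Command timeout",
    "Nov 10 01:53:20 nas (ada0:ata2:0:0:0): Error 5, Retries exhausted",
    "Nov 10 01:53:21 nas (ada2:ata4:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00",
    "Nov 10 01:53:21 nas (ada2:ata4:0:0:0): CAM status: Command timeout",
    "Nov 10 01:53:21 nas (ada2:ata4:0:0:0): Error 5, Retries exhausted",
]

timeouts, exhausted = summarize(sample)
for dev in sorted(timeouts):
    print(dev, "timeouts:", timeouts[dev], "exhausted:", exhausted[dev])
```

Feeding it the full log (for example the lines of /var/log/messages) would show whether the timeouts are spread evenly across all drives, which points at something shared such as the controller, or concentrated on one.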

After some time (about 30 minutes), the same thing happens again:

Nov 10 02:21:17 nas (ada1:ata3:0:0:0): WRITE_DMA48. ACB: 35 00 28 52 40 40 58 00 00 00 10 00
Nov 10 02:21:17 nas (ada1:ata3:0:0:0): CAM status: Command timeout
Nov 10 02:21:17 nas (ada1:ata3:0:0:0): Retrying command
Nov 10 02:21:17 nas (ada2:ata4:0:0:0): WRITE_DMA48. ACB: 35 00 28 52 40 40 58 00 00 00 10 00
Nov 10 02:21:17 nas (ada2:ata4:0:0:0): CAM status: Command timeout
Nov 10 02:21:17 nas (ada2:ata4:0:0:0): Retrying command
Nov 10 02:21:17 nas (ada0:ata2:0:0:0): WRITE_DMA48. ACB: 35 00 20 52 40 40 58 00 00 00 10 00
Nov 10 02:21:17 nas (ada0:ata2:0:0:0): CAM status: Command timeout
etc.


Now during this, the WebGUI is fully functional, but I can't access my shared storage.

As you can tell, this has something to do with the hard drives (cables? SATA controllers?), but I'm not sure what exactly. The drives are all brand new. Maybe I forgot to set something up during install? Although I am sure I followed the instructions well. Is there anything specific in the BIOS that should be adjusted with regard to hard drives?
I also think it might have something to do with smartd. The disks are checked during bootup and then every 30 minutes, correct? I definitely don't want to disable S.M.A.R.T. though.
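The timing in the log can be checked directly. The sketch below is a small Python helper (the name `seconds_between` is illustrative) that computes the gap between two timestamps copied from the messages above. Syslog lines carry no year, so 2016 is assumed here purely to make the timestamps parseable.

```python
from datetime import datetime

ASSUMED_YEAR = "2016"  # assumption: syslog timestamps omit the year

def seconds_between(first, second):
    """Seconds elapsed between two syslog-style stamps like 'Nov 10 01:51:20'."""
    fmt = "%Y %b %d %H:%M:%S"
    a = datetime.strptime(ASSUMED_YEAR + " " + first, fmt)
    b = datetime.strptime(ASSUMED_YEAR + " " + second, fmt)
    return int((b - a).total_seconds())

# Gap between two consecutive retries on ada0 during boot.
print(seconds_between("Nov 10 01:51:20", "Nov 10 01:51:50"))

# Gap between the first timeout at boot and the start of the second episode.
print(seconds_between("Nov 10 01:51:20", "Nov 10 02:21:17"))
```

The first gap is 30 seconds and the second is 1797 seconds, just under 30 minutes, which fits a periodic 30-minute check but does not by itself prove that smartd is the trigger.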

Also, there was one time when I booted the machine and these messages never popped up. Everything worked with no hiccups for about 50 hrs. Until the reboot.

Any info would be really appreciated, guys. Thank you in advance!
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I believe the NVIDIA boards don't use Intel controllers. Correct me if I'm wrong.

I'd suspect the controllers.

You could add an LSI HBA...
 

Stason

Cadet
Joined
Nov 3, 2016
Messages
6
Thanks for your reply.

The controllers are NVIDIA nForce MCP55. I did some more research and found out that nForce controllers might cause issues in some FreeNAS configs, but nothing like the one I am having. On top of that, plenty of people with my controllers do not experience any problems.

I also found that MCP55 is fully supported by FreeBSD 10.3: https://www.freebsd.org/releases/10.3R/hardware.html#DISK

I couldn't find any information in the official documentation stating that only Intel controllers should be used: http://doc.freenas.org/9.10/intro.html#storage-disks-and-controllers

Can someone tell me what these messages mean?
nas (ada0:ata2:0:0:0): WRITE_DMA48. ACB: 35 00 e0 20 40 40 58 00 00 00 08 00
nas (ada0:ata2:0:0:0): CAM status: Command timeout


I Googled, but couldn't find anything relevant.
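As a partial answer to what the first line encodes, here is a minimal Python sketch that decodes the 12 ACB bytes. It assumes the byte order in which FreeBSD's CAM layer prints an ATA command block (command, features, three low LBA bytes, device, three high LBA bytes, features high byte, sector count low and high); that ordering is an assumption to verify against the FreeBSD source, and `decode_acb` is an illustrative name. Opcode 0x35 is WRITE DMA EXT in the ATA command set, which is what WRITE_DMA48 refers to.

```python
ATA_COMMANDS = {0x35: "WRITE DMA EXT", 0x25: "READ DMA EXT"}

def decode_acb(acb):
    """Decode an 'ACB:' hex string into command name, 48-bit LBA and sector count."""
    b = [int(x, 16) for x in acb.split()]
    if len(b) != 12:
        raise ValueError("expected 12 ACB bytes, got %d" % len(b))
    (command, features, lba_low, lba_mid, lba_high, device,
     lba_low_exp, lba_mid_exp, lba_high_exp, features_exp,
     count, count_exp) = b  # assumed field order, see the note above
    lba = (lba_high_exp << 40 | lba_mid_exp << 32 | lba_low_exp << 24
           | lba_high << 16 | lba_mid << 8 | lba_low)
    sectors = count_exp << 8 | count
    if sectors == 0:
        sectors = 65536  # in 48-bit commands a count of 0 means 65536 sectors
    return {
        "command": ATA_COMMANDS.get(command, "0x%02x" % command),
        "lba": lba,
        "sectors": sectors,
    }

print(decode_acb("35 00 e0 20 40 40 58 00 00 00 08 00"))
print(decode_acb("35 00 e8 20 40 40 58 00 00 00 10 00"))
```

Under that assumed layout, the first message is an 8-sector write starting at LBA 0x584020e0, and the later one in the log is a 16-sector write starting 8 sectors further on. "CAM status: Command timeout" then means the drive or controller did not report completion of that write in time, so the kernel retried it.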

Thank you!
 

Stason

Cadet
Joined
Nov 3, 2016
Messages
6
Small update. I disabled smartd, no errors at all right now.
I don't want to leave smartd disabled though. Can someone chime in on whether I can disable just some setting in smartd to avoid these errors without sacrificing essential SMART features?
Thanks in advance!
 

darkwarrior

Patron
Joined
Mar 29, 2015
Messages
336
Mate, don't do that.
smartd is not the cause of the errors, it's just reporting them so that you get notified. Under no circumstances do you want to keep the daemon disabled...

At some point I had similar errors on 1 drive and ended up throwing it out.
But in your case you're having errors on all 3 drives in your system.
You should check the SATA and power cables and check the power supply.
Like @Stux said: It might also be related to the controller used on the motherboard ...
 

Stason

Cadet
Joined
Nov 3, 2016
Messages
6
Thanks for your reply.
I definitely don't want to leave smartd disabled, it was just a part of my troubleshooting process.
Cables are all brand new, and the power supply is new as well. I am now certain that the culprit is the controllers, but I want to know what is actually happening with them and what the issue is.
I don't want to go and buy a new motherboard with another controller only to experience the same problem.
 

darkwarrior

Patron
Joined
Mar 29, 2015
Messages
336
You could try to buy or temporarily borrow an HBA like the IBM M1015 and connect the disks to that new controller.
That way you can make sure that "our" assumptions are right and gain a real HBA in the process, which you could keep to connect all the disks or for future expansion in a future "Server grade" build, that will definitely serve you better (in the long term) :p
 

Stason

Cadet
Joined
Nov 3, 2016
Messages
6
Thank you for the recommendation.
Just purchased an LSI HBA off eBay (IBM ServeRAID M1015 46M0861 SAS/SATA PCI-e RAID Controller LSI SAS9220-8i) along with a mini-SAS to 4x SATA cable.
Will report back as soon as it arrives and is installed.
 

Jonnhy

Dabbler
Joined
Jan 19, 2017
Messages
32
Hi there, I have been having very similar if not the exact same problem. Did you find a solution? What was it?

Thanks in advance
 

Stason

Cadet
Joined
Nov 3, 2016
Messages
6
Hello,

I just went ahead and installed the LSI HBA (cost me $65 US shipped to Canada) - http://www.ebay.ca/itm/252307047041
I had to re-flash it to IT mode and connected all the hard drives to it, and then all the errors were gone.
I had an uptime of about one month with no timeouts and nothing weird in the error log.

Hope it helps.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Careful buying them from China. Fakes abound.
 

Jonnhy

Dabbler
Joined
Jan 19, 2017
Messages
32
Well, I bought one from China about a month ago. Is there any easy way of verifying its authenticity?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I'm not an expert on the matter, but LSI's support apparently helps with that.
 