Something is Wrong and I don't know what to do

Status
Not open for further replies.

ExistNY

Cadet
Joined
Dec 7, 2013
Messages
9
Hi Everyone,

I'm a computer "enthusiast" but very much a novice when it comes to NAS, servers and most other things relating to computers. Which is to say, I can build a computer, and even successfully set up a FreeNAS server that ran fine for close to three years, but now nothing is working and I don't have the slightest idea what the issue is or how to begin troubleshooting. This is my system:

SuperMicro X10-SLH-F
32 GB ECC RAM
Xeon 1225v3
SeaSonic SS-400FL
256 GB SSD cache drive
4 Seagate 3 TB drives running in RAID (I think 5, but it was three years ago -- definitely not 0).

I built the server in 2013, and it worked fine until this past summer. When I restart it, the SuperMicro splash screen works as usual. I checked the LEDs on the motherboard and the beep alerts against the SuperMicro manual, and everything seems to check out fine. I get a S.M.A.R.T. message about one of the Seagate drives. After a whole bunch of largely incomprehensible (to me) text, I get the following, which is also incomprehensible to me but I can understand in the same way I understand foreign cuss words (which is to say, I don't know what it's saying, but it's clearly unhappy):

CAM status: ATA Status Error
ATA status: 41 (DRDY ERR), error: 40 (UNC )
RES: a bunch of numbers and letters (let me know if needed to diagnose)
Retrying Command

It then repeats this for a while.
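
For reference, the two hex values in that message are bit fields. Below is a minimal sketch of decoding them, assuming the standard ATA status and error register bit assignments; the function names are invented for this example, and it is meant to be saved to a file and run with sh.

```shell
#!/bin/sh
# Decode the hex bytes from a CAM "ATA status: 41 ..., error: 40 ..." line.
# Bit names follow the ATA status and error registers; only common bits
# are listed. Function names are hypothetical.

decode_ata_status() {
    val=$((0x$1))
    out=""
    if [ $((val & 0x80)) -ne 0 ]; then out="$out BSY"; fi   # device busy
    if [ $((val & 0x40)) -ne 0 ]; then out="$out DRDY"; fi  # device ready
    if [ $((val & 0x20)) -ne 0 ]; then out="$out DF"; fi    # device fault
    if [ $((val & 0x08)) -ne 0 ]; then out="$out DRQ"; fi   # data request
    if [ $((val & 0x01)) -ne 0 ]; then out="$out ERR"; fi   # error register is valid
    echo "${out# }"
}

decode_ata_error() {
    val=$((0x$1))
    out=""
    if [ $((val & 0x80)) -ne 0 ]; then out="$out ICRC"; fi  # interface CRC error
    if [ $((val & 0x40)) -ne 0 ]; then out="$out UNC"; fi   # uncorrectable data (unreadable sector)
    if [ $((val & 0x10)) -ne 0 ]; then out="$out IDNF"; fi  # sector ID not found
    if [ $((val & 0x04)) -ne 0 ]; then out="$out ABRT"; fi  # command aborted
    echo "${out# }"
}

decode_ata_status 41   # prints: DRDY ERR
decode_ata_error 40    # prints: UNC
```

In other words, the drive is ready but reported an error, and the error is an uncorrectable read, which fits a disk with unreadable sectors.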

Any help would be greatly appreciated!
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
If possible, can you post output of the following (in CODE Tags please)?:
dmesg
camcontrol devlist
zpool status

ExistNY

Cadet
Joined
Dec 7, 2013
Messages
9
If possible, can you post output of the following (in CODE Tags please)?:
dmesg
camcontrol devlist
zpool status
Thanks for the quick response. How do I run these, and how do I post them in CODE Tags? (Told you I was a novice)
 

ExistNY

Cadet
Joined
Dec 7, 2013
Messages
9
UPDATE: I managed to connect to my server via the GUI web interface (I wasn't able to do this earlier), and have the following alerts:


  • CRITICAL: Device: /dev/ada2, 1656 Currently unreadable (pending) sectors
  • CRITICAL: Device: /dev/ada2, 1656 Offline uncorrectable sectors
  • OK: There is a new update available! Apply it in System -> Update tab.
  • CRITICAL: The volume Volume1 (ZFS) state is DEGRADED: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state.
  • WARNING: New feature flags are available for volume Volume1. Refer to the "Upgrading a ZFS Pool" section of the User Guide for instructions.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
If you are running RAIDZ1 (similar to RAID 5), then you need to get another drive in there to replace that bad one ASAP. Otherwise you are looking at a potential total loss of the pool.

Don't worry about upgrading yet; you need to get the pool stabilized. That is one of the main reasons RAIDZ1 is rarely recommended (only one drive of fault tolerance).

BTW, what version of FreeNAS are you running?
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Thanks for the quick response. How do I run these, and how do I post them in CODE Tags? (Told you I was a novice)
Connect via SSH (most of us use PuTTY); then you can enter those commands. In PuTTY, simply highlighting the output copies it to the clipboard. Then paste it into your forum post using the "code" button (between "cmd" and "file").

Just grab putty.exe; it is a "stand-alone" version, so you don't need to install it.
https://the.earth.li/~sgtatham/putty/latest/x86/putty.exe
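
Once logged in over SSH, one way to capture the three requested outputs is to write each to its own text file and copy from there. This is only a sketch: the `collect` helper and the `/tmp/diag` directory are made up for illustration, and since root's default shell on FreeNAS may not be sh, it assumes the lines are saved to a file and run with `sh`.

```shell
#!/bin/sh
# Run each diagnostic command and save its output to its own file,
# so it can be pasted into a forum post later.
# OUT_DIR is a hypothetical location; change it as needed.
OUT_DIR="${OUT_DIR:-/tmp/diag}"

collect() {
    # $1 = file name stem, remaining arguments = command to run
    name="$1"; shift
    mkdir -p "$OUT_DIR" || return 1
    if command -v "$1" >/dev/null 2>&1; then
        # Record a failing exit status in the file instead of stopping.
        "$@" > "$OUT_DIR/$name.txt" 2>&1 || echo "(exit status $?)" >> "$OUT_DIR/$name.txt"
    else
        echo "$1: command not found" > "$OUT_DIR/$name.txt"
    fi
}

collect dmesg      dmesg
collect camcontrol camcontrol devlist
collect zpool      zpool status
```

Each file can then be displayed with `cat` and highlighted in the PuTTY window.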
 

ExistNY

Cadet
Joined
Dec 7, 2013
Messages
9
BTW, what version of FreeNAS are you running?
I am running FreeNAS-9.3-STABLE

How can I tell if I'm running RaidZ1? To replace the drive, anything I need to do other than shut it off, pop open the lid and replace the alleged offender? I'll order the drive ASAP.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
How can I tell if I'm running RaidZ1?
zpool status will tell us that, but you can also look in [Storage] - [Volumes]; Select/Highlight the top item and at the bottom click [Volume Status].
To replace the drive, anything I need to do other than shut it off, pop open the lid and replace the alleged offender?
If you have an extra SATA port, I would suggest attaching the new drive there (leaving the old one in too), then using the [Replace] option in the GUI. Not to worry you too much, but in all reality you should have had at least a "Cold Spare" drive that was properly tested and burned in. However, you are going to be "living on the edge" even with a new drive: there is the possibility of "Infant Mortality" and of a URE (unrecoverable read error).

If you do have space somewhere else, I would highly recommend getting a backup of your data like NOW if you don't already have one. First things first: make sure you have vital data backed up. Later, I (and maybe others) will scold you into using at least RaidZ2 and having regular SMART Tests scheduled as well as e-mail notifications, etc... ;)

Of course, if you are running RaidZ2 then things are not as bad but still need to be addressed. :)
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,478
yep, what @Mirfster said.


- If you are indeed able to access the data on your NAS currently, I would start pulling off everything that you deem important.
- Next, I would not order a drive; I would rush to the store and get one, which is much quicker than waiting.
- Once you have all the data that you care about saved off the NAS, connect the new drive you just bought (after proper testing) and follow the steps in the manual to replace the bad one.
 

ExistNY

Cadet
Joined
Dec 7, 2013
Messages
9

Ok, so first the good news -- Everything is backed up. I actually shut it off earlier this summer and am only now getting around to resolving this. I figured I'd need a quiet weekend -- this is the first in many weeks.

Potential additional good news, I think I'm running RaidZ2: Following the above instructions, I get "raidz2-0."

Bad news, apparently one of my four drives is "deattached" and the status of my volume is "degraded."

New drive has been ordered and will arrive Tuesday.

I also think I have regular S.M.A.R.T. checks scheduled (looks like one was run in June 2016), although I don't think I ever got an email notification.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Please post the requested information. It will ensure you get the most informed help possible. Also, when replacing a drive, make sure you identify it by its serial number; ada2 doesn't necessarily mean it's on SATA port #2. Using PuTTY is the easiest way to do it, but you do need to enable SSH in the FreeNAS Services.
If possible, can you post output of the following (in CODE Tags please)?:
dmesg
camcontrol devlist
zpool status
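
On the serial-number point, one way to read a drive's serial from the command line is to pull it out of `smartctl -i`. A small sketch, assuming the usual `Serial Number:` line in that output; the helper name is made up for this example.

```shell
#!/bin/sh
# Print the serial number found in `smartctl -i` output read on stdin.
# The function name is hypothetical.
serial_from_smartctl() {
    awk -F': *' '/^Serial Number:/ { print $2 }'
}

# On the NAS itself (as root) this would be used as, for example:
#   smartctl -i /dev/ada2 | serial_from_smartctl
```

The printed serial can then be matched against the label on the physical drive before anything is unplugged.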
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Ok, so first the good news -- Everything is backed up.
Great
Potential additional good news, I think I'm running RaidZ2: Following the above instructions, I get "raidz2-0."
Nice, at least you are not on the cliff's edge anymore...
Bad news, apparently one of my four drives is "deattached" and the status of my volume is "degraded."
Not too bad with RaidZ2, but if the other drives are from the same time period, you may have others starting to show issues. /BTW I am not a Seagate Fan so my perception may be a little skewed...
I also think I have regular S.M.A.R.T. checks scheduled (looks like one was run in June 2016), although I don't think I ever got an email notification.
Check under [Services] - "S.M.A.R.T" and ensure you have a viable address in "Email to report"
If you want to post the output of smartctl -a /dev/%drive% (Example: smartctl -a /dev/da1), then we can assist in diagnosing the status of your drives. *** You need to do it for each drive (da0, da1, da2, or whatever they are identified as). Again, use CODE Tags if you do post them.
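
For a quick look before posting the full reports, the attributes most relevant to the alerts in this thread are 5 (reallocated sectors), 197 (current pending sectors) and 198 (offline uncorrectable). A hedged sketch that filters the attribute table printed by `smartctl -A`; the helper name and the ada0 to ada3 device names in the comment are assumptions.

```shell
#!/bin/sh
# Read a `smartctl -A` attribute table on stdin and print NAME=RAW_VALUE
# for attributes 5, 197 and 198. The function name is hypothetical.
smart_key_attrs() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Possible use on the NAS (as root), assuming the disks are ada0..ada3:
#   for d in ada0 ada1 ada2 ada3; do
#       echo "== $d"
#       smartctl -A /dev/$d | smart_key_attrs
#   done
```

Non-zero raw values for any of these three are worth including when asking for help.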
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Please post the requested outputs.
 

ExistNY

Cadet
Joined
Dec 7, 2013
Messages
9
Is there anything from the various PuTTY outputs that would be totally boneheaded to post in a public forum?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
No, nothing in the requested outputs is private. Some people mask the serial numbers of their drives for reasons I've never understood.

You'd asked about how to replace a drive. The manual has click-by-click instructions on how to do this (that's for the current version; previous versions back to 9.3 are nearly identical, and manuals for older versions yet are at doc.freenas.org). The current version has a bug where it will tell you that your replacement disk has a partition table on it even when it doesn't; you can just ignore that. Be sure to burn in and test your replacement disk before adding it to your pool.
 

ExistNY

Cadet
Joined
Dec 7, 2013
Messages
9
How do I post the dmesg result without busting through the 30,000-character limit? Is there some portion of it I can exclude?

NOTE: I wasn't able to run "camcontrol devlist". Received error message: "camcontrol: couldn't open /dev/xpt0: Permission denied"
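
On the dmesg length question, one option is to post only the tail end of the output, since the most recent messages (including the repeating disk errors) come last. A minimal sketch; the helper name and the 29000-byte default are arbitrary choices for illustration.

```shell
#!/bin/sh
# Keep only the last N bytes of stdin (default 29000, leaving some margin
# under a 30,000-character post limit). The first line kept may be cut
# mid-line. The function name is hypothetical.
last_bytes() {
    tail -c "${1:-29000}"
}

# Possible use on the NAS:
#   dmesg | last_bytes > /tmp/dmesg-tail.txt
# Or keep only the disk-related lines instead:
#   dmesg | grep -E 'ada[0-9]|CAM status|ATA status'
```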

zpool status
Code:
  pool: Volume1
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0h30m with 0 errors on Sun Sep 18 10:35:13 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        Volume1                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/ad2823b6-1501-11e4-9f18-002590d6e374  ONLINE       0     0     0
            18332790449359223177                        REMOVED      0     0     0  was /dev/gptid/adcd0833-1501-11e4-9f18-002590d6e374
            gptid/ae799173-1501-11e4-9f18-002590d6e374  ONLINE       0     0     0
            gptid/af2823da-1501-11e4-9f18-002590d6e374  ONLINE       0     0     0
        cache
          2820864426022583963                           UNAVAIL      0     0     0  was /dev/gptid/af619dea-1501-11e4-9f18-002590d6e374

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Tue Jun 21 03:46:08 2016
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors
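
The output above can be boiled down to just the entries that are not healthy. A small sketch that filters `zpool status` text for devices in a non-ONLINE state; the helper name is made up, and the list of states is an assumption about which ones matter here.

```shell
#!/bin/sh
# Read `zpool status` output on stdin and print "name state" for every
# pool, vdev or device whose state is not ONLINE. Header lines such as
# "state: DEGRADED" are skipped because their first field ends in a colon.
# The function name is hypothetical.
zpool_problems() {
    awk '$1 !~ /:$/ && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|REMOVED|UNAVAIL)$/ { print $1, $2 }'
}

# Possible use on the NAS:
#   zpool status | zpool_problems
# `zpool status -x` is a built-in alternative that only reports pools with problems.
```

Run against the output above, this would list Volume1 and raidz2-0 as DEGRADED, one data disk as REMOVED, and the cache device as UNAVAIL.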

 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I wasn't able to run "camcontrol devlist". Received error message: "camcontrol: couldn't open /dev/xpt0: Permission denied"
I'm pretty sure you need to be logged in as root to run that. Looks like you were logged in as someone else. Type su at the command prompt and press Enter. It will ask for the root password, which you can type in (it won't show on the screen that you're typing anything) and press Enter, then you'll be logged in as root. Try the command again.

As you'd said, your pool is in RAIDZ2, so the situation isn't critical, but you still want to replace the drive ASAP. I'd also recommend removing the cache device; it isn't helping anything, and in fact it's probably hurting performance.
 

ExistNY

Cadet
Joined
Dec 7, 2013
Messages
9
Thank you for all of your help with this. Why is my cache drive hurting performance? Did I set it up wrong?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
It's very rare for an L2ARC device to help performance on systems with under 64 GB of RAM. And since some of the RAM that might otherwise have been used for ARC must be used to index the L2ARC, you have less RAM available for caching than you would have had if you didn't have an L2ARC device.
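
To put a rough number on that indexing cost, here is a back-of-the-envelope sketch. The per-block header size (180 bytes) and the average block sizes are assumptions chosen for illustration, not measured values; the real overhead depends on the ZFS version and the workload, and the 256 GB SSD is treated as 256 GiB for simplicity.

```shell
#!/bin/sh
# Rough estimate of the RAM (ARC) consumed by L2ARC headers, in MiB.
# cache size / average block size = number of cached blocks;
# each cached block needs one header held in RAM.
# The function name is hypothetical.
l2arc_header_mib() {
    l2arc_gib=$1   # size of the cache device, GiB
    block_kib=$2   # assumed average block size, KiB
    hdr_bytes=$3   # assumed header size per cached block, bytes
    blocks=$(( l2arc_gib * 1024 * 1024 / block_kib ))
    echo $(( blocks * hdr_bytes / 1024 / 1024 ))
}

l2arc_header_mib 256 128 180   # prints 360  (large 128 KiB blocks)
l2arc_header_mib 256 8 180     # prints 5760 (small 8 KiB blocks)
```

Under these assumptions, a full 256 GB cache device could take anywhere from a few hundred MiB to several GiB of the 32 GB of RAM away from the ARC.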
 