SCSI Failure Error

Status
Not open for further replies.

fluentd

Dabbler
Joined
Aug 20, 2013
Messages
26
Thank you so much. I would like to upgrade both the hardware and the FreeNAS version once all this is behind me.
 

fluentd

Dabbler
Joined
Aug 20, 2013
Messages
26
OK, the scrub completed with no errors, but the drive is still faulted. What is my next step to find which drive it is?
 

fluentd

Dabbler
Joined
Aug 20, 2013
Messages
26
Just ran the smartctl command and it gave me this:

Smartctl open device: /dev/da1 failed: INQUIRY failed
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
@fluentd, please, please, do not power off your system, but try to limit your use of the system to accessing necessities, until you have replaced the failed disk.

On a side note: What are some good Sata cards to purchase?
Try searching your favourite shopping place for 9211-8i. Search the forum for 9211-8i too.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
According to the screenshot you posted, all but two of your drives have shown you their serial numbers.
This is good, but it would be better to know them all.
One of the drives (missing ser#) is identified as da1, and the other is da6.
AFAIK the only way to get the ser# from da6 is to run smartctl -a /dev/da6
(someone please correct me if there is another way)
YOU MUST WRITE DOWN this drive's serial number before doing
anything else. Can't stress this enough!!!

Do not power off OR reboot your server, because doing so may change
the current order of how the drives appear in the GUI and you'll have to start
all over. Your level of parity means that if, during any part of the recovery process,
you happen to lose a second drive (very common), your pool is toast!

Once you have all the serial numbers, you must gain access to the drives
and match each number to a physical disk. When you do that, one drive
will be left unmatched and this is your bad disk. After reading this, it
sounds kinda simple, BUT if done incorrectly could lead to the loss
of ALL your data, so please be careful.
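
The matching step above can be sketched in a few lines of Python. This is only an illustration, not a FreeNAS tool: the function names and the sample serials are made up, and it assumes the text you feed it comes from smartctl -a or smartctl -i, which prints a "Serial Number:" line (SCSI drives print "Serial number:", hence the case-insensitive match).

```python
import re

def parse_serial(smartctl_output):
    """Return the serial number found in smartctl text output, or None."""
    match = re.search(r"^Serial number:\s*(\S+)", smartctl_output,
                      re.IGNORECASE | re.MULTILINE)
    return match.group(1) if match else None

def unmatched_labels(reported, labels):
    """Serials read off the drive labels that no responding device reported.

    reported: dict of device name -> serial (None if the drive did not answer)
    labels:   serials written down from the physical drive labels
    """
    known = {s.strip().upper() for s in reported.values() if s}
    return sorted(l.strip().upper() for l in labels
                  if l.strip().upper() not in known)

# Hypothetical data: da1 did not answer, so its serial is unknown.
reported = {"da0": "ABC123", "da1": None, "da2": "DEF456"}
labels = ["ABC123", "DEF456", "ZZZ999"]
print(unmatched_labels(reported, labels))
```

Whatever label is left over after every responding device has been matched is the candidate for the bad disk. Double-check it by hand before pulling anything.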
 

fluentd

Dabbler
Joined
Aug 20, 2013
Messages
26
Since the two drives that are not giving me a serial number are different sizes, I'll track down all the drives' serials, and the two left over should tell me that the 2 TB drive with an unknown serial is my failed drive. I don't see any other way to do this than by shutting down the system and pulling each drive out to get all the serial numbers. Obviously, if I come across a 2 TB drive with no matching serial, then that is most likely the one.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Can you post a debug file on your system please? System -> Advanced -> Save Debug.

Thanks.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Since the two drives that are not giving me a serial number are different sizes, I'll track down all the drives' serials, and the two left over should tell me that the 2 TB drive with an unknown serial is my failed drive. I don't see any other way to do this than by shutting down the system and pulling each drive out to get all the serial numbers. Obviously, if I come across a 2 TB drive with no matching serial, then that is most likely the one.
Please do not shut down your system yet!

Serial numbers should be visible (on a label) on the side of the drive opposite to the connectors. You may be able to read them without shutting down the system. A magnifying glass and some light source could help...
 

fluentd

Dabbler
Joined
Aug 20, 2013
Messages
26
OK, I won't shut it down. The drive won't be here for another day or so anyway. What do I do after I click Save Debug?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Ok, so here's some bullets in no particular order:

1. You are using AMD hardware. We *know* AMD stuff just doesn't always work right. So I can't say I'm too surprised. In fact, when I asked for the debug I was specifically curious to know if it was an AMD-based system. I wasn't going to come out and ask because (1) the debug tells me a bunch about your system so I can rule out some problems and (2) I'm tired of being accused of being an Intel shill every time I ask if a system is AMD based.
2. mpt stuff is old. *really* old. It doesn't even support >2TB drives.
3. You have a massive amount of storage relative to the amount of RAM you have. You have the minimum, yet have something like 20TB+ of disks. So again, I can't say I'm too surprised that you're having problems. Things don't work properly when you overextend your hardware. We know that random problems begin to develop when you don't have adequate resources for your zpools.
4. One of your zpools is 92% full. Again, not something you should be doing. Aside from extremely poor performance due to unrecoverable fragmentation in the future, this isn't likely to cause a problem like the one you are experiencing, apart from the possibility that the fragmentation will create massively higher I/O to the disks than normal and could be taxing your LSI controller.
5. Disk da1 and your controller are not happy with each other. Not sure specifically what is wrong, but those two seem to not be a happy "couple" at all. The fact that you couldn't run smartctl on it earlier really concerns me and makes me think that the disk is probably failing/failed.

Given all that I just mentioned above, and the fact that you've deviated so far from many of our recommendations for hardware, zpool layout, and probably half a dozen or so other things, I'm not inclined to really say much except "this is what you get when you ignore our recommendations". We tell people what not to do to avoid problems. People often do them anyway, then are flabbergasted at the result. Sometimes the problems come up immediately, other times they creep up later on after it's worked fine for a while.

Overall though, your disk da1 and your controller aren't happy. There's little troubleshooting that you can really do at this point except to start replacing stuff. Unfortunately, because your design is so overworked, underpowered, and using hardware that isn't even expected to be compatible to start with, replacing your mpt controller with something new is not really going to tell you much if you continue to get errors (whether it's the same error or a different one), because there's no way to prove that your issues aren't a combination of problems, all of which you have to fix to have a properly working system.

The zpool scrub was pretty worthless. Not sure what the intent was for doing that, but I wouldn't have done the scrub because your zpool is already degraded.

I don't think there's much additional risk from powering off/on your system. But, if you don't have backups, I'd definitely make it the highest priority to back up your data right now before you do anything else on your server. There's a very good chance this won't end well because you are running RAIDZ1 and have a FAULTed disk in your primary zpool.
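
For anyone who wants a record of which devices the pool considers unhealthy before touching the hardware, here is a rough Python sketch. It assumes the usual zpool status config table (NAME STATE READ WRITE CKSUM columns); the function name, the pool name "tank", and the sample text are all hypothetical, and real output may contain rows this does not handle.

```python
def unhealthy_devices(zpool_status_text):
    """Return (name, state) for each config row whose state is not ONLINE."""
    states = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}
    rows = []
    for line in zpool_status_text.splitlines():
        fields = line.split()
        # Config rows have at least NAME, STATE, READ, WRITE, CKSUM columns;
        # this also skips the two-field "state: DEGRADED" summary line.
        if len(fields) >= 5 and fields[1] in states and fields[1] != "ONLINE":
            rows.append((fields[0], fields[1]))
    return rows

# Made-up sample in the usual layout; paste your own zpool status text instead.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            da0     ONLINE       0     0     0
            da1     FAULTED      0     0     0
"""

for name, state in unhealthy_devices(SAMPLE):
    print(name, state)
```

With RAIDZ1, a single FAULTED device leaves the vdev with no redundancy, which is why the backup comes first.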

With that, I can only wish you luck. :/
 

fluentd

Dabbler
Joined
Aug 20, 2013
Messages
26
Hey cyberjock,

Thanks for the "kind" words. The fact is that this system is very old, and when I was speccing out my first NAS system there really were no guidelines available. This was 6 years ago, if you want a time frame. I even asked the community for help by posting what I was looking at buying, and nobody once told me that I needed this, this, or this. I also had no idea that I would need more than 8 gigs of RAM; at the time, 4 gigs was OK to use, from what I remember. I actually have a new system sitting right here that I could use to upgrade this machine once I get the zpool back up and running without a faulty drive.

I will go ahead and read your guide in your signature since the condescending reply that you gave me shows I sure do need it :).

I'm also not 100% sure whether the SAS controller that I purchased way back then was even that good, but I remember it was recommended by someone in these forums. I will take the suggestion from BigDave and look into the one he suggested, but as you can tell, I don't live for NAS system building and do it more as a hobby.

 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
Since the two drives that are not giving me a serial number are different sizes, I'll track down all the drives' serials, and the two left over should tell me that the 2 TB drive with an unknown serial is my failed drive. I don't see any other way to do this than by shutting down the system and pulling each drive out to get all the serial numbers. Obviously, if I come across a 2 TB drive with no matching serial, then that is most likely the one.
You would be correct about this ^^^^^^^^^

As for me, being a kindly and gentle soul, I'm gonna say a prayer that when you replace this failing drive,
another drive doesn't bite the dust during the resilver process and send your data to digital heaven :(
Good luck to you :)
 

fluentd

Dabbler
Joined
Aug 20, 2013
Messages
26
Status update.. drive is resilvering.. funny thing... out of all my drives I randomly pulled one out and it was the faulted drive.. first try lol.
 

fluentd

Dabbler
Joined
Aug 20, 2013
Messages
26
Looks like resilvering completed successfully! Now, to get my shared folders back, do I just need to redo them?
 