Large transfers over CIFS always fail.

Status
Not open for further replies.

Kayman

Dabbler
Joined
Aug 2, 2014
Messages
23
First, I was telling you about that menu. I wasn't telling you what to set or anything else. Just leaving the door open. If you want to spend a few months learning how all this stuff works (or not), that's totally your business. I didn't change my settings, and even if I did I wouldn't share them, because they almost certainly wouldn't apply unless you matched my hardware perfectly, down to the hard drive firmware version. Some of us will gladly spend months doing research, as we find file servers to be a very fulfilling hobby. Others don't care. Those that don't care either accept the limitations of whatever they have, or they go with something like a Synology, which has even more limitations. Pick one. ;)

Ok, looks like I misinterpreted your post. Sorry.

Second, I already told you how I deal with it. Find the failing disk and simply replace it. That's how you deal with it. Now your disk is misbehaving for reasons that don't appear on SMART. So either something is wrong with the disk itself, or something else is indirectly affecting how the disk operates. I can't tell you which because I don't know which. I don't have your hardware so I can't troubleshoot the problem to that level.
Alright, last question, and please excuse my total ignorance here. How exactly would I go about isolating and finding a single bad disk in a big pool WITHOUT destroying the pool? Keep in mind that there's no indication from SMART that there is anything wrong with any disk, and scrubs didn't show any errors either. Do I have to pick a disk at random, offline it, and hope the pool works? What exactly do I have to do to find the bad disk?
 

Robert Smith

Patron
Joined
May 4, 2014
Messages
270
Congratulations, you discovered a FreeNAS deficiency: it puts too much trust in SMART data.

Please treat this post as a more general discussion on the state of NAS, rather than concrete recommendations about what one should do.

Well, there is no single best, one-and-only solution for network-attached storage. There is no one recipe. You would be surprised how many players there are offering storage solutions. Once the one and only solution gets invented, a lot of companies will go out of business; but not yet, not yet.

So, what should one do, then?

First, you can try to find more honest hard drives. Unfortunately, it has been reported that, to cut down on customer returns, hard drive manufacturers have been fudging SMART data. Case in point: your hard drive reports that it is in perfect condition, while it obviously is not. You can put that model number on your personal blacklist, scout other people's blacklists, and try to pick "known good for NAS" drives, whatever that means.

Such hard drive statistics are spotty, and with new models and revisions coming out all the time it may be impossible to always pick known-good hard drives. That said, enterprise hard drives have historically been more reliable overall, so stepping up to enterprise grade could be the answer.

What else can you do?

If your budget allows it, you can go with a paid enterprise-level NAS solution. That way, when something goes wrong, you just pick up the phone and yell, "your s#it is broken again, come and fix it."

If you want something for "free", you can try some other NAS-du-jour and see if that one works better for you. OMV, for example, is popular these days.

If no distribution out there fits you, roll your own. Find all the right utilities, and script them together exactly the way you want; or even write your own software.

If you are in a corporate environment, nobody ever got fired for choosing Microsoft. Load up the latest Windows Server and ReFS your data disks.

Good luck!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I wouldn't say it's a FreeNAS deficiency. Aside from the disk disconnecting, there appears to be no evidence of anything being wrong. It could be a flaky PSU, a flaky SATA cable, or even a SATA/SAS controller with an intermittently bad port. The one thing that is for sure is that it's not likely to be an OS problem, since your hardware matches *tons* of other users' and they aren't having this issue. It could be that the WD Green has a bad firmware chip causing its problems. Until you can point a finger it's all just guesswork. But trying another OS isn't likely to buy you much, as it's not likely to be something like a driver problem with your hardware. I'm using WD Greens (24 of them) and I've had a great experience with them. I'd have bought Greens as replacements, except Greens aren't available in 6TB sizes.

You've literally done pretty much all the easy stuff... SMART monitoring and testing and ZFS scrubs are about the extent of the "easy fruit to grab". Now it gets down to more serious stuff like trying a different SATA cable, trying a different PSU, things like that. You need to start ruling out hardware as a possible cause for your angst.
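
Roughly, the "easy fruit" boils down to things like this (ada0 and tank are just placeholders for your own device and pool names):

# long SMART self-test on a disk, then read the results a few hours later
smartctl -t long /dev/ada0
smartctl -a /dev/ada0

# scrub the pool and look at the per-device read/write/checksum counters
zpool scrub tank
zpool status -v tank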
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
Kayman,

Don't sweat it. You just hit a weird hardware glitch. Normally when drives fail, they're going to let you know via SMART, temps, noise, speed, etc. You likely won't see this again in your lifetime. SMART is cool, but it can't see everything. Chuck the POS drive and don't worry about it. Your hardware strategy is OK; you have a solid foundation. The cheap used disks are gonna let you down occasionally... and that is in your plan. The expensive, large, new ones fail too. That is why we do this.

One possible thing to do with a pile of used hardware is to burn the disks in HARD in a separate pool (or your old machine?) until you trust them. Then you aren't putting new, unproven, potentially faulty gear into a stable system.
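
A rough burn-in sketch, assuming the new disk shows up as da5 (adjust the device name, and note the write pass wipes the disk):

# long SMART self-test first
smartctl -t long /dev/da5
# full-surface read pass (slow, but non-destructive)
dd if=/dev/da5 of=/dev/null bs=1m
# optional full write pass (destroys everything on the disk)
dd if=/dev/zero of=/dev/da5 bs=1m
# then re-check the SMART attributes for reallocated or pending sectors
smartctl -A /dev/da5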

You can see the I/O for each disk in your pool separately with 'zpool iostat -v'. It likely would have shown you the anomaly. You also had different output for each disk on the graphs in your Reporting tab. Those are the hints you got for this one. No need to destroy the pool in your hunt. Odd hardware errors test us all. At some point you have no choice but to start getting ugly, swapping out and testing individual components.
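
For example, running it with an interval while a big transfer is going makes a lagging disk stand out (tank is a placeholder pool name; the 5 means refresh every five seconds):

zpool iostat -v tank 5

On FreeBSD, 'gstat' gives a similar live per-disk busy view if you want a second opinion.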

Good luck, hope you love your new server.
 

Kayman

Dabbler
Joined
Aug 2, 2014
Messages
23
Thanks for the replies and the reassurance. As you can understand, this has been very frustrating for me and I've spent a lot of time stressing over this.

Alright, at this point it's 100% certain it's the bad drive's fault. I just put it back in the case in its original location so everything was exactly how it was, and swapped the two breakout cables at the card end. I got a spare SSD and installed OpenIndiana with napp-it. It's based on Solaris, which as far as I'm aware is a completely different OS from FreeBSD. I remade the same pool and the system had exactly the same symptoms (again with no indication of anything being wrong). I removed the bad drive (now on a different port on the card), replaced it, remade the pool, and everything started working.
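
Side note from what I've read since: with an established pool you apparently don't have to destroy anything to swap a disk. The usual ZFS way is supposedly along these lines (tank and the gptid names are just placeholders):

zpool offline tank gptid/OLD-DISK-ID
# physically swap the drive, then:
zpool replace tank gptid/OLD-DISK-ID gptid/NEW-DISK-ID
zpool status tank

I only remade the pool because I was rebuilding from scratch on the OpenIndiana box anyway.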

So that, to me, clears FreeNAS of any blame; it is a hardware problem and it is the drive. I'm confident everything else is good; I've done enough testing to make sure of it:
CPU: 6 hours of Prime95, no errors. Temps pretty reasonable.
Memory: 3 passes of memtest86, no errors.
Motherboard: I'm going to assume it's all good. It was old stock and discounted because all the new X10-series Supermicro boards had already come out, but it was new and unopened when I got it.
LSI 9211-8i: When I had Windows installed I ran CrystalDiskMark on an SSD. Results were consistent across all 8 breakout cables.
PSU: Bought brand new only a month ago. I did consider it as the cause; the old PSU was very old, so I replaced it anyway.
Case: Very good, with good airflow, and all the drives stay below 35°C.
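
If anyone wants to spot-check temps the same way, a rough loop like this works from the shell (da0 through da11 are just placeholders for your own device names, and the exact attribute name varies a bit between drive models):

for n in $(seq 0 11); do
  echo "=== /dev/da$n ==="
  smartctl -A /dev/da$n | grep -i temperature
done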

So that's it: I'm just not going to use the bad drive; I'll make my pool and be done with it. 12 drives, two RAIDZ2 vdevs. I've got 8 spare 1.5TB Greens to fall back on in case I hit problems. When I run out of space I'll add another 6, but for now I'll just leave them out.
You can see the I/O for each disk in your pool separately with 'zpool iostat -v'. It likely would have shown you the anomaly. You also had different output for each disk on the graphs in your Reporting tab. Those are the hints you got for this one. No need to destroy the pool in your hunt. Odd hardware errors test us all. At some point you have no choice but to start getting ugly, swapping out and testing individual components.

Good luck, hope you love your new server.
Thanks for this; I'll definitely try that at some stage. Right now everything works and I'm done messing around with it for a while.
 