Hello everyone, long-time lurker, first-time poster here. This is going to be a long post, sorry.
I have had a FreeNAS server running Plex and CrashPlan for the last 2 years, and have been very happy with it...until this past week. Let me preface the rest of this by saying that after a week's worth of reading and googling, I KNOW I made some mistakes with my server build. With that said, I purchased a shiny new server based on cyberjock's recommendations, and parts are arriving today. So I am hoping to dispense with the excessive trout to the face here, as I am trying to reverse what I know now to be poor decisions made 2 years ago.
Anyway...a few days ago, out of the blue, I received an email stating "The volume red5 (ZFS) state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected." As this is the first error I have seen with this server, I quickly looked into things and found not one thing wrong with the machine (although I could not access the server until it had finished a 9-hour scrub). I ran the various SMART tests, an overnight MEMTEST, and a separate scrub: no errors reported, no corruption found. Admittedly, I am a ZFS newb, so the lack of actual problems was a bit of a head-scratcher. I replaced the crappy SATA controller and the cables just to be safe, and moved on with my day. I was watching movies via Plex until late last night with no problems.
Cut to this morning, and now I have 2 more errors:
The volume red5 (ZFS) state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
The volume red5 (ZFS) state is ONLINE: One or more devices are faulted in response to IO failures.
The server is going through a scrub right now, so I cannot do much, but the one zpool status I was able to run again states that there are no problems or errors, which makes absolutely no sense to me. And that has led me to post here.
I am building a new server and intend to migrate my zpool to it, but before I even dig into that I have some questions, mostly to satisfy my curiosity. If the simple answer is "trash zpool and rebuild" then fine. But I will obsess over this so I am going to ask anyway:
1. Knowing that there are likely a variety of reasons why a zpool would "glitch" with no discernible actionable errors, is there anything specific I could run to figure out whether this is a drive error or just an error from the rest of my crappy hardware? I am running a prosumer motherboard with 8GB of non-ECC memory (yes, I know), using the built-in SATA controller and a cheap PCI SATA controller for the extra ports. I suspect (and hope) the problem lies solely with the PCI controller.
2. Is there anything I can do to make ZFS less "glitchy"? I suspect that having new hardware within the recommended specs is going to mitigate this one, but I have also read plenty of posts that have led me to feel like ZFS is one touchy beast.
3. Does anyone have suggestions for a place/service where I can store a backup of a 24TB dataset? I was thinking of using CrashPlan Cloud, but the amount of time the initial seed will take makes me shudder. I know there are storage experts in here, so I could use the advice. I cannot afford a replication server or more hard drives, unfortunately. And anything has to be better than the towering stack of used small drives I am currently using.
Thanks for the read...
Cain