Errors during file transfer

Status
Not open for further replies.

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
I tried running some of the SMART tests; the conveyance test only works on the Seagate drive. On the HGST drives I get
Not a big deal. None of my HGST drives (2TB or 3TB) support the conveyance test. Not all vendors do, so this can be ignored.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
I have an ESXi server running that I was thinking of putting a DNS server on. Any suggestions of what to use for this purpose?
Your router should be able to handle Name Resolution, maybe check there first.

Is there some way to clear existing SMART errors after I change cables?
I may be wrong, but I think those errors will not appear after a successful SMART test.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
That was on my list of things to figure out some day... I have an ESXi server running that I was thinking of putting a DNS server on. Any suggestions of what to use for this purpose?
In a previous thread you mentioned that you were considering replacing your residential gateway device with a pfSense appliance. pfSense can be configured as a DNS Resolver for your network. You can either create host overrides for your network hosts or check the box to register DHCP leases (and static IP address mappings) in the DNS resolver. Once you do this you will need to:
1) Change your DHCP settings so that your DHCP clients point to the pfSense appliance as their primary DNS.
2) Enable DNS query forwarding. This may require some additional config under "System: General Setup".

Is there some way to clear existing SMART errors after I change cables?
If you're referring to the values indicating UDMA-CRC errors - no. Just see if the error counts continue to increase.
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
I did get a pfSense VM installed, but I had problems getting my cable modem to give it an IP address. When I put the router back, it had the same problem, but I finally got it working on the router, so I just left it alone. I'll get back to that and post on the other thread.

If you're referring to the values indicating UDMA-CRC errors - no. Just see if the error counts continue to increase.

Is there a smartctl command to just return a quick error count summary?
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
I did get a pfSense VM installed, but I had problems getting my cable modem to give it an IP address. When I put the router back, it had the same problem, but I finally got it working on the router, so I just left it alone. I'll get back to that and post on the other thread.



Is there a smartctl command to just return a quick error count summary?
"smartctl -A" outputs the attributes for the drive.

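If you want just the counters rather than the whole attribute table, here is a minimal Python sketch that picks a few attributes out of `smartctl -A` text. The attribute names are ones smartctl commonly prints, but the watch list, the function name `error_summary`, and the sample values are assumptions for illustration, not output from your drives.

```python
# Attributes often watched for cabling and media problems.
# This watch list is an assumption; adjust it for your drives.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count")


def error_summary(smartctl_output):
    """Return {attribute_name: raw_value} for the watched attributes."""
    summary = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns:
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            if fields[9].isdigit():
                summary[fields[1]] = int(fields[9])
    return summary


# Illustrative sample in the layout smartctl -A uses (values are made up).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       12
"""

print(error_summary(SAMPLE))
```

To use it on a real drive, feed it the text that `smartctl -A` prints for that drive. Run it before and after a transfer; if `UDMA_CRC_Error_Count` has not moved, the new cables are probably doing their job.
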
Regarding pfSense, it sounds like you were trying to configure double NAT. Ideally, you would put your modem in bridging mode (or purchase a plain-Jane modem), then either configure your pfSense appliance with the static address provided by your ISP or use DHCP on the WAN interface (with the firewall appliance pulling its network information from your ISP's DHCP server). You then provide wireless access through wireless APs (like those sold by Ruckus or Ubiquiti) that are connected to the switch(es) behind the firewall.
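
One quick way to spot double NAT is to look at the address the firewall's WAN interface received: if it is a private address, something upstream is probably still doing NAT. Here is a small sketch using Python's standard `ipaddress` module. The function name and the sample addresses are made up for illustration.

```python
import ipaddress


def wan_is_behind_nat(wan_address):
    """Return True if the WAN address is private (non-routable),
    which suggests an upstream device is still doing NAT."""
    return ipaddress.ip_address(wan_address).is_private


# Hypothetical WAN addresses, for illustration only.
print(wan_is_behind_nat("192.168.1.1"))  # a typical LAN-style address
print(wan_is_behind_nat("8.8.8.8"))      # a public address
```

If the check comes back true for the address on the WAN side, the modem is most likely still routing rather than bridging.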
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
"smartctl -A" outputs the attributes for the drive.

Great! Just what I wanted. So far so good: I've been transferring files for 3 hours with no errors with the new cables installed. I'm curious if there is a way to do these things on a Windows PC without removing the drives? I've had some odd events like internal drives un-mounting themselves and not being able to re-mount unless I changed ports. Perhaps there were more bad cables, and moving them around to change ports is what fixed it? Anyway, just wondering if they recorded any errors.

Regarding pfSense, it sounds like you were trying to configure double NAT. Ideally, you would put your modem in bridging mode (or purchase a plain-Jane modem), then either configure your pfSense appliance with the static address provided by your ISP or use DHCP on the WAN interface (with the firewall appliance pulling its network information from your ISP's DHCP server). You then provide wireless access through wireless APs (like those sold by Ruckus or Ubiquiti) that are connected to the switch(es) behind the firewall.

What I did was configure one LAN port to be the WAN and put all the rest of them, including the VM network, on a single LAN interface. I unplugged my cable modem from my router, unplugged the router from the network, plugged the cable modem into the designated WAN port, and booted up pfSense. My original router was set to obtain its WAN address from the cable modem, so I set pfSense to do the same thing. But pfSense only had an address for the LAN side, which was as expected, something like 192.168.1.1; the WAN was just blank. As I mentioned, the same thing happened to my router when I tried to hook it back up, but after a while it did obtain its address and started working again. So it seems like maybe my cable modem is either really slow at assigning IP addresses or I'm powering things up in the wrong order. I'm never really sure if I should have the cable modem on first or the router on first, so I try it both ways. I'm probably just not waiting long enough.

I thought that, as a test, I could just disconnect the ESXi server from the network altogether, then plug the WAN port into my existing network and see if my existing router assigns it an address. Then I could pretend that was from the cable modem, and the VMs would all have to go through pfSense to access anything off the ESXi server.
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
Don't rule out a possible power problem in the context of this error message.
I sure hope I didn't have a power problem. I have this server plugged into a double-conversion, true-sine-wave power conditioner with a ridiculous bank of batteries that can keep everything going at full load through a power failure of over 20 hours.
However, my brand new PSU could have a defective capacitor or something in it.
Then again, it seems like a bad connection on either a power connector or the SATA connector could also cause that error, because I suppose the error is based on power on, reset, or the bus device from the hard drive's point of view.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I suppose the error is based on power on, reset, or the bus device from the hard drive's point of view
I read it as the system reporting that it was trying to talk to the device, but the device went away unexpectedly. Another member had a recurring problem like this, despite having dual redundant PSUs. If I remember correctly, there was some kind of failover edge case occurring during regular automated tests, where one of the PSUs was faulty but undiagnosed, and everything looked normal to the operator because it would switch right back to the good one ... or something.

And I didn't see anything out of place in your drives' SMART attributes, just the obvious logged errors.
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
I have my current 3-drive mirror set still copying test data, and a 4th, blank, brand new Seagate drive on which I'm just now starting a SMART long test in a different machine. When the test is done, if successful, I want to add it to the current Raid and mirror everything over to it, then when that's complete, pull one of the other drives for the long test.

This is a new procedure for me, so advice is appreciated. I would like to learn how to do this while the server is running; I have a hot-swap rack hooked up.
My questions are:
What is the procedure to hot-add the new drive?
How do I add it to the existing pool?
How do I sync my data to it?
How do I know when it's done syncing?
How do I remove a drive from the pool?
Is there something like a 'drive only shutdown' procedure I should perform before removing the drive to make sure there aren't any open files or transfers in progress before removing it?

Thanks for all the great help and advice on here!
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
When the test is done, if successful, I want to add it to the current Raid and mirror everything over to it, then when that's complete, pull one of the other drives for the long test
I don't think that is possible. While I can see adding a drive to a mirror set, I am not sure you can easily remove a drive from a mirror (unless replacing it). As for running the long SMART test, you should not have to remove the drive to perform that either. I understand that you are currently using an Ubuntu Live CD to do the SMART tests, but this should all be doable within FreeNAS. Actually, it is recommended, along with periodically doing short tests, scrubs, and config backups.
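
For reference, at the ZFS level the commands usually involved in growing and shrinking a mirror are `zpool attach`, `zpool status`, and `zpool detach`. Below is a rough Python sketch that only builds those command lines and reads the `scan:` line of `zpool status` output; it does not run anything. The pool name, device names, function names, and sample status text are all placeholders of mine, and on FreeNAS the GUI is normally the safer route, so treat this as an illustration of the idea rather than a procedure.

```python
def mirror_rotation_commands(pool, existing_dev, new_dev, old_dev):
    """Build (not run) the zpool commands for adding a disk to a mirror,
    checking progress, and later detaching another disk."""
    return [
        ["zpool", "attach", pool, existing_dev, new_dev],  # new_dev mirrors existing_dev
        ["zpool", "status", pool],                         # shows resilver progress
        ["zpool", "detach", pool, old_dev],                # only after resilver completes
    ]


def resilver_state(status_text):
    """Classify the 'scan:' line of zpool status output."""
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("scan:"):
            if "resilver in progress" in line:
                return "in progress"
            if "resilvered" in line:
                return "done"
    return "unknown"


# Placeholder names and made-up status text, for illustration only.
for cmd in mirror_rotation_commands("tank", "disk_a", "disk_new", "disk_b"):
    print(" ".join(cmd))

print(resilver_state("  scan: resilver in progress since Sat Jan  2 10:00:00 2016"))
print(resilver_state("  scan: resilvered 1.20T in 3h12m with 0 errors on Sat Jan  2 13:12:00 2016"))
```

One caveat, as far as I know: a disk removed with `zpool detach` is generally not importable on its own as a copy of the pool, so it may not serve as the offline backup you have in mind.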
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
My idea here with rotating drives into and out of the mirror is to end up with a few drives that contain all the data up to the point they were pulled so that if there is some kind of massive event like a direct lightning strike that takes out all the computers and everything that's connected to the network, I still have data saved on the pulled drive. With no wires connected to the pulled drives, they would be completely isolated from the event.

It's good to know I can just run the long test while the drive is hooked up to FreeNAS. How does that work if data writes are happening during the test? Does it just not write to that drive during the test and then catch it up later, or is the drive usable during the test?
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
My idea here with rotating drives into and out of the mirror is to end up with a few drives that contain all the data up to the point they were pulled so that if there is some kind of massive event like a direct lightning strike that takes out all the computers and everything that's connected to the network, I still have data saved on the pulled drive. With no wires connected to the pulled drives, they would be completely isolated from the event.

It's good to know I can just run the long test while the drive is hooked up to FreeNAS. How does that work if data writes are happening during the test? Does it just not write to that drive during the test and then catch it up later, or is the drive usable during the test?
I think it's a better idea to build a second FreeNAS system and replicate to its pool rather than pulling / swapping out drives. If your internet connection is fast enough you can keep the second system permanently off-site. Otherwise, one day you'll screw up, pull the wrong drive, or break something (yes, I'm pessimistic).
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
I think it would take a long time to sync terabytes over the internet. I suppose I could put the second NAS on the LAN, sync it, then move it off-site; maintaining the changes to it over the internet wouldn't be so bad. I'm not really sure how to make two FreeNAS systems duplicate in this manner, on a LAN or over the internet, but it would be fun to learn :)
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
I think it would take a long time to sync terabytes over the internet. I suppose I could put the second NAS on the LAN, sync it, then move it off-site; maintaining the changes to it over the internet wouldn't be so bad. I'm not really sure how to make two FreeNAS systems duplicate in this manner, on a LAN or over the internet, but it would be fun to learn :)
Seed your backup over the LAN, then move the second server off-site. See "replication tasks": http://doc.freenas.org/9.3/freenas_storage.html#replication-tasks
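
To give a feel for what seeding and then catching up means, here is a rough Python sketch that builds the kind of `zfs send` / `zfs receive` pipelines this boils down to. It only builds strings and runs nothing. The dataset, snapshot, and host names and the two function names are hypothetical; in FreeNAS you would normally set this up as a periodic snapshot task plus a replication task in the GUI rather than typing commands.

```python
def seed_commands(dataset, snapshot, remote_host, remote_dataset):
    """Initial full copy: snapshot the dataset, then send the whole thing."""
    full = f"{dataset}@{snapshot}"
    return [
        f"zfs snapshot -r {full}",
        f"zfs send -R {full} | ssh {remote_host} zfs receive -F {remote_dataset}",
    ]


def incremental_command(dataset, old_snap, new_snap, remote_host, remote_dataset):
    """Later runs: send only the changes between two snapshots."""
    return (
        f"zfs send -R -i {dataset}@{old_snap} {dataset}@{new_snap} "
        f"| ssh {remote_host} zfs receive -F {remote_dataset}"
    )


# Hypothetical names, for illustration only.
for cmd in seed_commands("tank/data", "seed", "backupnas", "backup/data"):
    print(cmd)
print(incremental_command("tank/data", "seed", "daily1", "backupnas", "backup/data"))
```

The full send is the part worth doing over the LAN; once the second box is off-site, only the incremental differences have to cross the internet.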
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
Well, I'm still having issues with CRC errors. I'm not sure how to isolate them, but I do notice a pattern emerging. It seems I can write all I want with no errors: I ran massive robocopy commands that took 6 hours of constant writing with no issues. But if I read one tiny file off FreeNAS, POOF, instant errors. I was doing reads while I was writing before, and thought that was causing the problem, even though it shouldn't (it would be an impossible task to never read while writes were happening). So I tried a little test. The last bulk write I did was last night; it ended at 9:30pm, and I didn't have any CRC errors showing, just the network name resolution stuff. I hit Enter on the terminal screen to get to the menu and let it sit there a long time: no errors. Then I opened a file, and just as soon as I clicked the file to open it, BAM, a whole screen like this:

read error a1.PNG


Any ideas what would cause errors when I read a file but not when I write to files?
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Well, I'm still having issues with CRC errors. I'm not sure how to isolate them, but I do notice a pattern emerging. It seems I can write all I want with no errors: I ran massive robocopy commands that took 6 hours of constant writing with no issues. But if I read one tiny file off FreeNAS, POOF, instant errors. I was doing reads while I was writing before, and thought that was causing the problem, even though it shouldn't (it would be an impossible task to never read while writes were happening). So I tried a little test. The last bulk write I did was last night; it ended at 9:30pm, and I didn't have any CRC errors showing, just the network name resolution stuff. I hit Enter on the terminal screen to get to the menu and let it sit there a long time: no errors. Then I opened a file, and just as soon as I clicked the file to open it, BAM, a whole screen like this:

View attachment 10550

Any ideas what would cause errors when I read a file but not when I write to files?
It might be related to the firmware for your HBA. It might be a problem with your power supply. It might be a hard drive problem. Try to test and isolate the problem.
 