Errors during file transfer

Status
Not open for further replies.

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
I received Error 0x80070079 ("The semaphore timeout period has expired") while copying some files to a new FreeNAS installation.

I went to FreeNAS and it was showing alarms:

  • CRITICAL: Device: /dev/da1 [SAT], 48 Currently unreadable (pending) sectors
  • CRITICAL: Device: /dev/da1 [SAT], unable to open device
  • CRITICAL: Device: /dev/da0 [SAT], ATA error count increased from 0 to 1
  • CRITICAL: Device: /dev/da0 [SAT], Read SMART Self-Test Log Failed
  • CRITICAL: Device: /dev/da0 [SAT], Read SMART Error Log Failed
  • CRITICAL: Device: /dev/da0 [SAT], unable to open device
  • CRITICAL: The volume Helium_8TB (ZFS) state is UNAVAIL: One or more devices are faulted in response to IO failures.
  • WARNING: New feature flags are available for volume Helium_8TB. Refer to the "Upgrading a ZFS Pool" section of the User Guide for instructions.
I don't really understand what any of this means other than it could not open my hard drives, but I can't figure out why.
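For readers trying to make sense of a list like the one above, it can help to group the alerts by device. The following is only a minimal sketch: the alert text is copied from this post, but the regular expressions and the helper names (`group_alerts_by_device`, `pending_sectors`) are assumptions made for illustration and are not part of any FreeNAS API.

```python
import re
from collections import defaultdict

# Device alerts and the volume alert copied from the post above.
ALERTS = """\
CRITICAL: Device: /dev/da1 [SAT], 48 Currently unreadable (pending) sectors
CRITICAL: Device: /dev/da1 [SAT], unable to open device
CRITICAL: Device: /dev/da0 [SAT], ATA error count increased from 0 to 1
CRITICAL: Device: /dev/da0 [SAT], Read SMART Self-Test Log Failed
CRITICAL: Device: /dev/da0 [SAT], Read SMART Error Log Failed
CRITICAL: Device: /dev/da0 [SAT], unable to open device
CRITICAL: The volume Helium_8TB (ZFS) state is UNAVAIL: One or more devices are faulted in response to IO failures.
"""

# Assumed line shape: "<LEVEL>: Device: <path> [SAT], <message>"
DEVICE_ALERT = re.compile(
    r"^(?P<level>[A-Z]+): Device: (?P<dev>/dev/\w+) \[SAT\], (?P<msg>.+)$"
)
PENDING = re.compile(r"^(\d+) Currently unreadable \(pending\) sectors$")


def group_alerts_by_device(text):
    """Map each device path to the list of messages reported for it."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        match = DEVICE_ALERT.match(line.strip())
        if match:  # lines without "Device:" (e.g. the volume alert) are skipped
            grouped[match.group("dev")].append(match.group("msg"))
    return dict(grouped)


def pending_sectors(messages):
    """Return the pending-sector count found in the messages, or 0 if none."""
    for msg in messages:
        match = PENDING.match(msg)
        if match:
            return int(match.group(1))
    return 0


if __name__ == "__main__":
    for dev, messages in sorted(group_alerts_by_device(ALERTS).items()):
        print(dev, "pending sectors:", pending_sectors(messages))
        for msg in messages:
            print("   ", msg)
```

Grouped this way, both drives show "unable to open device", while only /dev/da1 reports pending sectors in this first set of alerts.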

This is a brand new installation, brought online just hours before this transfer. It had been successfully transferring files for about 2 hours before this happened. I was watching one video off the FreeNAS server while the transfer was going on. The file I was watching did have a long pause in it when these errors were happening, but then it recovered and finished playing. This all happened last night. I figured the pause was just a network slowdown due to all my file transfers, but I discovered this morning that it was actually a failure.

My setup is as follows:
SuperMicro A1SA7-2750F Motherboard
32GB ECC RAM
Booting from a 32GB USB stick
Storage is two brand new HGST 8TB Helium hard drives configured as a mirrored set.
These drives have never been used before this installation

How do I go about troubleshooting this? I find it unlikely that both brand new drives would fail in the same way at the same time. Is there a problem with my motherboard?

Any help on how to diagnose the problem would be greatly appreciated. The system is still running; however, I can no longer access the network shares on it. When I get the status of the volume, FreeNAS shows it faulted and removed.

My console display is currently scrolling through seemingly endless messages:

Terminated ioc 804b scsi 0 state c xfer 0
(probe0:mps0:0:1:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 855





Attachments: Capture.PNG, Capture2.PNG, Capture 3.PNG

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
Yes, they appear to be. However, I am using the cables that came with the Supermicro motherboard, and they do not have the metal clips on them. I guess it would be a good idea to use cables with clips just to make sure they stay secure. I did notice these HGST drives seem to vibrate more than normal. Both vibrate about the same amount, but it's still something I'm concerned about, and I'm looking into it. I expected these drives to run smoother than a regular desktop drive, but they both have heavy vibrations. I do have another pair of them that I have not powered up yet. I also have some Seagate 8TB drives, but they are not enterprise class drives. I was going to use those for off-line backups.

If it is a cable getting loose due to the vibration, then I would think one would have gone first and the second one at another time. Is there an error log somewhere that might show me the time and date of each failure?

Would it hurt to try to change one of the cables while it's still powered up?

I have not tried to reset this yet. I'm not really sure I will be able to diagnose the problem if I reboot it and the problem seems to go away. I'm also not sure how to shut it down safely while the console is spewing out all those messages.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Would it hurt to try to change one of the cables while it's still powered up?
Not recommended.

Couple other thoughts:
  1. Did you perform any Burn-In Testing? If not, I suggest you do before putting it into "Production"
    • Reference the two listings in my sig for more info
    • This will take a while (several days)
    • If you have room, you should stick in the Seagate drives as well and burn them in too (so you know they are good as well)
  2. You never mentioned the PSU you are using, is it properly rated?
  3. Are you using a splitter on the cables (either Data or Power)? If so, this may be the issue
  4. Make sure you have the latest BIOS/Firmware for your Motherboard, may help address any issues

I am assuming:
  1. That there is not any critical data on the system yet?
    • Thinking so, because I gather that you are really just in the process of copying your data there and do currently have a backup?
    • If this is not the case, then let us know...

I'm also not sure about how to shut it down safely while the console is spewing out all those messages
Have you tried to hit the "Shutdown" in the GUI?
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
  1. Did you perform any Burn-In Testing? If not, I suggest you do before putting it into "Production"
    • Reference the two listings in my sig for more info
    • This will take a while (several days)
    • If you have room, you should stick in the Seagate drives as well and burn them in too (so you know they are good as well)
  2. You never mentioned the PSU you are using, is it properly rated?
  3. Are you using a splitter on the cables (either Data or Power)? If so, this may be the issue
  4. Make sure you have the latest BIOS/Firmware for your Motherboard, may help address any issues

1. No, I have not done this. I will look into this. This is not in production yet, so will not be a problem.
2. Power supply is a Corsair HX850i, 850 W. It should be massively overkill for this server. The fan on it never even comes on.
3. No splitters, just the factory PSU cables and the SATA cables that came with the motherboard, one port for each drive.
4. I did upgrade the BIOS and firmware. I was getting a warning about the SAS controller and I have updated it to the correct version.


I am assuming:
  1. That there is not any critical data on the system yet?
    • Thinking so, because I gather that you are really just in the process of copying your data there and do currently have a backup?
    • If this is not the case, then let us know...
Yes, that is correct: no critical data yet. This was basically just some test data I was using to experiment with. I do have the original data and a backup of it, and none of it is critical anyway.

Have you tried to hit the "Shutdown" in the GUI?

Not yet. I didn't know if I should try that with all the errors going on or somehow address those first. I also did not know if there was something to be learned about what is going on that might be lost if I shut down. If it came up and started working normally, then perhaps I wouldn't know what the issue was.

Thank you for the help
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Infant mortality is a thing. That's the reason why people do burn-in testing. If it fails, you RMA the drive. You can also pull the drive(s) from the server and run the manufacturer's tools for testing drives as another data point.

What version of LSI firmware is the onboard HBA flashed to?
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
I couldn't shut down with the GUI, so I went to the console and entered option 14 (Shutdown). It said it was shutting down, but it never did, and eventually it came back to the menu again. I tried it 3 times, then I shut it down by holding in the power button. When it came back up I got the menu and no scrolling messages, but in the GUI I have the following alerts:

  • CRITICAL: Device: /dev/da0 [SAT], 8 Currently unreadable (pending) sectors
  • CRITICAL: Device: /dev/da1 [SAT], 48 Currently unreadable (pending) sectors
  • CRITICAL: The volume Helium_8TB (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.

I believe the LSI is flashed to V20.0

Here's a screenshot of the config:
lsi.PNG
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
Supermicro still only has V19 available on their website for this motherboard. Maybe I can get it from the Avago site. This LSI2116 controller seems like it's always behind; it took me a week to get the V20 update.

Is there a way to check what version FreeNAS is expecting?
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
I can't find 20.00.04.00 for my LSI2116-IT anywhere. Is there a way I can download a previous version of FreeNAS that expects 20.00.00.00?
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
It should provide an alert, however I am not 100% sure if it does for that minor revision...

I may be incorrect about your LSI2116 needing that version; I am using H200 cards crossflashed to SAS9211-8i. I can do some checking, perhaps in the meantime someone else who is more familiar with that card will chime in.
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
I was getting an alert when I had V19 flashed, but no alert this time.

I had it going with a 500GB drive just to play with and didn't have any issues, but I wanted to switch to the larger, mirrored drives. So I pulled the USB stick and the 500GB drive and kept those together, so I could tinker with that setup more if I wanted to. I got a new USB stick, downloaded the latest version of FreeNAS, and installed the new drives with that. So it's possible that I had a slightly older version of FreeNAS working with my V20 firmware, and now the newer version is expecting something else. I've been poking around in FreeNAS and I still can't figure out what SAS controller firmware version it's expecting.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
I may be wrong, but am thinking that the LSI 2116 can use the SAS 9201-16i firmware. If so, then that version can be downloaded from here.

I am deducing this from this thread, in which the OP used the SAS 9201-16i firmware for his SuperMicro motherboard. The specs for both his motherboard and yours show the same controller.

Your MB specs show: "LSI® 2116 SW controller for 16x SATA3 (6Gbps) ports; SATA3 or SAS2 with IT mode"

His MB specs show: "LSI® 2116 SW controller for 16x SATA3 (6Gbps) ports; SATA3 or SAS2 with IT mode"

Edit: Added specs info
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
Thank you for figuring that out! I went to your link and I'm a little confused which file to download...

I went under "Firmware" and downloaded Installer_P20_for_UEFI, dated 09/18/2014, and it had V20.00.00.00. So I downloaded 9201_16i_Package_P20_IT_Firmware_BIOS_for_MSDOS_Windows, dated 09/22/2014, and it does seem to have V20.00.04.00! So I guess that's what I want, but I would still like to figure out how to verify that 20.00.04.00 is what FreeNAS is expecting. Any ideas how to confirm this?
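On the question of confirming versions, a rough sketch of one way to compare firmware and driver versions follows. It rests on two assumptions that are not confirmed anywhere in this thread: that the FreeBSD mps driver logs a boot line of the form `mps0: Firmware: <version>, Driver: <version>-fbsd` (visible with `dmesg`), and that the FreeNAS alert only compares the major version numbers. The second assumption is at least consistent with an alert appearing on V19 and not on V20.00.00.00. The sample line below is illustrative and was not captured from this system.

```python
import re

# Assumed shape of the mps driver's boot message; verify against real dmesg output.
MPS_LINE = re.compile(r"Firmware: (?P<fw>[\d.]+), Driver: (?P<drv>[\d.]+)")


def parse_versions(line):
    """Return (firmware, driver) version strings, or None if the line does not match."""
    match = MPS_LINE.search(line)
    if match is None:
        return None
    return match.group("fw"), match.group("drv")


def major(version):
    """Major version number, e.g. '20.00.04.00' -> 20."""
    return int(version.split(".")[0])


def majors_match(line):
    """True if firmware and driver share a major version (assumed alert rule)."""
    versions = parse_versions(line)
    if versions is None:
        raise ValueError("no firmware/driver versions found in: " + line)
    firmware, driver = versions
    return major(firmware) == major(driver)


# Illustrative sample only, not output from the system discussed here.
SAMPLE = "mps0: Firmware: 20.00.00.00, Driver: 20.00.00.00-fbsd"

if __name__ == "__main__":
    print(parse_versions(SAMPLE), "majors match:", majors_match(SAMPLE))
```

If the assumption about major versions holds, 20.00.00.00 and 20.00.04.00 would be treated the same by the alert, so the absence of an alert would not distinguish between them.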
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109
Thanks for the recommendation on the BIOS update. However, that is the version I currently have installed, and when I installed it, it did not include the SAS controller update from 19 to 20. Supermicro emailed me the SAS controller firmware for V20.00.00.00.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Technically, the LSI firmware should work. You can probably use the firmware here since it and the integrated LSI card are both based on the SAS 2116 chipset.
http://www.avagotech.com/products/server-storage/host-bus-adapters/sas-9201-16i#downloads

There are some downsides to using the generic LSI firmware. You can contact Supermicro and ask them to make the P.20.00.04 firmware available. See @cyberjock's post #206 here: https://forums.freenas.org/index.ph...-driver-version-20-for-dev-mps0.36536/page-11

I'm not convinced your problems are related to your HBA firmware. Try moving the drives to another system and perform burn-in procedures. If the drives keep throwing errors, RMA them.
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109

I would feel better if SuperMicro supplied me with something specific for that motherboard.

I'm not so sure this is the problem either. I've managed to get it running again, and it shows the drives coming up normally, but I can't read even a tiny file off the dataset without getting instant errors on the console, and then the whole thing stops responding. I'm going to attempt the controller update. I can always go back to the V20.00.00.00 that Supermicro supplied if I have a problem due to it being generic firmware.

What's the best way to burn in the drives on a Windows 10 system? I've been trying to search for HGST test programs for the He8 drives but haven't had any luck yet.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554

You can boot from a Linux LiveCD on another system and perform a SMART conveyance test and a SMART long test for each drive. Once they are complete, post smartctl output for the drives.
This should also provide some indication as to whether the timeouts and other problems persist across systems. For more information about burn-in and testing see here: https://forums.freenas.org/index.php?threads/building-burn-in-and-testing-your-freenas-system.17750/
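As a rough illustration of the tests mentioned above, the sketch below builds the `smartctl` command lines for one drive: start a conveyance test, start a long test, then dump the full SMART report. The `-t conveyance`, `-t long`, and `-a` options are standard smartctl options; the device names `/dev/sda` and `/dev/sdb` are placeholders, and the helper name `smart_test_commands` is made up for this example. On an Ubuntu live session smartctl comes from the `smartmontools` package.

```python
import shlex


def smart_test_commands(device):
    """Return the smartctl invocations for one drive, in the suggested order."""
    return [
        ["smartctl", "-t", "conveyance", device],  # start the conveyance self-test
        ["smartctl", "-t", "long", device],        # start the extended (long) self-test
        ["smartctl", "-a", device],                # print all SMART info, including test logs
    ]


if __name__ == "__main__":
    # Placeholder device names; check the real ones before running anything.
    for dev in ["/dev/sda", "/dev/sdb"]:
        for cmd in smart_test_commands(dev):
            print("sudo " + " ".join(shlex.quote(part) for part in cmd))
```

The script only prints the commands so they can be reviewed and run by hand. Each self-test runs inside the drive and has to finish before the next one is started; `smartctl -a` shows progress and the results log. A long test on an 8TB drive can be expected to take many hours, and not every drive supports the conveyance test.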
 

Zaaphod

Contributor
Joined
Dec 15, 2015
Messages
109


I have an Ubuntu live CD, but how do I actually run these tests? I did some searching and see references to them, but not how to actually perform them.
 