Recent Spate of Updates Has Trashed a Formerly Perfect NAS

Charles Elliott

Dabbler
Joined
Oct 13, 2013
Messages
37
I access this FreeNAS system from Windows 8.

The first update (9.2.0) turned the daily emailed log into a couple of lines of gibberish. The second made the log disappear; now no log is emailed. The third did not fix the problem. How do I get the log back, i.e., emailed to me with Ethernet statistics (e.g., packets and bytes transferred by interface)? Also, how can I get the security log back, i.e., emailed?

During the install, the second update declared that one of the hard disks had an error and that the system was now running in degraded mode. Indeed, it did not lie about the degraded performance. Formerly, I regularly saw 340 Mbs from the NAS; now 40 Mbs is a triumph. It used to take seconds to transfer a 2 GB file; now it literally takes minutes. None of the NAS error messages says which disk is thought to be bad (3 x 1 TB rotating drives, 1 x 128 GB SSD, and 1 x 40 GB Patriot USB drive). How do I find out which disk is deemed bad, and how do I fix it once I find it?
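
To put those two figures side by side, here is a rough calculation. It assumes that "Mbs" means megabits per second and that "2 GB" means 2 x 10^9 bytes; neither is spelled out above, and the function name is only for illustration. Protocol overhead is ignored.

```python
def transfer_seconds(file_bytes: float, rate_megabits_per_s: float) -> float:
    """Time to move file_bytes at the given line rate, ignoring protocol overhead."""
    bits = file_bytes * 8
    return bits / (rate_megabits_per_s * 1_000_000)


FILE_BYTES = 2 * 10**9  # assumption: "2 GB" taken as decimal gigabytes

before = transfer_seconds(FILE_BYTES, 340)  # rate seen before the updates
after = transfer_seconds(FILE_BYTES, 40)    # rate seen now

print(f"at 340 Mb/s: {before:.0f} s")
print(f"at  40 Mb/s: {after:.0f} s ({after / 60:.1f} min)")
```

Under those assumptions the same file goes from about 47 seconds to about 400 seconds (6.7 minutes), which fits "seconds" versus "minutes".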

The third install not only caught the alleged error (4 of the 5 hard drives are less than a few months old), but now flashes a yellow light in the Alerts icon. When the icon is clicked, this message is displayed:
"WARNING:
The volume Mark (ZFS) status is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'."
Note that no device is named; the system knows which device is ailing and probably what it thinks is wrong with it. Why can't it enunciate that information?
I fired up the shell and typed 'zpool clear', since I don't believe there truly is an error, but the shell says there is no command named zpool. Again, how do I find out which disk is deemed bad, and how do I fix it once I find it?

I could probably find the bad disk, if any, if I could just find reports of the S.M.A.R.T. tests. I set up the NAS to run one short test on each of the three 1 TB data storage disks each night, and one long test each month during the night. I can't find any of the test results or any way to find them. How do I find out if the S.M.A.R.T. tests are running, and where are the results if they are running?

I have a number of downloaded utility programs on the NAS to which I have installed shortcuts on the Office toolbar. After the first update, none of those programs would execute, and I had to transfer them to the C: drive. However, I just tried running several programs from the NAS, and that appears to be fixed.

The first update made it so I could not log in to the NAS. And of course I could not use any of the drive mappings in Windows Explorer to various parts of the NAS. I was finally able to log in to the NAS using the root account, and I was able to reestablish the drive mappings in Windows Explorer using root. But nothing I have tried (many, many times) will allow me to log in with my user name and password. How do I reestablish my user account on the NAS?

After the second of the three updates, every time I start the NAS, the following error messages appear at the end of the log:
"Mar 19 22:27:40 freenas mDNSResponder: mDNSResponder (Engineering Build) (Mar 1 2014 18:12:24) starting
Mar 19 22:27:40 freenas mDNSResponder: 11: Listening for incoming Unix Domain Socket client requests
Mar 19 22:27:40 freenas mDNSResponder: mDNS_AddDNSServer: Lock not held! mDNS_busy (0) mDNS_reentrancy (0)
Mar 19 22:27:40 freenas last message repeated 2 times
Mar 19 22:27:41 freenas kernel: done.
Mar 19 22:27:41 freenas mDNSResponder: mDNS_Register_internal: ERROR!! Tried to register AuthRecord 0000000800C2FD60 freenas.local. (Addr) that's already in the list
Mar 19 22:27:41 freenas mDNSResponder: mDNS_Register_internal: ERROR!! Tried to register AuthRecord 0000000800C30180 1.1.168.192.in-addr.arpa. (PTR) that's already in the list
Mar 19 22:27:41 freenas mDNSResponder: mDNS_Register_internal: ERROR!! Tried to register AuthRecord 0000000800C31D60 freenas.local. (Addr) that's already in the list
Mar 19 22:27:41 freenas mDNSResponder: mDNS_Register_internal: ERROR!! Tried to register AuthRecord 0000000800C32180 1.0.168.192.in-addr.arpa. (PTR) that's already in the list
Mar 19 22:27:41 freenas mDNSResponder: mDNS_Register_internal: ERROR!! Tried to register AuthRecord 0000000800C33D60 freenas.local. (AAAA) that's already in the list
Mar 19 22:27:41 freenas mDNSResponder: mDNS_Register_internal: ERROR!! Tried to register AuthRecord 0000000800C34180 A.A.C.F.4.A.E.F.F.F.0.9.5.2.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.E.F.ip6.arpa. (PTR) that's already in the list
Mar 19 22:27:41 freenas mDNSResponder: mDNS_Register_internal: ERROR!! Tried to register AuthRecord 0000000800C35D60 freenas.local. (AAAA) that's already in the list
Mar 19 22:27:41 freenas mDNSResponder: mDNS_Register_internal: ERROR!! Tried to register AuthRecord 0000000800C36180 B.A.C.F.4.A.E.F.F.F.0.9.5.2.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.E.F.ip6.arpa. (PTR) that's already in the list"​

Do these messages really matter? How can I prevent them from being output?​

On the main page of the FreeNAS browser window, in the left-hand pane, in the Volumes/View Volumes leaf, is the following information:
Name  Used             Available  Size     Compression  Compression Ratio  Status
Mark  885.0 GiB (48%)  940.2 GiB  1.8 TiB  lz4          1.04x              HEALTHY

Since the NAS data volume is deemed healthy, and the O/S and NAS are running on the USB drive, does that imply that the 128 GB SSD is probably the drive that is causing the system to be degraded?

Am I really getting 1.04x compression, that is, the data on the volume is larger than the sum of the file lengths?​
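
On the compression question: ZFS reports the ratio as logical (uncompressed) size divided by the space actually used on disk, so 1.04x would mean the data takes slightly less room than the sum of the file lengths, not more. A quick sketch with the numbers from the volume listing above; it treats the whole "Used" figure as compressed file data, which is an assumption, since that figure can also include metadata and snapshots.

```python
used_gib = 885.0   # "Used" column for volume Mark
ratio = 1.04       # "Compression Ratio" column

# compression ratio = logical size / size on disk, so logical = used * ratio
logical_gib = used_gib * ratio
saved_gib = logical_gib - used_gib

print(f"logical size ~ {logical_gib:.1f} GiB")
print(f"space saved  ~ {saved_gib:.1f} GiB")
```

With these inputs the estimate is roughly 920 GiB of file data stored in 885 GiB, a saving of about 35 GiB.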
While it is booting, the third (9.2.1.2) of the three updates outputs a large number of error messages. I ran it twice before I let the update proceed, but always saw the same messages. I am a Windows user, and all the messages are in FreeBSD jargon, so I can't decipher them, but in the future, could you please test the install to avoid terrorizing the users? All my PhD research notes and all my documents are on this NAS. The terror is real.
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Specs, please. Please run "camcontrol devlist" and "smartctl -a -q noserial /dev/ada#" and post the output here in code tags. Compression is enabled by default on new versions, I believe. I'm not sure exactly which updates you did. Updates require planning and shouldn't be taken lightly on a production machine. Is the 3 x 1 TB a raidz1? And what is the SSD for (jails? With 9.2.1.x it could also hold the .system dataset and a persistent syslog, if you desire)?
 

Charles Elliott

Dabbler
Joined
Oct 13, 2013
Messages
37
Thank you for your help. Here are the results from camcontrol devlist:
Code:
<Hitachi HDS721010CLA330 JP40A3MA> at scbus0 target 0 lun 0 (ada0,pass0) data disk
<Hitachi HDS721010CLA330 JP40A3MA> at scbus0 target 0 lun 0 (ada1,pass1) data disk
<Hitachi HDS721010CLA330 JP40A3MA> at scbus0 target 0 lun 0 (ada2,pass2) data disk
<OCZ-VERTEX4 1.5>                  at scbus5 target 0 lun 0 (ada3,pass3) cache device
< Patriot Memory PMAP>            at scbus7 target 0 lun 0 (pass4,da0)  O/S

How do I access the data from the FreeBSD shell? I typed in the above camcontrol devlist data, but that is not feasible with the smartctl data. Commands like camcontrol devlist >t.txt return an error: All directories read only.

I ran smartctl on the first 4 disks in the above list. There are pages of data, but in general it says all 4 disks are OK. The short tests have been run w/o error on the data disks. No tests have been run on the SSD, but it is not clear that the device supports them.

For some reason, the NAS emailed a log this morning, the first I have seen in weeks:

Checking status of zfs pools:
NAME  SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
Mark  2.72T  1.30T  1.42T  47%  1.00x  ONLINE  /mnt

  pool: Mark
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 648K in 3h49m with 0 errors on Sun Feb 9 03:49:17 2014
config:

        NAME                                            STATE   READ WRITE CKSUM
        Mark                                            ONLINE     0     0     0
          raidz1-0                                      ONLINE     0     0     0
            gptid/71424a08-32c4-11e3-b166-002590a4fcaa  ONLINE     0     0    39
            gptid/71b48663-32c4-11e3-b166-002590a4fcaa  ONLINE     0     0     0
            gptid/72242371-32c4-11e3-b166-002590a4fcaa  ONLINE     0     0     0
        cache
          gptid/7264a965-32c4-11e3-b166-002590a4fcaa    ONLINE     0     0     0

errors: No known data errors

So, it appears the first data disk has an error. I will run the long SMART test on it and see what results. I tried zpool clear, but the command requires more arguments: usage: clear [-nF] <pool> [device]. I suppose the pool is Mark, but what device name do I use?
Again, thank you for your help and very prompt reply!
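
On the device-name question: the name that 'zpool clear' expects is the label shown in the NAME column of the status report, here a gptid/... entry. Below is a minimal sketch that picks out the rows with a non-zero counter from the config section quoted above and prints the matching command as text. Nothing is executed; the parsing is illustrative only and assumes plain integer counters. On FreeBSD, 'glabel status' should show which adaN device each gptid label belongs to.

```python
import re

# Config section of the emailed 'zpool status' report, copied from this post.
ZPOOL_CONFIG = """\
NAME STATE READ WRITE CKSUM
Mark ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
gptid/71424a08-32c4-11e3-b166-002590a4fcaa ONLINE 0 0 39
gptid/71b48663-32c4-11e3-b166-002590a4fcaa ONLINE 0 0 0
gptid/72242371-32c4-11e3-b166-002590a4fcaa ONLINE 0 0 0
cache
gptid/7264a965-32c4-11e3-b166-002590a4fcaa ONLINE 0 0 0
"""

# name, state, then three integer counters (READ, WRITE, CKSUM)
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def devices_with_errors(config_text):
    """Return (name, read, write, cksum) for every row with a non-zero counter."""
    hits = []
    for line in config_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header row, 'cache' label, blank lines
        name, _state, rd, wr, ck = m.groups()
        counts = (int(rd), int(wr), int(ck))
        if any(counts):
            hits.append((name, *counts))
    return hits


for name, rd, wr, ck in devices_with_errors(ZPOOL_CONFIG):
    print(f"{name}: READ={rd} WRITE={wr} CKSUM={ck}")
    # Command string only; this script does not run it.
    print(f"zpool clear Mark {name}")
```

Clearing the counters only resets them; it does not address whatever caused the 39 checksum errors.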
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
OK. First of all, the SSD cache device: do you really need it? I would repurpose it for jails, the .system dataset, and perhaps syslog. Using a cache when it is not necessary will, in all likelihood, decrease performance.

Also, yes: based on your email, ZFS has seen checksum errors on one disk.

This is a problem. It could be drive related, but other hardware could be causing it too. Do you use ECC RAM? Was your RAM tested? Were the drives tested before you put them into production?

From your original post: don't worry about the mDNS errors.

Please run "smartctl -a -q noserial /dev/ada0", then ada1, and so on. I understand the integrated shell keeps most people confused. Please make sure SSH is enabled and use PuTTY for SSH access. This will let you scroll and cut and paste more easily. Please post the output here in code tags.

Thanks,
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Have you run a memtest to confirm your RAM is good? Checksum errors aren't a good sign.
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Maybe check the SATA cables? What happens if you try a new SATA cable? I would keep a complete backup in case of collapse.
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
Charles Elliott said: "The first update (9.2.0) turned the daily emailed log into a couple of lines of gibberish. The second made the log disappear; now no log is emailed. The third did not fix the problem. How do I get the log back, i.e., emailed to me with Ethernet statistics (e.g., packets and bytes transferred by interface)? Also, how can I get the security log back, i.e., emailed?"


You will now only get an email if there is something wrong with your system (e.g. in your case of checksum errors).
There was a way to revert that, but you'll have to dig up the bug report in the bug tracker and read through the comments.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Redirect the output to a location on your pool, where you can retrieve the file using Windows Explorer.

For example: camcontrol devlist > /mnt/Mark/xxx/t.txt (where xxx corresponds to where your CIFS share is located; note that the pool name in the path is case-sensitive).
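
The same redirection works for the smartctl reports. Here is a small sketch that only builds the command lines, one per disk, so they can be pasted into the shell. The device list comes from the camcontrol output posted earlier; the smart_<device>.txt file naming is made up for illustration, the pool path is written as /mnt/Mark to match the pool name in the zpool report, and xxx is still the placeholder for the share path.

```python
import shlex


def smartctl_commands(devices, out_dir):
    """Build one shell line per disk that saves the smartctl report to out_dir."""
    lines = []
    for dev in devices:
        out_file = f"{out_dir}/smart_{dev}.txt"  # hypothetical file naming
        lines.append(
            f"smartctl -a -q noserial /dev/{dev} > {shlex.quote(out_file)}"
        )
    return lines


# 'xxx' is a placeholder: replace it with the path of the CIFS share.
commands = smartctl_commands(["ada0", "ada1", "ada2", "ada3"], "/mnt/Mark/xxx")
for line in commands:
    print(line)
```

The resulting text files can then be opened from Windows Explorer and pasted into a post.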

As Yatti420 said, SSH/PuTTY is the way to go. But the method above might get us the requested data faster.

While the "smartctl extended test completed w/o errors ...", we'd like to see all the output. We'll be looking at the various SMART attributes and the raw data.
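
As a rough idea of what that looks like, here is a sketch that reads rows of the attribute table from 'smartctl -a' and flags a few attributes whose raw value is non-zero. The shortlist of IDs (5, 197, 198, 199) is an assumption about what is commonly checked first for bad sectors or cabling trouble, not something stated in this thread. The sample rows use the values from the first data disk's report posted in this thread.

```python
# A few rows in the column layout that 'smartctl -a' prints for ATA attributes.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033  100  100  005  Pre-fail  Always   -  0
197 Current_Pending_Sector  0x0022  100  100  000  Old_age   Always   -  0
198 Offline_Uncorrectable   0x0008  100  100  000  Old_age   Offline  -  0
199 UDMA_CRC_Error_Count    0x000a  200  200  000  Old_age   Always   -  0
"""

WATCHED = {5, 197, 198, 199}  # assumption: shortlist of attributes to look at first


def parse_attributes(text):
    """Map attribute ID -> (name, leading integer of RAW_VALUE)."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # not an attribute row
        raw = parts[9]  # first token of the RAW_VALUE column
        if raw.isdigit():
            attrs[int(parts[0])] = (parts[1], int(raw))
    return attrs


def suspicious(attrs):
    """Watched attributes whose raw value is non-zero."""
    return {i: v for i, v in attrs.items() if i in WATCHED and v[1] != 0}


attrs = parse_attributes(SAMPLE)
print(suspicious(attrs) or "no watched attribute has a non-zero raw value")
```

All-zero raw values here would not rule out a problem elsewhere (RAM, controller, cabling), which is why the full output is still worth posting.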

Charles Elliott said: "Commands like camcontrol devlist >t.txt return an error: All directories read only."

BTW, for important stuff like PhD notes, family pictures, etc., you can't have too many backups.
 

Charles Elliott

Dabbler
Joined
Oct 13, 2013
Messages
37
Here is the smartctl output in the order ada0, ada1, ada2 (1TB data disks) and ada3 (128 GB SSD):
Code:
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    Hitachi Deskstar 7K1000.C
Device Model:    Hitachi HDS721010CLA330
Serial Number:    JPS930N11WG9VL
LU WWN Device Id: 5 000cca 39cda9345
Firmware Version: JP4OA3MA
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Fri Mar 21 22:11:50 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x85)    Offline data collection activity
                    was aborted by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        ( 9753) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  1) minutes.
Extended self-test routine
recommended polling time:      ( 163) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000b  100  100  016    Pre-fail  Always      -      0
  2 Throughput_Performance  0x0005  136  136  054    Pre-fail  Offline      -      95
  3 Spin_Up_Time            0x0007  119  119  024    Pre-fail  Always      -      319 (Average 318)
  4 Start_Stop_Count        0x0012  100  100  000    Old_age  Always      -      117
  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000b  100  100  067    Pre-fail  Always      -      0
  8 Seek_Time_Performance  0x0005  140  140  020    Pre-fail  Offline      -      30
  9 Power_On_Hours          0x0012  100  100  000    Old_age  Always      -      4102
10 Spin_Retry_Count        0x0013  100  100  060    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      117
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      143
193 Load_Cycle_Count        0x0012  100  100  000    Old_age  Always      -      143
194 Temperature_Celsius    0x0002  176  176  000    Old_age  Always      -      34 (Min/Max 11/45)
196 Reallocated_Event_Count 0x0032  100  100  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0008  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x000a  200  200  000    Old_age  Always      -      0
 
SMART Error Log Version: 0
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error      00%      4096        -
# 2  Short offline      Completed without error      00%      4080        -
# 3  Short offline      Completed without error      00%      3036        -
# 4  Short offline      Completed without error      00%      3015        -
# 5  Short offline      Completed without error      00%      2992        -
# 6  Short offline      Completed without error      00%      2968        -
# 7  Short offline      Completed without error      00%      2944        -
# 8  Short offline      Completed without error      00%      2920        -
# 9  Short offline      Completed without error      00%      2896        -
#10  Short offline      Completed without error      00%      2872        -
#11  Short offline      Completed without error      00%      2848        -
#12  Short offline      Completed without error      00%      2824        -
#13  Short offline      Completed without error      00%      2800        -
#14  Short offline      Completed without error      00%      2776        -
#15  Short offline      Completed without error      00%      2752        -
#16  Short offline      Completed without error      00%      2728        -
#17  Short offline      Completed without error      00%      2704        -
#18  Short offline      Completed without error      00%      2680        -
#19  Short offline      Completed without error      00%      2656        -
#20  Short offline      Completed without error      00%      2632        -
#21  Short offline      Completed without error      00%      2608        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
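
The self-test log in this report also answers the earlier question of whether the scheduled S.M.A.R.T. tests are running: the LifeTime(hours) column shows the drive's power-on hours at each test. Here is a small sketch that computes the spacing between consecutive entries for ada0, with the numbers copied from the log above. The same log can be shown on its own with 'smartctl -l selftest /dev/ada0'.

```python
# LifeTime(hours) column of the ada0 self-test log, newest entry first.
LIFETIME_HOURS = [4096, 4080, 3036, 3015, 2992, 2968, 2944, 2920, 2896, 2872,
                  2848, 2824, 2800, 2776, 2752, 2728, 2704, 2680, 2656, 2632, 2608]


def gaps_between_tests(hours_newest_first):
    """Power-on hours elapsed between consecutive logged self-tests."""
    return [newer - older
            for newer, older in zip(hours_newest_first, hours_newest_first[1:])]


gaps = gaps_between_tests(LIFETIME_HOURS)
longest = max(gaps)
print("gaps (hours):", gaps)
print(f"longest gap: {longest} h (~{longest / 24:.1f} days of power-on time)")
```

Most entries are about 24 hours apart, as expected for a nightly short test, but there is one gap of 1044 power-on hours (about 43.5 days) with no logged test. That may or may not coincide with the updates; the log alone does not say why.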

Code:
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    Hitachi Deskstar 7K1000.C
Device Model:    Hitachi HDS721010CLA330
Serial Number:    JPS930N11WNEKL
LU WWN Device Id: 5 000cca 39cdaaa3e
Firmware Version: JP4OA3MA
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Fri Mar 21 22:16:33 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x84)    Offline data collection activity
                    was suspended by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (10047) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  1) minutes.
Extended self-test routine
recommended polling time:      ( 167) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000b  100  100  016    Pre-fail  Always      -      65536
  2 Throughput_Performance  0x0005  136  136  054    Pre-fail  Offline      -      95
  3 Spin_Up_Time            0x0007  119  119  024    Pre-fail  Always      -      317 (Average 321)
  4 Start_Stop_Count        0x0012  100  100  000    Old_age  Always      -      119
  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000b  100  100  067    Pre-fail  Always      -      0
  8 Seek_Time_Performance  0x0005  138  138  020    Pre-fail  Offline      -      31
  9 Power_On_Hours          0x0012  100  100  000    Old_age  Always      -      4131
10 Spin_Retry_Count        0x0013  100  100  060    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      118
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      141
193 Load_Cycle_Count        0x0012  100  100  000    Old_age  Always      -      141
194 Temperature_Celsius    0x0002  193  193  000    Old_age  Always      -      31 (Min/Max 11/52)
196 Reallocated_Event_Count 0x0032  100  100  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0008  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x000a  200  200  000    Old_age  Always      -      0
 
SMART Error Log Version: 0
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%      4110        -
# 2  Short offline      Completed without error      00%      4069        -
# 3  Short offline      Completed without error      00%      4045        -
# 4  Short offline      Completed without error      00%      3967        -
# 5  Short offline      Completed without error      00%      3947        -
# 6  Short offline      Completed without error      00%      3782        -
# 7  Short offline      Completed without error      00%      3639        -
# 8  Short offline      Completed without error      00%      3489        -
# 9  Short offline      Completed without error      00%      3465        -
#10  Short offline      Completed without error      00%      3441        -
#11  Short offline      Completed without error      00%      3417        -
#12  Short offline      Completed without error      00%      3395        -
#13  Short offline      Completed without error      00%      3371        -
#14  Short offline      Completed without error      00%      3347        -
#15  Short offline      Completed without error      00%      3326        -
#16  Short offline      Completed without error      00%      3299        -
#17  Short offline      Completed without error      00%      3275        -
#18  Short offline      Completed without error      00%      3252        -
#19  Short offline      Completed without error      00%      3228        -
#20  Short offline      Completed without error      00%      3186        -
#21  Short offline      Completed without error      00%      3152        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Code:
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    Hitachi Deskstar 7K1000.C
Device Model:    Hitachi HDS721010CLA330
Serial Number:    JPS930N11WNEKL
LU WWN Device Id: 5 000cca 39cdaaa3e
Firmware Version: JP4OA3MA
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Fri Mar 21 22:16:33 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x84)    Offline data collection activity
                    was suspended by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (10047) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  1) minutes.
Extended self-test routine
recommended polling time:      ( 167) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000b  100  100  016    Pre-fail  Always      -      65536
  2 Throughput_Performance  0x0005  136  136  054    Pre-fail  Offline      -      95
  3 Spin_Up_Time            0x0007  119  119  024    Pre-fail  Always      -      317 (Average 321)
  4 Start_Stop_Count        0x0012  100  100  000    Old_age  Always      -      119
  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000b  100  100  067    Pre-fail  Always      -      0
  8 Seek_Time_Performance  0x0005  138  138  020    Pre-fail  Offline      -      31
  9 Power_On_Hours          0x0012  100  100  000    Old_age  Always      -      4131
10 Spin_Retry_Count        0x0013  100  100  060    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      118
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      141
193 Load_Cycle_Count        0x0012  100  100  000    Old_age  Always      -      141
194 Temperature_Celsius    0x0002  193  193  000    Old_age  Always      -      31 (Min/Max 11/52)
196 Reallocated_Event_Count 0x0032  100  100  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0008  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x000a  200  200  000    Old_age  Always      -      0
 
SMART Error Log Version: 0
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%      4110        -
# 2  Short offline      Completed without error      00%      4069        -
# 3  Short offline      Completed without error      00%      4045        -
# 4  Short offline      Completed without error      00%      3967        -
# 5  Short offline      Completed without error      00%      3947        -
# 6  Short offline      Completed without error      00%      3782        -
# 7  Short offline      Completed without error      00%      3639        -
# 8  Short offline      Completed without error      00%      3489        -
# 9  Short offline      Completed without error      00%      3465        -
#10  Short offline      Completed without error      00%      3441        -
#11  Short offline      Completed without error      00%      3417        -
#12  Short offline      Completed without error      00%      3395        -
#13  Short offline      Completed without error      00%      3371        -
#14  Short offline      Completed without error      00%      3347        -
#15  Short offline      Completed without error      00%      3326        -
#16  Short offline      Completed without error      00%      3299        -
#17  Short offline      Completed without error      00%      3275        -
#18  Short offline      Completed without error      00%      3252        -
#19  Short offline      Completed without error      00%      3228        -
#20  Short offline      Completed without error      00%      3186        -
#21  Short offline      Completed without error      00%      3152        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Code:
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    Indilinx Barefoot_2/Everest/Martini based SSDs
Device Model:    OCZ-VERTEX4
Serial Number:    OCZ-7M9QW24M79LQ5K2O
LU WWN Device Id: 5 e83a97 f310cc01c
Firmware Version: 1.5
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Mar 23 10:24:14 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (    0) seconds.
Offline data collection
capabilities:              (0x1d) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Abort Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x00)    Error logging NOT supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  0) minutes.
Extended self-test routine
recommended polling time:      (  0) minutes.
 
SMART Attributes Data Structure revision number: 18
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x0000  006  000  000    Old_age  Offline      -      6
  3 Spin_Up_Time            0x0000  100  100  000    Old_age  Offline      -      0
  4 Start_Stop_Count        0x0000  100  100  000    Old_age  Offline      -      0
  5 Reallocated_Sector_Ct  0x0000  100  100  000    Old_age  Offline      -      0
  9 Power_On_Hours          0x0000  100  100  000    Old_age  Offline      -      7351
12 Power_Cycle_Count      0x0000  100  100  000    Old_age  Offline      -      180
232 Lifetime_Writes        0x0000  100  100  000    Old_age  Offline      -      18591767785
233 Media_Wearout_Indicator 0x0000  099  000  000    Old_age  Offline      -      99
 
SMART Error Log not supported
 
Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]
 
 
Selective Self-tests/Logging not supported


Please note in the last entry above that the SSD does not support logging. So although I ran a short self-test (smartctl -t short), the only result is "SMART overall-health self-assessment test result: PASSED."

I will answer your questions in the next entry.
 

Charles Elliott

Dabbler
Joined
Oct 13, 2013
Messages
37
To answer your questions:
>Ok.. First of all.. The SSD cache device do you really need this?

Strictly speaking, no. The cache is desirable when one writes a lot of data to the NAS and then immediately re-reads it. I don't do that, though I could; I use the NAS for long-term storage of documents and downloads.

>Do you use ECC ram? Was your ram tested?

Yes and yes. I also retested it this morning. Memtest86+ ver. 5.01 reported the memory characteristics correctly and found no errors.

>What about the drives before putting them into production?

Yes. I wrote a diagnostic in Java that writes all 0's to the entire disk and then reads them back checking for accuracy, all 1's ditto, ..., etc. I ran that test for a week as suggested in the FreeNAS installation instructions. Then I ran an abbreviated version of the test again for each possible combination of SATA port and hard disk to check for speed differences. There were small but statistically significant speed differences between the SATA ports, but no data errors.


> Maybe check sata cables? what happens if you try a new sata cable?

I checked all the SATA cables. All but one is of the self-locking type and all seem firmly in place. I put the NAS in an old, fairly small case (the PSU has to sit in the bottom). Consequently, it is hard to access the hard disks, so I am reluctant to change the cables.

I originally installed FreeNAS ver 9.1.1, and the system was perfect and fast. Then I installed 9.2.0, 9.2.1, and 9.2.1.2, a few days apart. While the upgrade to 9.2.0 was proceeding, an error message flashed across the screen that an error was found on one of the disks and that now the system was running in degraded mode. After the upgrade to 9.2.0 was complete, file transfers to and from the NAS, from any computer, took about 10 times as long. Task Manager said the transfer rate was at max ~34 Mbs, whereas under ver 9.1.1 I had regularly seen 340 Mbs.

The system worked well before the 9.2.0 upgrade and was slow after it. None of the subsequent upgrades have changed that. During the 9.2.1.2 upgrade, a few error messages flashed across the screen. Can you blame me for suspecting the upgrade process itself caused the disk error to appear? The error-explanation page at http://illumos.org/msg/ZFS-8000-9P itself says this problem can occur if an administrator mistakenly writes something to a data disk.

I want to try executing zpool clear <Mark> device num and see if the NAS returns to its pre-upgrade speed. Does anyone think this is a bad idea? What do I use for the device number? Can I use the last entry in the line below (39cda9345), that comes from the SMART results after executing 'smartctl -a /dev/ada0'?
8. LU WWN Device Id: 5 000cca 39cda9345

Or, should I use the gptid that comes from the error message FreeNAS sent me describing the problem?
"gptid/71424a08-32c4-11e3-b166-002590a4fcaa"

Thank you, everyone, for your help.
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
You can attempt to clear, but if you feel the performance is suffering there is a problem somewhere that isn't related to the upgrades - if it was, you wouldn't have gotten a SMART warning.... The first thing I would do is back up all your data.. Second, get rid of the cache device..

Unless you know you need it - you shouldn't have one in the pool.. To be honest in all likelihood it's doing nothing for performance benefit..
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
What version are you on now? If it's in the 9.2.1.x series, upgrade to 9.2.1.3. This release has had a number of problems. Check the announcement forum for the scoop on it.

You might consider rolling back to 9.2.0 if you still have a backup of one of your older configuration files.


Sent from my phone
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
>zpool clear <Mark> device num and see if the NAS returns to its pre-upgrade speed.

Just do: zpool clear Mark
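A minimal sketch of that sequence, assuming a POSIX sh session (if the FreeNAS shell is csh, start sh first) and that zpool is on the PATH. The function name clear_pool_errors and the ZPOOL variable are made up for this illustration. The optional second argument is a device name as zpool status lists it, so in this thread a gptid/... label rather than a fragment of the WWN.

```shell
# Sketch only. Set ZPOOL=echo to print the commands instead of running them.
ZPOOL="${ZPOOL:-zpool}"

# clear_pool_errors POOL [DEVICE]
# Shows the pool status, resets the error counters, then shows the status again.
clear_pool_errors() {
    pool="${1:-}"
    device="${2:-}"
    if [ -z "$pool" ]; then
        echo "usage: clear_pool_errors POOL [DEVICE]" >&2
        return 2
    fi
    "$ZPOOL" status -v "$pool" || return 1
    if [ -n "$device" ]; then
        # Clear the counters of one device only.
        "$ZPOOL" clear "$pool" "$device" || return 1
    else
        # Clear the counters of every device in the pool.
        "$ZPOOL" clear "$pool" || return 1
    fi
    "$ZPOOL" status -v "$pool"
}
```

Running clear_pool_errors Mark covers the whole pool. Clearing only resets the counters; if the underlying fault is real, the errors should come back on the next scrub.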

From what I can see, the SMART data looks good. As Yatti420 said, I'd remove the cache drive. Most systems don't need one and, depending on your configuration, it could have a negative impact on your performance.

Do read the threads regarding the releases for 9.2.1.1, .2, .3.
 

Charles Elliott

Dabbler
Joined
Oct 13, 2013
Messages
37
I took out the cache device, but there was no change in performance. I am still only seeing about 1.x MBs transfer rate between the NAS and this computer. Between this computer and another one, there is about a 45 MBs transfer rate, which seems to indicate that the Gigabit Ethernet interface is working. The ping time to the NAS is <1 ms on the GB Ethernet and 2 ms on the fast Ethernet interface, so the GB Ethernet is faster. The router says the connection to the NAS is 1000 Mbs, so it could be OK.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Did you run the command zpool clear Mark?

Please provide detailed system specs, i.e.:
  1. FreeNAS version and platform (32 or 64 bit).
  2. General hardware information (CPU, RAM, Motherboard model, etc.).
  3. Specific hardware information (Network card chipset, Raid controller chipset, etc.).
 

Charles Elliott

Dabbler
Joined
Oct 13, 2013
Messages
37
I do not believe I did run "zpool clear Mark," but I am not positive.

As to performance, please see my post here: http://forums.freenas.org/index.php?threads/announcing-freenas-9-2-1-3-release.19539/page-4. The short of it is, after smbd restarted itself this morning I am seeing 80-85 MBs (640-680 Mbs) transfer rates between FreeNAS and this computer, while the numbers I gave above were correct as of a few days ago. I just retested it a few minutes ago, while I was writing the other post.
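The figures in parentheses follow from 1 byte = 8 bits, reading "MBs" as megabytes per second and "Mbs" as megabits per second. A small sketch of the arithmetic; the function name mbytes_to_mbits is made up for this illustration:

```shell
# mbytes_to_mbits N: convert N megabytes per second to megabits per second.
mbytes_to_mbits() {
    echo $(( $1 * 8 ))
}

mbytes_to_mbits 80   # prints 640
mbytes_to_mbits 85   # prints 680
```

On a link negotiated at 1000 Mbs, that is roughly two thirds of the nominal line rate.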

FreeNAS version and platform: FreeNAS-9.2.1.3-RELEASE-x64.iso installed from CDROM as an update.
General H/W information:
CPU: Intel(R) Xeon(R) CPU 3065 @ 2.33GHz
Memory: 8169MB ECC
Mobo: SuperMicro_X7SBL-LN2
Hard Disks: 3 x 1TB Hitachi hard disks
FreeNAS installed on Patriot 14 GB USB 2.0 drive
Not using RAID; using raidz1-0 (ZFS)

After not finding anything wrong with the SSD cache device, I tried to reinstall it on FreeNAS, but I could not get zpool to accept it. I would love to reinstall it and see if performance drops. Do you know what commands to use?

Thanks for your help.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Glad to hear that your performance is much better now. Since you don't know if you ran the zpool clear command, run: zpool status -v

As we've said before, adding an SSD will probably hamper your performance. If you feel you must add it, the directions can be found in the manual. Read the "Extend a volume" section.
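For reference only, the generic ZFS command line for attaching and detaching a cache device looks roughly like the sketch below. This is not the manual's procedure; the GUI route it describes is the one FreeNAS supports, and changes made from the shell may not show up in the GUI. The function names and the device name ada3 are hypothetical; substitute whatever name the SSD actually has on your system.

```shell
# Sketch only. Set ZPOOL=echo to print the commands instead of running them.
ZPOOL="${ZPOOL:-zpool}"

# add_cache_device POOL DEVICE: attach DEVICE to POOL as a cache (L2ARC) device.
add_cache_device() {
    if [ "$#" -ne 2 ]; then
        echo "usage: add_cache_device POOL DEVICE" >&2
        return 2
    fi
    "$ZPOOL" add "$1" cache "$2"
}

# remove_cache_device POOL DEVICE: detach a cache device from POOL again.
remove_cache_device() {
    if [ "$#" -ne 2 ]; then
        echo "usage: remove_cache_device POOL DEVICE" >&2
        return 2
    fi
    "$ZPOOL" remove "$1" "$2"
}
```

For example, add_cache_device Mark ada3 would attach the SSD, and remove_cache_device Mark ada3 would take it out again. A cache device holds no unique data, so removing it does not put the pool's contents at risk.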
 

Charles Elliott

Dabbler
Joined
Oct 13, 2013
Messages
37
The results of zpool status -v:
Code:
pool: Mark
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub repaired 28K in 1h38m with 0 errors on Sun Mar 23 01:38:32 2014
config:
 
    NAME                                            STATE     READ WRITE CKSUM
    Mark                                            ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/71424a08-32c4-11e3-b166-002590a4fcaa  ONLINE       0     0     0
        gptid/71b48663-32c4-11e3-b166-002590a4fcaa  ONLINE       0     0     0
        gptid/72242371-32c4-11e3-b166-002590a4fcaa  ONLINE       0     0     0
 
errors: No known data errors
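
The question of which disk is deemed bad comes down to the READ, WRITE and CKSUM columns of this table. A small sketch, assuming a POSIX sh with awk, that reads zpool status text on stdin and prints only the rows with a non-zero counter. The function name failing_vdevs is made up for this illustration.

```shell
# failing_vdevs: read `zpool status` text on stdin and print every row of the
# config table whose READ, WRITE or CKSUM counter is not zero.
failing_vdevs() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # A blank line ends the table.
        in_config && NF == 0 { in_config = 0 }
        # Rows are: name, state, read, write, cksum.
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, $3, $4, $5
        }
    '
}
```

Usage would be zpool status Mark | failing_vdevs. For the output above it prints nothing, since every counter is 0, which matches "No known data errors". The 28K the last scrub repaired is reported only on the scan line.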
 