ZFS memory issue

Status
Not open for further replies.

nik.96 · Dabbler · Joined Feb 22, 2015 · Messages: 14

Hi

I recently did a fresh install of FreeNAS on my server. My previous FreeNAS install had amazing performance, but now I'm having issues.

Specs:
Intel Xeon E3-1220
16 GB DDR3 ECC RAM
Supermicro X10SL7-F (onboard controller flashed to IT mode)

Currently the web UI is almost unusable: it's slow, and most of the time it won't load. Whenever it does work, I can see that my memory utilisation is low (less than 3 GB used). Performance is also slow compared to my previous install. That install was 2 x 2 TB WD Greens in a mirror, and I was seeing read speeds of 100 MB/s. I have now created a mirror with 2 x 3 TB WD Reds, and my read speeds are under 30 MB/s.

I have tried restarting the server many times and have disabled autotune. I'm not sure how to troubleshoot and resolve this issue.

Any help will be much appreciated.
 

Attachment: ScreenClip.png (14.7 KB)

DrKK · FreeNAS Generalissimo · Joined Oct 15, 2013 · Messages: 3,630

With hardware like this, there is absolutely no reason you shouldn't be going at warp speed.

This has to be a network or hardware issue. Have you verified that the same problem exists from other client systems? Can we see the output of
Code:
smartctl -x /dev/ada0
and so on, for each drive?
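Running that by hand for every disk gets tedious. Below is a minimal Python sketch that builds and runs one `smartctl -x` per disk, assuming smartctl is on the PATH, the script runs as root, and the disks are named ada0, ada1, and so on. The helper names are hypothetical, not part of FreeNAS.

```python
import subprocess


def smartctl_commands(devices):
    """Build one `smartctl -x /dev/<name>` command per device name, e.g. "ada0"."""
    return [["smartctl", "-x", f"/dev/{dev}"] for dev in devices]


def collect_smart_reports(devices):
    """Run `smartctl -x` on each device and return {device name: report text}.

    Needs root and smartctl on the PATH. smartctl's exit status is a bit mask
    that can be non-zero even when the report printed fine, so it is not
    treated as fatal here.
    """
    reports = {}
    for dev, cmd in zip(devices, smartctl_commands(devices)):
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        reports[dev] = result.stdout
    return reports
```

For example, `collect_smart_reports(["ada0", "ada1"])` returns one report per disk, ready to paste into a Code: block. The device list is an assumption. `camcontrol devlist` shows the disks FreeBSD actually sees, and disks attached to an LSI HBA usually appear as daN rather than adaN.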
 

nik.96 · Dabbler · Joined Feb 22, 2015 · Messages: 14

DrKK said:
> With hardware like this, there is absolutely no reason you shouldn't be going at warp speed.
>
> This has to be a network or hardware issue. Have you verified that the same problem exists from other client systems? Can we see the output of
> Code:
> smartctl -x /dev/ada0
> and so on, for each drive?

My main PC is connected to my server via gigabit Ethernet. I also tried with my phone over 802.11ac wireless and experienced poor speeds.

My previous FreeNAS setup saturated my gigabit network, and the only things I have changed since then are installing 2 x 3 TB WD Reds and doing a fresh FreeNAS install.
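The small Python sketch below gives a rough yardstick for the speeds in this thread. It assumes that "mb/s" in these posts means megabytes per second. Gigabit Ethernet's raw line rate of 1000 Mbit/s works out to 125 MB/s, and real file transfers land somewhat below that once protocol overhead is taken off.

```python
# Decimal units throughout: 1 Mbit = 10**6 bits, 1 MB = 10**6 bytes.
BITS_PER_BYTE = 8


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert a link rate in megabits per second to megabytes per second."""
    return mbit_per_s / BITS_PER_BYTE


def link_utilisation(observed_mb_per_s: float, link_mbit_per_s: float = 1000.0) -> float:
    """Fraction of a link's raw line rate that an observed transfer speed represents."""
    return observed_mb_per_s / mbit_to_mbyte(link_mbit_per_s)


if __name__ == "__main__":
    # Read speeds reported in this thread, assumed to be megabytes per second.
    for label, speed in [("old WD Green mirror", 100), ("new WD Red mirror", 30)]:
        print(f"{label}: {speed} MB/s is {link_utilisation(speed):.0%} of gigabit")
```

Run as a script, it reports the old mirror at 80% of gigabit and the new one at 24%.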

Here is the output for the 2 x WD Red drives. I haven't even configured my other drives, since the web UI has been close to unusable.

Code:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red (AF)
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N2TFUNAT
LU WWN Device Id: 5 0014ee 20c8fe920
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jan 29 03:42:34 2016 AEDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (39360) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 395) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   253   051    -    0
  3 Spin_Up_Time            POS--K   186   186   021    -    5675
  4 Start_Stop_Count        -O--CK   100   100   000    -    11
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   100   253   000    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    79
 10 Spin_Retry_Count        -O--CK   100   253   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    11
192 Power-Off_Retract_Count -O--CK   200   200   000    -    8
193 Load_Cycle_Count        -O--CK   200   200   000    -    13
194 Temperature_Celsius     -O---K   109   106   000    -    41
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    41 Celsius
Power Cycle Min/Max Temperature:     39/44 Celsius
Lifetime    Min/Max Temperature:     29/44 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (15)

Index    Estimated Time   Temperature Celsius
  16    2016-01-28 19:45    41  **********************
...    ..(476 skipped).    ..  **********************
  15    2016-01-29 03:42    41  **********************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           33  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           34  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4       265461  Vendor specific


Code:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red (AF)
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N6EP6AZD
LU WWN Device Id: 5 0014ee 20c8fea03
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jan 29 03:43:47 2016 AEDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (38100) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 382) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   253   051    -    0
  3 Spin_Up_Time            POS--K   180   180   021    -    5966
  4 Start_Stop_Count        -O--CK   100   100   000    -    11
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   100   253   000    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    79
 10 Spin_Retry_Count        -O--CK   100   253   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    11
192 Power-Off_Retract_Count -O--CK   200   200   000    -    8
193 Load_Cycle_Count        -O--CK   200   200   000    -    13
194 Temperature_Celsius     -O---K   112   109   000    -    38
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    38 Celsius
Power Cycle Min/Max Temperature:     36/41 Celsius
Lifetime    Min/Max Temperature:     28/41 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (16)

Index    Estimated Time   Temperature Celsius
  17    2016-01-28 19:46    38  *******************
...    ..(182 skipped).    ..  *******************
200    2016-01-28 22:49    38  *******************
201    2016-01-28 22:50    37  ******************
...    ..(201 skipped).    ..  ******************
403    2016-01-29 02:12    37  ******************
404    2016-01-29 02:13    38  *******************
...    ..( 89 skipped).    ..  *******************
  16    2016-01-29 03:43    38  *******************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           33  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           34  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4       265535  Vendor specific
 

nik.96 · Dabbler · Joined Feb 22, 2015 · Messages: 14

Some further testing on my part.

Configured 2 x Crucial 256 GB SSDs in a mirror. Write performance to them peaked at only 70 MB/s, and read performance peaked at 5 MB/s.

While doing this testing, I noticed my memory usage was weird (screenshot below).
 

Attachment: ScreenClip2.png (14.2 KB)

DrKK · FreeNAS Generalissimo · Joined Oct 15, 2013 · Messages: 3,630

I mean, all of that looks good, aside from those drives running a bit warmer than they ought to.
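When judging disk health, the attributes most often singled out are reallocated sectors (ID 5), pending sectors (197), offline-uncorrectable sectors (198) and UDMA CRC errors (199). All four have a raw value of 0 on both Reds above. The Python sketch below pulls them out of a `smartctl -x` report. It assumes the attribute-table layout shown in this thread, the watch list is a common rule of thumb rather than an official threshold, and the function names are hypothetical.

```python
import re

# Attributes commonly watched as early signs of media or cabling trouble.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}

# ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE (leading integer only).
ATTR_LINE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+\S+\s+(\d+)\s+(\d+)\s+(\d+)\s+\S+\s+(\d+)"
)


def parse_attributes(report: str) -> dict:
    """Map attribute ID -> (name, raw value) from a `smartctl -x` attribute table."""
    attrs = {}
    for line in report.splitlines():
        match = ATTR_LINE.match(line)
        if match:
            attrs[int(match.group(1))] = (match.group(2), int(match.group(6)))
    return attrs


def smart_warnings(report: str) -> list:
    """Return a warning string for each watched attribute with a non-zero raw value."""
    attrs = parse_attributes(report)
    return [
        f"{attrs[i][0]} (ID {i}) raw value is {attrs[i][1]}"
        for i in sorted(WATCHED)
        if i in attrs and attrs[i][1] > 0
    ]
```

Fed either report above, `smart_warnings` returns an empty list. The parser also records temperature (ID 194), but it does not judge it.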

I would suggest completely reinstalling FreeNAS to your boot device; a corrupted boot install would explain this, too. If you are using USB thumb drives for boot, I very strongly recommend that you use *TWO* of them in a mirrored pair at install.
 

DrKK · FreeNAS Generalissimo · Joined Oct 15, 2013 · Messages: 3,630

nik.96 said:
> Some further testing on my part.
>
> Configured 2 x Crucial 256 GB SSDs in a mirror. Write performance to them peaked at only 70 MB/s, and read performance peaked at 5 MB/s.
>
> While doing this testing, I noticed my memory usage was weird (screenshot below).

I am now even more strongly leaning towards a corrupted install of the FreeNAS software.
 

nik.96 · Dabbler · Joined Feb 22, 2015 · Messages: 14

DrKK said:
> I mean, all of that looks good, aside from those drives running a bit warmer than they ought to.
>
> I would suggest completely reinstalling FreeNAS to your boot device; a corrupted boot install would explain this, too. If you are using USB thumb drives for boot, I very strongly recommend that you use *TWO* of them in a mirrored pair at install.

When I first installed FreeNAS, performance was as expected. Then all of a sudden it just tanked. I can reinstall FreeNAS; it's just a bit annoying after I just copied all my data onto my server. I'll have to reinstall it sometime next week, and I'll report back with my results.
 

Jailer · Not strong, but bad · Joined Sep 12, 2014 · Messages: 4,977

nik.96 said:
> While doing this testing, I noticed my memory usage was weird (screenshot below).

That's not "weird", it's normal. It's your ARC filling up from use. Your memory utilization will always be low after a reboot because the ARC is cleared from RAM. Use it a bit and it will fill back up again, as it's supposed to.
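You can also watch the ARC directly instead of reading the memory graph. On FreeNAS, which is FreeBSD-based, the current ARC size is exposed as the sysctl `kstat.zfs.misc.arcstats.size` and physical RAM as `hw.physmem`. The Python sketch below reads both with `sysctl -n` and reports the ratio. The function names are hypothetical.

```python
import subprocess


def read_sysctl(name: str) -> int:
    """Read an integer sysctl value with `sysctl -n` (FreeBSD/FreeNAS)."""
    out = subprocess.run(["sysctl", "-n", name], capture_output=True,
                         text=True, check=True).stdout
    return int(out.strip())


def describe_arc(arc_bytes: int, physmem_bytes: int) -> str:
    """Format the ARC size in GiB and as a share of physical RAM."""
    gib = 2 ** 30
    share = arc_bytes / physmem_bytes
    return (f"ARC {arc_bytes / gib:.1f} GiB of "
            f"{physmem_bytes / gib:.1f} GiB RAM ({share:.0%})")


def current_arc() -> str:
    """Query the live ARC size and physical memory on the server (needs FreeBSD)."""
    return describe_arc(read_sysctl("kstat.zfs.misc.arcstats.size"),
                        read_sysctl("hw.physmem"))
```

Calling `current_arc()` right after a reboot and again after reading some large files from the pool should show the ARC share growing with use.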
 

nik.96 · Dabbler · Joined Feb 22, 2015 · Messages: 14

Jailer said:
> That's not "weird", it's normal. It's your ARC filling up from use. Your memory utilization will always be low after a reboot because the ARC is cleared from RAM. Use it a bit and it will fill back up again, as it's supposed to.

Sorry, weird in the sense that it only filled up after I was testing my SSDs in a mirror. Accessing data on my mirrored Reds did nothing to memory usage.
 

SweetAndLow · Sweet'NASty · Joined Nov 6, 2013 · Messages: 6,421

nik.96 said:
> When I first installed FreeNAS, performance was as expected. Then all of a sudden it just tanked. I can reinstall FreeNAS; it's just a bit annoying after I just copied all my data onto my server. I'll have to reinstall it sometime next week, and I'll report back with my results.

Moving your data to the server doesn't affect your ability to reinstall FreeNAS. All you need to do is install to a new USB drive or the current one, then boot your system and import your pool. All your data will be there.
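On FreeNAS, the supported way to bring a pool back after a reinstall is the web UI's volume import, which also records the pool in the configuration database. From the shell, a bare `zpool import` with no pool name only lists the pools the fresh install can see, without importing anything. The Python sketch below parses that listing. It assumes the listing is printed on stdout in the usual `pool:` / `state:` layout, and the pool name "tank" in the usage is hypothetical.

```python
import re
import subprocess

POOL_RE = re.compile(r"^\s*pool:\s*(\S+)")
STATE_RE = re.compile(r"^\s*state:\s*(\S+)")


def parse_importable(listing: str):
    """Return (pool, state) pairs from the text printed by a bare `zpool import`."""
    pools = []
    for line in listing.splitlines():
        pool_match = POOL_RE.match(line)
        if pool_match:
            pools.append([pool_match.group(1), None])
            continue
        state_match = STATE_RE.match(line)
        if state_match and pools:
            pools[-1][1] = state_match.group(1)
    return [tuple(p) for p in pools]


def list_importable():
    """Run `zpool import` with no pool name, which lists but does not import pools."""
    result = subprocess.run(["zpool", "import"], capture_output=True,
                            text=True, check=False)
    return parse_importable(result.stdout)
```

A result such as `[("tank", "ONLINE")]` means the pool is visible and healthy enough to import through the web UI.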
 

Jailer · Not strong, but bad · Joined Sep 12, 2014 · Messages: 4,977

nik.96 said:
> Sorry, weird in the sense that it only filled up after I was testing my SSDs in a mirror. Accessing data on my mirrored Reds did nothing to memory usage.

Then I'm in agreement with DrKK on the fresh install.
 

nik.96 · Dabbler · Joined Feb 22, 2015 · Messages: 14

Okay, so I just reinstalled FreeNAS, and I can say there has been some data loss on my pools. Any data written to those drives earlier today is gone (non-important or backed-up data).

The first thing I noticed is that writing data onto the WD Red mirror runs at full gigabit speed :D

However, reading data from that pool barely gets over 5 MB/s?
 

Jailer · Not strong, but bad · Joined Sep 12, 2014 · Messages: 4,977

Reinstalling FreeNAS has nothing to do with the data on your pool. You obviously have something else going on.

I would start by running a long SMART test on each of those drives, since none has ever been run.
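The reports above give the recommended extended self-test time as 395 minutes for the first Red and 382 for the second, so each long test takes roughly six and a half hours. The test runs inside the drive's firmware, so both drives can be tested at the same time. The Python sketch below reads that figure from a `smartctl -x` report and works out when to check the self-test log. It uses the real `smartctl -t long` and `smartctl -l selftest` options, while the helper names and device name are hypothetical.

```python
import re
from datetime import datetime, timedelta

EXTENDED_RE = re.compile(
    r"Extended self-test routine\s*\n\s*recommended polling time:\s*\(\s*(\d+)\)\s*minutes"
)


def extended_test_minutes(report: str):
    """Extract the recommended extended (long) self-test duration from `smartctl -x` output."""
    match = EXTENDED_RE.search(report)
    return int(match.group(1)) if match else None


def long_test_plan(device: str, report: str, start: datetime):
    """Commands to start a long test and to read the self-test log, plus when to check."""
    minutes = extended_test_minutes(report)
    return {
        "start": ["smartctl", "-t", "long", f"/dev/{device}"],
        "check": ["smartctl", "-l", "selftest", f"/dev/{device}"],
        "check_after": start + timedelta(minutes=minutes) if minutes is not None else None,
    }
```

For the first Red, a test started at 08:00 would be ready to check at 14:35.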
 

nik.96 · Dabbler · Joined Feb 22, 2015 · Messages: 14

Jailer said:
> Reinstalling FreeNAS has nothing to do with the data on your pool. You obviously have something else going on.
>
> I would start by running a long SMART test on each of those drives, since none has ever been run.

Looking deeper into the pool, it seems random movies were lost, including files that were written a couple of days ago.
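ZFS can also check the pool itself. A scrub reads and verifies every block, and `zpool status -v` then lists any files with permanent errors. This will not bring back files that are simply missing, but it does show whether ZFS sees corruption. The Python sketch below builds those commands and reads the `errors:` line of the status output. The pool name is a hypothetical placeholder.

```python
def scrub_commands(pool: str):
    """Commands to start a scrub and then inspect the result with per-file error detail."""
    return [["zpool", "scrub", pool], ["zpool", "status", "-v", pool]]


def data_error_summary(status_text: str):
    """Return the text after 'errors:' in `zpool status` output, or None if absent."""
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors:"):
            return stripped[len("errors:"):].strip()
    return None


def has_data_errors(status_text: str) -> bool:
    """True when the status output reports anything other than 'No known data errors'."""
    summary = data_error_summary(status_text)
    return summary is not None and summary != "No known data errors"
```

Once the scrub finishes, check the status output again: "No known data errors" means ZFS found nothing corrupt, and anything else points at the files worth inspecting.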

I'll perform a long SMART test when I wake up. It's 5 am in Sydney and I need some sleep.
 

nik.96 · Dabbler · Joined Feb 22, 2015 · Messages: 14

So all the hard drives passed the SMART tests. I'm not sure what else the problem could be.

It's only read speeds that are slow (~5 MB/s); write speeds are maxing out my gigabit LAN.
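Writes saturate gigabit but reads don't, so one way to narrow it down is to time a large sequential read on the server itself. That takes the network and file-sharing protocol out of the picture. The Python sketch below does this, assuming a large test file already sits on the pool (the path in the usage is hypothetical). A file read recently may be served from the ARC rather than the disks, so a file larger than RAM, or one not touched since boot, gives a more honest disk figure. For the network half, iperf is the usual tool if it is available.

```python
import time


def read_throughput(path: str, block_size: int = 1 << 20):
    """Sequentially read a file; return (bytes read, throughput in decimal MB/s)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 gives raw reads of block_size bytes, with no extra Python-side buffer.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mb_per_s = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, mb_per_s
```

For example, `read_throughput("/mnt/tank/some_big_movie.mkv")` run on the server gives a local read rate. If that rate is well above 5 MB/s, the bottleneck is more likely the network or sharing path than the Reds.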
 