ZFS Logical Sector Size

  • Thread starter Hubris Won't Be Tolerated
Status
Not open for further replies.
Hubris Won't Be Tolerated

Guest
Is there a sector size smaller than 4k that can be used with ZFS without a noticeable impact on r/w performance? While a 4k logical sector to match the physical sector provides the best performance, doing so compromises storage if you're storing many small files. I'm looking for a compromise between the two, where an exorbitant amount of storage space isn't lost due to smaller files.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Not really. The default minimum allocatable size has been 4k in all versions released in the last 18 months, and it can't really be changed either. Short of going back to an 8.x release or something, you're stuck with it.

One thing to keep in mind is that with small files that are <4KB, you not only have to deal with the minimum allocatable size, you also have to deal with the overhead of smaller block sizes. This can have a multiplying effect on the amount of disk space required. If memory serves me right, you need something like 300 bytes of metadata for every block written. So if your blocks are 512 bytes each, you need 800 or so bytes for every 512 bytes of file data (an increase of over 50% in the disk space required).
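
To make that arithmetic concrete, here is a minimal sketch. It assumes the rough figure of about 300 bytes of metadata per block quoted above (a from-memory estimate, not a measured ZFS constant), and the function names are made up for illustration.

```python
def on_disk_bytes(block_bytes, metadata_bytes=300):
    """Data plus the assumed per-block metadata, in bytes."""
    return block_bytes + metadata_bytes


def overhead_fraction(block_bytes, metadata_bytes=300):
    """Assumed metadata as a fraction of the data it describes."""
    return metadata_bytes / block_bytes


# Compare a 512-byte block with a 4096-byte block under the same assumption.
for block in (512, 4096):
    print(f"{block}-byte block: ~{on_disk_bytes(block)} bytes on disk, "
          f"{overhead_fraction(block):.0%} overhead")
```

Under that assumption a 512-byte block costs about 812 bytes (roughly 59% extra), while the same metadata on a 4096-byte block adds only about 7%.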

I do know of someone who has more than 4 billion (with a B) files, and they haven't really had a problem with their storage needs. So I tend to think that you are over-analysing the need for a smaller block size for your situation. ;)
 
Hubris Won't Be Tolerated

That makes sense, thanks =]

So after running smartctl -a /dev/ada2:


Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p13 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST5000DM000-1FK178
Serial Number:    xxxxxxxx
LU WWN Device Id: 5 000c50 07c492cd3
Firmware Version: CC44
User Capacity:    5,000,981,078,016 bytes [5.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 29 21:01:04 2015 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 617) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   100   006    Pre-fail  Always       -       191265736
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       8664708073
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1357
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       36
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   062   045    Old_age   Always       -       35 (Min/Max 24/38)
191 G-Sense_Error_Rate      0x0032   052   052   000    Old_age   Always       -       97176
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       19
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       303
194 Temperature_Celsius     0x0022   035   040   000    Old_age   Always       -       35 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   118   100   000    Old_age   Always       -       191265736
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1303 (48 87 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2980915418
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       11220709346

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



FreeNAS appears to be pulling the logical sector size of 512 and using it as the physical sector size when the pool is created via the GUI. fdisk reports a 512 sector size for all HDDs used for storage. All of them are AF HDDs with physical sector sizes of 4k. The OS is on a separate SSD, which FreeNAS partitioned as a UFS partition with 512 logical and 512 physical sectors. (I've never used an SSD prior to this, so perhaps a 512 sector size is normal for SSDs?)
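
For comparing the logical and physical sizes across several drives, a small parser for the "Sector Sizes" line in the smartctl output above might look like this. It is only a sketch: the function names are hypothetical, and it assumes the two-value form of the line shown above (some drives report a single combined logical/physical value instead, which this does not handle).

```python
import re
from typing import Tuple


def sector_sizes(smartctl_text: str) -> Tuple[int, int]:
    """Return (logical, physical) sector sizes in bytes from smartctl -a text."""
    match = re.search(
        r"Sector Sizes:\s+(\d+) bytes logical, (\d+) bytes physical",
        smartctl_text,
    )
    if match is None:
        raise ValueError("no two-value 'Sector Sizes' line found")
    return int(match.group(1)), int(match.group(2))


def ashift_for(sector_bytes: int) -> int:
    """ashift is log2 of the sector size, so only powers of two are valid."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


# The line as it appears in the output for /dev/ada2 above.
sample = "Sector Sizes:     512 bytes logical, 4096 bytes physical"
logical, physical = sector_sizes(sample)
print(f"logical={logical} (ashift {ashift_for(logical)}), "
      f"physical={physical} (ashift {ashift_for(physical)})")
```

For this drive the logical size corresponds to ashift 9 and the physical size to ashift 12.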

I've inserted vfs.zfs.min_auto_ashift=12 into /conf/base/etc/sysctl.conf, rebooted the server, and tried again with the exact same result: a 512 sector size. How do I force FreeNAS to use a 4k sector size?
 

cyberjock

You are missing the fact that all the recent releases of FreeNAS already force an ashift of 12. That is the default and it is not dynamic. So unless you have evidence that this has changed (if it has, nobody told me, but I don't get "memos" for everything), the fact that you have an ashift value of 12 already means you cannot allocate less than 4KB no matter what.
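
For reference, ashift is the base-2 logarithm of the smallest allocation, so 9 means 512 bytes and 12 means 4096 bytes. The sketch below shows what that implies for a small file. It is a simplification: it ignores metadata, compression, and RAID-Z parity and padding, and the helper names are invented for this example.

```python
def min_allocation(ashift: int) -> int:
    """Smallest allocation in bytes for a given ashift (2 ** ashift)."""
    return 1 << ashift


def allocated_size(file_bytes: int, ashift: int) -> int:
    """Round a file size up to a whole number of minimum allocations."""
    unit = min_allocation(ashift)
    # Ceiling division without floating point.
    return -(-file_bytes // unit) * unit


# A 1000-byte file under the two ashift values discussed in this thread.
for ashift in (9, 12):
    print(f"ashift={ashift}: unit={min_allocation(ashift)} bytes, "
          f"1000-byte file occupies {allocated_size(1000, ashift)} bytes")
```

With these simplifications a 1000-byte file occupies 1024 bytes at ashift 9 and 4096 bytes at ashift 12.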
 
Hubris Won't Be Tolerated

Running zdb | grep ashift returns a value of 9, and running fdisk on /dev/ada2, ada3, ada4 and ada5 returns a media sector size of 512.

Doesn't that mean FreeNAS is utilizing a 512 sector size, or am I confused about/misconstruing what the output means?
 

cyberjock

It does mean it's using a sector size of 512 for data. But that means someone has changed it, which is probably a really bad idea as you cannot change the ashift value later on, and the likelihood of you being able to buy 512-bytes/sector drives in the future is just about 0%.

I will definitely have to do some research of my own on this...
 
Hubris Won't Be Tolerated

After some additional investigating, the ashift value of 9 turns out to come from the SSD the OS is installed on.

I had destroyed all ZFS pools, and I just realized that when I ran zdb | grep ashift it was pulling the ashift value of the only valid disk, the OS's SSD. Are SSDs supposed to have 512 sectors by default? This is my first experience with an SSD (Samsung 850 EVO 128GB).

I'm re-creating my zpools now and will post an updated zdb pull once I have them created.
 

cyberjock

Aah. Ok. That might explain things a bit. Really had me wondering if we had some crappy regression or what was going on.

SSDs work best with a 512-byte/sector block, yes. While you could normally use 4k without a problem on an SSD, there's really no reason to go with 4k (except in highly optimized situations, such as a database that always writes exactly 4KB of data) and several reasons why 512 bytes/sector is better.

I would like to see what your zdb output is once it is done. For the record, the best way to check your zpool's ashift value is to do zdb -C <zpoolname> | grep ashift.
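
If the output of that command is saved to a file or captured as text, the ashift values can be pulled out with a few lines of Python. This is a sketch only: the function name is hypothetical, and the sample text is a hand-written, abbreviated illustration of the configuration layout rather than output captured from a real pool.

```python
import re
from typing import List


def ashift_values(zdb_text: str) -> List[int]:
    """Return every 'ashift: N' value found in zdb configuration text."""
    return [
        int(value)
        for value in re.findall(r"^\s*ashift:\s*(\d+)\s*$", zdb_text, flags=re.MULTILINE)
    ]


# Illustrative, abbreviated sample; a real pool prints many more fields.
sample = """\
NAS:
    name: 'NAS'
    vdev_tree:
        type: 'root'
        children[0]:
            type: 'raidz'
            ashift: 12
"""

print(ashift_values(sample))
```

A pool with several top-level vdevs would yield one value per vdev, which makes a mismatched vdev easy to spot.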
 
Hubris Won't Be Tolerated

Code:
~# zdb -C NAS | grep ashift
zdb: can't open 'NAS': No such file or directory


Code:
~# zpool list
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
NAS           10.9T   720K  10.9T         -     0%     0%  1.00x  ONLINE  /mnt
freenas-boot   111G  2.09G   109G         -      -     1%  1.00x  ONLINE  -


^ ...Not sure if I'm doing something wrong... ^ This morning I tried setting up raidz using the CLI, as I'd like to have the 8GB of swap on the 7200rpm disk (ada5) rather than spread across the other three 5900rpm drives. However, I thought I had done something wrong after not being able to grab the ashift with the command you mentioned. I then destroyed the pool and set it up via the GUI (manually, as the disks are different sizes... I know, not the best way to do a zpool), and the same error occurred... so I think I'm missing something obvious.

Code:
 ~# gpart list ada2 ada3 ada4 ada5
Geom name: ada2
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 9767541134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada2p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   rawuuid: 61546cb4-ef82-11e4-99ab-d05099384742
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: 1
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: ada2p2
   Mediasize: 4998833508352 (4.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e2
   rawuuid: 61661b8d-ef82-11e4-99ab-d05099384742
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: 1
   length: 4998833508352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 9767541127
   start: 4194432
Consumers:
1. Name: ada2
   Mediasize: 5000981078016 (4.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r2w2e5

Geom name: ada3
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 9767541134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada3p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   rawuuid: 61b668ca-ef82-11e4-99ab-d05099384742
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: 1
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: ada3p2
   Mediasize: 4998833508352 (4.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e2
   rawuuid: 61c69af9-ef82-11e4-99ab-d05099384742
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: 1
   length: 4998833508352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 9767541127
   start: 4194432
Consumers:
1. Name: ada3
   Mediasize: 5000981078016 (4.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r2w2e5

Geom name: ada4
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 7814037134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada4p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   rawuuid: 621273dd-ef82-11e4-99ab-d05099384742
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: 1
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: ada4p2
   Mediasize: 3998639460352 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e2
   rawuuid: 6226aac3-ef82-11e4-99ab-d05099384742
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: 1
   length: 3998639460352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 7814037127
   start: 4194432
Consumers:
1. Name: ada4
   Mediasize: 4000787030016 (3.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r2w2e5

Geom name: ada5
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 5860533134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada5p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   rawuuid: 627f410b-ef82-11e4-99ab-d05099384742
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: 1
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: ada5p2
   Mediasize: 2998445412352 (2.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e2
   rawuuid: 6293fda7-ef82-11e4-99ab-d05099384742
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: 1
   length: 2998445412352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 5860533127
   start: 4194432
Consumers:
1. Name: ada5
   Mediasize: 3000592982016 (2.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r2w2e5
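
One way to read the gpart numbers above, sketched in Python. It assumes that Sectorsize is the logical sector size and that Stripesize reflects the 4096-byte physical sector, so a partition is 4k-aligned when its byte offset is a multiple of the Stripesize. The values are copied from the ada2 listing; the function name is made up for illustration.

```python
def is_aligned(offset_bytes, physical_sector=4096):
    """True if a partition's byte offset falls on a physical-sector boundary."""
    return offset_bytes % physical_sector == 0


# (start sector, byte offset) pairs taken from the ada2 output above.
partitions = {
    "ada2p1": (128, 65536),
    "ada2p2": (4194432, 2147549184),
}

for name, (start, offset) in partitions.items():
    # start is counted in 512-byte logical sectors, so start * 512 == offset.
    assert start * 512 == offset
    print(name, "aligned" if is_aligned(offset) else "misaligned")
```

Both partitions start on a 4096-byte boundary, so the 512 reported as Sectorsize does not by itself mean the layout is misaligned for 4k drives.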


Code:
~# fdisk /dev/ada2
******* Working on device /dev/ada2 *******
parameters extracted from in-core disklabel are:
cylinders=9690021 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=9690021 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 238 (0xee),(EFI GPT)
    start 1, size 4294967295 (2097151 Meg), flag 0
        beg: cyl 0/ head 0/ sector 2;
        end: cyl 1023/ head 255/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

~# fdisk /dev/ada3
******* Working on device /dev/ada3 *******
parameters extracted from in-core disklabel are:
cylinders=9690021 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=9690021 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 238 (0xee),(EFI GPT)
    start 1, size 4294967295 (2097151 Meg), flag 0
        beg: cyl 0/ head 0/ sector 2;
        end: cyl 1023/ head 255/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

~# fdisk /dev/ada4
******* Working on device /dev/ada4 *******
parameters extracted from in-core disklabel are:
cylinders=7752021 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=7752021 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 238 (0xee),(EFI GPT)
    start 1, size 4294967295 (2097151 Meg), flag 0
        beg: cyl 0/ head 0/ sector 2;
        end: cyl 1023/ head 255/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

~# fdisk /dev/ada5
******* Working on device /dev/ada5 *******
parameters extracted from in-core disklabel are:
cylinders=5814021 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=5814021 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 238 (0xee),(EFI GPT)
    start 1, size 4294967295 (2097151 Meg), flag 0
        beg: cyl 0/ head 0/ sector 2;
        end: cyl 1023/ head 255/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>
 

cyberjock

No clue why zdb -C <poolname> doesn't work. I just tested it on my FreeNAS machine with the current release and it worked just fine. Maybe because your zpool has only 3 letters?
 
Hubris Won't Be Tolerated

Is it recommended to use more than 3? If so, what's the minimum I should use?
 
Hubris Won't Be Tolerated

I've destroyed and re-created the zpool with the following names, each one returning the same "zdb: can't open '<zpoolname>': No such file or directory":
  1. NAS
  2. NAS.pool1
  3. nas-storage
Is there any type of troubleshooting I can perform to determine the issue, as I'd prefer to not have to reinstall FreeNAS if possible?

EDIT: A reboot also results in the same error; however, zdb does return the correct output if I query the install pool, freenas-boot.
 
Hubris Won't Be Tolerated

Okay... so I'm not sure what was causing the error, but rolling back to a prior snapshot solved the issue, and the zpool now returns an ashift value of 12.
 

cyberjock

Hubris Won't Be Tolerated

I did run the verify option under System - Update, and it returned 5 files with mismatching hashes. Three of them I had edited to add values (none that should have resulted in that type of error from zdb; most were firewall values), and the other two were log files: syslog and another system log. While it is possible I typo-ed a value in one of the three files I added values to (rc.conf, rc.firewall, and sysctl.conf), I haven't noticed any other odd behavior. Hopefully I can recreate the error to try and narrow down what was causing it.
 