Confused about one by one disk upgrade. Finally hit that 80% warning.

Nomad

Contributor
Joined
Oct 14, 2013
Messages
125
Hey Everyone,

Been a long while, and what can I say, nothing has happened, except I'm running out of space. I've read about being able to pop one disk out and another back in, but I've also read that if you are in raidZ# you can't do this. Super confused. What commands can I throw at this thing to help you help me?


root@freenas[~]# zpool status ZED3
  pool: ZED3
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 0 days 14:45:20 with 0 errors on Sun Sep 18 14:45:21 2022
config:

        NAME                                            STATE     READ WRITE CKSUM
        ZED3                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/2c901587-e639-11e7-96ff-ac1f6b247de2  ONLINE       0     0     0
            gptid/2d585572-e639-11e7-96ff-ac1f6b247de2  ONLINE       0     0     0
            gptid/2e1e5fa2-e639-11e7-96ff-ac1f6b247de2  ONLINE       0     0     0
            gptid/2f31b401-e639-11e7-96ff-ac1f6b247de2  ONLINE       0     0     0
            gptid/2ff9cabb-e639-11e7-96ff-ac1f6b247de2  ONLINE       0     0     0
            gptid/30c36a01-e639-11e7-96ff-ac1f6b247de2  ONLINE       0     0     0

errors: No known data errors
root@freenas[~]#

Looks like the most I can do without a RAM upgrade would be 6x12TB drives.

I've got two RAM slots open if there is a better $/TB option that makes sense.
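
For a rough sense of what 6x12TB would buy compared with the current 6x6TB, the arithmetic can be sketched as below. This is only an estimate: it assumes usable RAIDZ capacity is (disks - parity) x disk size, ignores ZFS metadata, padding and the swap partitions, and treats the 80% warning as a plain fraction of that figure. The function names are made up for illustration.

```python
TB = 10**12   # drive vendors quote decimal terabytes
TIB = 2**40   # ZFS tools report binary tebibytes


def raidz_usable_bytes(disks: int, parity: int, disk_bytes: int) -> int:
    """Approximate usable bytes of one RAIDZ vdev (no overhead modelled)."""
    if disks <= parity:
        raise ValueError("need more disks than parity")
    return (disks - parity) * disk_bytes


def report(label: str, disk_tb: int) -> None:
    """Print the estimated usable space and where 80% of it falls."""
    usable = raidz_usable_bytes(disks=6, parity=2, disk_bytes=disk_tb * TB)
    warn_at = usable * 0.80
    print(f"{label}: ~{usable / TIB:.1f} TiB usable, "
          f"80% warning near {warn_at / TIB:.1f} TiB")


report("6 x 6TB RAIDZ2", 6)
report("6 x 12TB RAIDZ2", 12)
```

Under these assumptions, doubling the disk size simply doubles the usable space, since the RAIDZ2 layout (four data disks, two parity) stays the same.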
 

Nomad

Signature is super old. Running the following.
OS Version:
FreeNAS-11.2-U8
(Build Date: Feb 14, 2020 15:55)
Processor:
Intel(R) Xeon(R) CPU E3-1225 v6 @ 3.30GHz (4 cores)
Memory:
32 GiB

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: Supermicro
Product Name: X11SSM-F
Version: 1.01
Serial Number: VM177S504704
Asset Tag: To be filled by O.E.M.
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: To be filled by O.E.M.
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
 

Nomad

=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5 (SMR)
Device Model:     ST6000DM003-2CY186
User Capacity:    6,001,175,126,016 bytes [6.00 TB]

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   079   064   006    Pre-fail  Always       -       77981064
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       57
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   096   060   045    Pre-fail  Always       -       3765958434
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       30119 (106 230 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       57
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   060   040    Old_age   Always       -       32 (Min/Max 29/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1160
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1481
194 Temperature_Celsius     0x0022   032   040   000    Old_age   Always       -       32 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   079   064   000    Old_age   Always       -       77981064
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       26261h+31m+57.682s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       47443896177
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       205513813622
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   079   064   006    Pre-fail  Always       -       78903544
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       57
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   096   060   045    Pre-fail  Always       -       4018022987
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       30119 (156 92 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       57
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   059   040    Old_age   Always       -       33 (Min/Max 29/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1185
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1446
194 Temperature_Celsius     0x0022   033   041   000    Old_age   Always       -       33 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   079   064   000    Old_age   Always       -       78903544
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       26224h+51m+03.759s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       47421658872
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       208188477144
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   080   064   006    Pre-fail  Always       -       91728312
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       57
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   096   060   045    Pre-fail  Always       -       3679514051
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       30119 (68 144 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       57
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   060   040    Old_age   Always       -       32 (Min/Max 28/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1147
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1476
194 Temperature_Celsius     0x0022   032   040   000    Old_age   Always       -       32 (0 17 0 0 0)
195 Hardware_ECC_Recovered  0x001a   080   064   000    Old_age   Always       -       91728312
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       26316h+26m+18.064s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       47400903793
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       211954429945
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   080   064   006    Pre-fail  Always       -       97938512
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       59
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   096   060   045    Pre-fail  Always       -       3645013888
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       30119 (30 65 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       59
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   061   040    Old_age   Always       -       33 (Min/Max 29/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1151
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1443
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   080   064   000    Old_age   Always       -       97938512
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       26237h+01m+26.649s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       47374752873
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       212893482857
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   078   064   006    Pre-fail  Always       -       62107168
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       57
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   096   060   045    Pre-fail  Always       -       3892671486
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       30119 (90 106 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       57
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   060   040    Old_age   Always       -       33 (Min/Max 29/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1164
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1482
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   078   064   000    Old_age   Always       -       62107168
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       26259h+37m+35.674s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       47425649280
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       211159688065
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   080   064   006    Pre-fail  Always       -       97959392
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       57
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   096   060   045    Pre-fail  Always       -       3706437697
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       30119 (155 98 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       57
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   061   040    Old_age   Always       -       33 (Min/Max 29/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1156
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1449
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   080   064   000    Old_age   Always       -       97959392
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       26235h+33m+51.044s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       47474288633
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       209092909813
=>          40  11721045088  ada0  GPT  (5.5T)
            40           88        - free -  (44K)
           128      4194304     1  freebsd-swap  (2.0G)
       4194432  11716850688     2  freebsd-zfs  (5.5T)
   11721045120            8        - free -  (4.0K)

=>          40  11721045088  ada1  GPT  (5.5T)
            40           88        - free -  (44K)
           128      4194304     1  freebsd-swap  (2.0G)
       4194432  11716850688     2  freebsd-zfs  (5.5T)
   11721045120            8        - free -  (4.0K)

=>          40  11721045088  ada2  GPT  (5.5T)
            40           88        - free -  (44K)
           128      4194304     1  freebsd-swap  (2.0G)
       4194432  11716850688     2  freebsd-zfs  (5.5T)
   11721045120            8        - free -  (4.0K)

=>          40  11721045088  ada3  GPT  (5.5T)
            40           88        - free -  (44K)
           128      4194304     1  freebsd-swap  (2.0G)
       4194432  11716850688     2  freebsd-zfs  (5.5T)
   11721045120            8        - free -  (4.0K)

=>          40  11721045088  ada4  GPT  (5.5T)
            40           88        - free -  (44K)
           128      4194304     1  freebsd-swap  (2.0G)
       4194432  11716850688     2  freebsd-zfs  (5.5T)
   11721045120            8        - free -  (4.0K)

=>          40  11721045088  ada5  GPT  (5.5T)
            40           88        - free -  (44K)
           128      4194304     1  freebsd-swap  (2.0G)
       4194432  11716850688     2  freebsd-zfs  (5.5T)
   11721045120            8        - free -  (4.0K)

root@freenas[/mnt/ZED3]#
 

Nomad

Also just noticed these are SMR drives. Guess that proves the bark about SMR is worse than its bite. Been using it for years without issue. 100MB cap, moving TBs of data at a time. No issues during resilver.

Seagate - 3.5" - ST6000DM003 - 6TB - Barracuda - DM-SMR

Drive Managed, DM-SMR, which is opaque to the OS. This means ZFS cannot "target" writes, and is the worst type for ZFS use. As a rule of thumb, avoid DM-SMR drives, unless you have a specific use case where the increased resilver time (a week or longer) is acceptable, and you know the drive will function for ZFS during resilver.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
but I've also read that if you are in raidZ# you can't do this. Super confused.
No need for confusion, it's complete BS.

You can replace 1 by 1 with no concern in RAIDZ2 or 3... RAIDZ1 just poses an additional risk as you have no redundancy when doing that.

Since you have the additional wrinkle of SMR involved, I would advise you to ensure that you have a tested backup of whatever important data you have on that pool before going ahead and also that you replace SMR with CMR only. (despite what you report as a positive experience, I can only say you've been incredibly lucky somehow).
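
To picture what "1 by 1" means for capacity, here is a small simulation. It is only a sketch under two assumptions that are not stated in the post: every member of a RAIDZ vdev counts as the size of the smallest disk, and overhead is ignored. The extra space typically only appears after the last disk has been swapped, and only if the pool's autoexpand property is on or the vdev is expanded manually. The function names are invented for the example.

```python
def raidz_usable_tb(disk_sizes_tb, parity):
    """Usable TB of a RAIDZ vdev: every member counts as the smallest disk."""
    return min(disk_sizes_tb) * (len(disk_sizes_tb) - parity)


def replace_one_by_one(disk_sizes_tb, new_size_tb, parity):
    """Yield (swaps_done, usable_tb) after each single-disk replacement."""
    disks = list(disk_sizes_tb)
    for index in range(len(disks)):
        disks[index] = new_size_tb  # one swap, then wait for the resilver
        yield index + 1, raidz_usable_tb(disks, parity)


# Six 6TB disks in RAIDZ2, swapped for 12TB disks one at a time.
for swaps, usable in replace_one_by_one([6] * 6, new_size_tb=12, parity=2):
    print(f"after {swaps} swap(s): {usable} TB usable")
```

In this model the usable figure stays at the old value through the first five swaps and only jumps once the sixth disk is in, so the upgrade means six full resilvers before any space is gained.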
 

Nomad

Seagate IronWolf 12TB NAS Hard Drive 7200 RPM 256MB Cache SATA 6.0Gb/s CMR 3.5" Internal HDD for RAID Network Attached Storage ST12000VN0008 - OEM
Looking at ordering these. The board I have has 8 SATA ports, so instead of having to remove a drive and physically swap them, I should be able to use the extra port and logically swap them, right? Plus, I thought the issue with SMR was writes and not reads, which wouldn't affect a resilver: SMR (read) => CMR (write).
 

sretalla

Those drives should be fine.

A RAIDZ resilver pushes transactions (read or write) at the drives in volumes that put them under significant stress. If something else also pushes additional writes to them during that time (for example, if your system dataset is on that pool), it may be enough to have the system timing out writes, which then marks the drive as bad even if it's keeping up with the reads.

It's not guaranteed to happen, but it can, so be careful/prepared in case it does (and in the worst instance you lose your pool because 3 drives get kicked out).
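
The "3 drives" figure follows from the parity level. A minimal sketch, assuming a single RAIDZ vdev with no hot spares and using simplified state labels that only loosely mirror what zpool status prints (the function name is made up):

```python
def vdev_state(parity: int, unavailable: int) -> str:
    """Rough state of one RAIDZ vdev given how many members are missing."""
    if parity < 1 or unavailable < 0:
        raise ValueError("parity must be >= 1 and unavailable >= 0")
    if unavailable == 0:
        return "ONLINE"
    if unavailable <= parity:
        return "DEGRADED"   # still readable, running on parity
    return "FAULTED"        # more members missing than parity can cover


# RAIDZ2 (parity=2), like the raidz2-0 vdev in this thread.
for missing in range(4):
    print(f"{missing} drive(s) out: {vdev_state(2, missing)}")
```

With RAIDZ2, one or two dropped drives leave the vdev degraded but usable. A third drop is the case described above, where the pool is lost.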
 

Nomad

This will be 'offline/off network' during the process.
Thank you for such a quick response.

Do you happen to know which button I push to do a drive swap in the GUI? I haven't seen it anywhere.
 

sretalla

It's under the Pool cogwheel, Status... then the 3 dots next to that disk.

Officially, you offline it, then replace from that menu.

I'm not convinced that you need to offline it first if you have an additional slot. (the docs seem to only cover the scenario where you already have a bad disk)


 

Nomad

Hmm,

The only option I have is edit :) I'll read over what you posted and see if I can find it.
 

Nomad

Found it! Thanks for the docs... the process starts tonight!
 

sretalla

The only option I have is edit :)
Edited my post to say Pool cogwheel... maybe that was the missing point which sent you in the wrong direction... sorry for that.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Thank you. I also noticed you switched compression off on almost all datasets and I am curious about it. AFAIK it shouldn't cause any issues even with incompressible data like video files.
 

Nomad

Thank you. I also noticed you switched compression off on almost all datasets and I am curious about it. AFAIK it shouldn't cause any issues even with incompressible data like video files.
Came that way. Maybe because I'm on a very old build of FreeNAS which was upgraded from 8 or 9?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I'm not convinced that you need to offline it first if you have an additional slot.
You don't. It's perfectly valid to replace a disk without offlining it first--that keeps the disk online and doesn't degrade redundancy, and ZFS will offline the original disk when the resilvering completes. Though I've seen reports here that the resilvering goes more slowly in that situation than if you'd taken the original disk offline--I haven't compared it myself.
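
The two workflows discussed here (offline first, or replace in place using a spare port) differ by one command at the ZFS level. The sketch below only prints the commands and does not run them. It assumes the plain zpool offline / zpool replace / zpool status forms. On FreeNAS the GUI is the supported route because it also partitions the new disk and handles the gptid labels, so treat this as an illustration rather than a procedure. The new device name is a placeholder, not a real label.

```python
import shlex


def replace_commands(pool, old_dev, new_dev, offline_first=False):
    """Return the zpool commands for one disk swap as argument lists."""
    commands = []
    if offline_first:
        # Officially documented route: take the old disk offline first.
        commands.append(["zpool", "offline", pool, old_dev])
    # In-place route: the old disk stays online until the resilver finishes.
    commands.append(["zpool", "replace", pool, old_dev, new_dev])
    # Check resilver progress before touching the next disk.
    commands.append(["zpool", "status", pool])
    return commands


OLD = "gptid/2c901587-e639-11e7-96ff-ac1f6b247de2"  # first member in the zpool status earlier in the thread
NEW = "gptid/NEW-DISK-PLACEHOLDER"                  # hypothetical label for the new 12TB disk

for command in replace_commands("ZED3", OLD, NEW):
    print(shlex.join(command))
```

Passing offline_first=True adds the zpool offline step in front, which is the variant that temporarily gives up one level of redundancy.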
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
You don't. It's perfectly valid to replace a disk without offlining it first--that keeps the disk online and doesn't degrade redundancy, and ZFS will offline the original disk when the resilvering completes. Though I've seen reports here that the resilvering goes more slowly in that situation than if you'd taken the original disk offline--I haven't compared it myself.
Given the SMR nature of the drives - that might be a good thing - the slow resilver
 

Nomad

Given the SMR nature of the drives - that might be a good thing - the slow resilver
Again, I've never had any issues writing TBs of data at full speed with these drives, and the resilver will be to a CMR drive: SMR (read) => CMR (write), and there are no issues with SMR reads.
 