Different sized 8TB Disk + can't get wiped

ledieu

Dabbler
Joined
Oct 15, 2016
Messages
40
hello

I recently had a drive fail and wanted to replace it. The drive that died was an IronWolf 8TB SATA; the replacement is a Seagate Exos 7E8 8TB SAS drive.

I'm running into a few problems here. According to diskinfo, the Exos drive is smaller than the IronWolf and has a different stripesize. Is that to be expected? This is my first SAS drive. I bought it since it was the same price as the SATA model, but if the size is different I guess I can't use it to replace the IronWolf, right?

Also, when I try to wipe the drive I get an error saying:
[EFAULT] Command dd if=/dev/zero of=/dev/da3 bs=1m count=32 failed (code 1):
dd: /dev/da3: Invalid argument
1+0 records in
0+0 records out
0 bytes transferred in 0.000364 secs (0 bytes/sec)
I get the same result if I try to wipe it from the CLI.

Currently both the SATA drives and the SAS drive are connected to the same controller on the same cable. Is that a problem?

Isn't it possible to mix SAS and SATA? Is it normal for the sizes of the disks to vary even though both are advertised as 8TB?

What's the problem with wiping?

Does SAS offer any benefits over SATA? I thought that SAS had some more SMART features and was in general to be preferred over SATA. Is this still true?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Did you purchase this drive secondhand or decommissioned from an enterprise storage array by any chance?

The "different size" can be bypassed by shortchanging the swap partition size during addition.

Mixing SAS and SATA can potentially cause issues due to signalling voltage mismatch; are they on the same physical breakout cable?

My instinct still says "520-byte sector" or some other T10 protection issue. Can you dump smartctl -a /dev/da3 - feel free to redact serial number, but leave model/firmware untouched please.
 

ledieu

It's a new drive, still sealed.
Yes, both SATA and SAS are on the same breakout cable. I'll redo it now, but I think you might be right about the sector size.

here's the output
smartctl -a /dev/da3
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p14 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST8000NM0075
Revision: FE28
Compliance: SPC-3
User Capacity: 7,954,666,695,040 bytes [7.95 TB]
Logical block size: 520 bytes
Physical block size: 4160 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c500ca5d4e5f
Serial number:
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Wed Oct 28 13:32:48 2020 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature: 36 C
Drive Trip Temperature: 68 C

Manufactured in week 23 of year 2020
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 57
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 57
Elements in grown defect list: 0

Vendor (Seagate Cache) information
Blocks sent to initiator = 32
Blocks received from initiator = 0
Blocks read from cache and sent to initiator = 10
Number of read and write commands whose size <= segment size = 4
Number of read and write commands whose size > segment size = 0

Vendor (Seagate/Hitachi) factory information
number of hours powered up = 1.43
number of minutes until next internal SMART test = 6

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:      43530        0         0     43530          0          0.000           0
write:         0        0         0         0          0          0.007           0

Non-medium error count: 0

No Self-tests have been logged

block size=520

here's a sata drive for comparison
smartctl -a /dev/da2
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p14 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate IronWolf
Device Model: ST8000VN0022-2EL112
Serial Number:
LU WWN Device Id: 5 000c50 0938ac04a
Firmware Version: SC61
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Oct 28 13:37:51 2020 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 575) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 747) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x50bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 084 064 044 Pre-fail Always - 235076013
3 Spin_Up_Time 0x0003 084 084 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 105
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 093 060 045 Pre-fail Always - 1855880758
9 Power_On_Hours 0x0032 070 070 000 Old_age Always - 27026 (146 175 0)
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 106
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 063 051 040 Old_age Always - 37 (Min/Max 26/37)
191 G-Sense_Error_Rate 0x0032 097 097 000 Old_age Always - 6799
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 52
193 Load_Cycle_Count 0x0032 041 041 000 Old_age Always - 118877
194 Temperature_Celsius 0x0022 037 049 000 Old_age Always - 37 (0 18 0 0 0)
195 Hardware_ECC_Recovered 0x001a 006 001 000 Old_age Always - 235076013
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 23051 (13 54 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 40829934554
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 335105380492

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 27006 -
# 2 Short offline Completed without error 00% 26859 -
# 3 Short offline Completed without error 00% 26691 -
# 4 Extended offline Completed without error 00% 26631 -
# 5 Short offline Completed without error 00% 26523 -
# 6 Short offline Completed without error 00% 26355 -
# 7 Short offline Completed without error 00% 26187 -
# 8 Short offline Completed without error 00% 26019 -
# 9 Extended offline Completed without error 00% 25911 -
#10 Short offline Completed without error 00% 25853 -
#11 Short offline Completed without error 00% 25683 -
#12 Short offline Completed without error 00% 25515 -
#13 Short offline Completed without error 00% 25347 -
#14 Short offline Completed without error 00% 25179 -
#15 Extended offline Completed without error 00% 25168 -
#16 Short offline Completed without error 00% 25011 -
#17 Short offline Completed without error 00% 24843 -
#18 Short offline Completed without error 00% 24675 -
#19 Short offline Completed without error 00% 24507 -
#20 Extended offline Completed without error 00% 24423 -
#21 Short offline Completed without error 00% 24339 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing

According to the site selling it, the SAS drive should be 512e as well. Apparently that information is wrong.
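For anyone comparing the two smartctl outputs, the capacities can be turned into logical block counts with a bit of shell arithmetic. This is only a sketch: `block_count` and `aligned` are made-up helper names, and the numbers are copied from the outputs above. The alignment check is one plausible reading of the dd "Invalid argument" error, assuming the raw device only accepts I/O in whole multiples of its logical block size.

```shell
#!/bin/sh
# Hypothetical helpers; capacities and block sizes are copied from the
# smartctl outputs in this thread.

# block_count CAPACITY_BYTES BLOCK_SIZE -> number of logical blocks
block_count() {
    echo $(( $1 / $2 ))
}

# aligned IO_SIZE BLOCK_SIZE -> "yes" if IO_SIZE is a whole number of blocks
aligned() {
    if [ $(( $1 % $2 )) -eq 0 ]; then
        echo yes
    else
        echo no
    fi
}

echo "SAS  (520-byte blocks): $(block_count 7954666695040 520) blocks"
echo "SATA (512-byte blocks): $(block_count 8001563222016 512) blocks"

# dd was run with bs=1m, i.e. 1048576 bytes per write
echo "1 MiB aligned to 520-byte blocks: $(aligned 1048576 520)"
echo "1 MiB aligned to 512-byte blocks: $(aligned 1048576 512)"
```

With these inputs the SAS drive comes out at 15297435952 blocks and the SATA drive at 15628053168, and a 1 MiB write is a whole number of 512-byte blocks but not of 520-byte blocks.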

I'm putting the SATA drives on their own breakout cable now.
 

HoneyBadger

Yep, 520 bytes will do it.

You need to low-level format it to 512 byte sectors. Running this from an SSH session will start the process, but it needs to touch every sector on your drive, so expect this to take hours.

sg_format --format --size=512 --fmtpinfo=0 /dev/da3 -v

There's another command using camcontrol mentioned here:

 

ledieu

The command is running. By my estimate it's going to finish in 4 hours, which is way beyond my bedtime today. I'll let it run overnight and report back tomorrow.
I found a post mentioning that a reboot is necessary for the change to take effect. Can you confirm?

I've now separated SAS and SATA onto their own breakout cables. I'm planning on moving to a 19" case with a backplane. If I do that, should I keep the drives on their own cables as well, or does the backplane handle that on its own?

I closed my connection and now I can't see the progress anymore. Any idea how to get back to it?
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
A bit of additional advice here. For anything you are going to do via SSH that will take an extended period of time, it is worth starting tmux before issuing the command. If the SSH session gets disconnected for any reason, you can simply reconnect to the tmux session and continue. So, when your SSH session connects, simply type 'tmux' and you're good to go.
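A minimal sketch of that workflow, wrapped in small shell functions so each step has a name. `start_session`, `list_sessions`, `reattach` and the session name `format` are made up for illustration; `tmux new-session -s`, `tmux ls` and `tmux attach-session -t` are standard tmux commands.

```shell
#!/bin/sh
# Hypothetical wrappers around standard tmux commands.

# Start a named tmux session; run the long command inside it.
start_session() {
    tmux new-session -s "$1"
}

# After a dropped connection: see which sessions are still alive.
list_sessions() {
    tmux ls
}

# Reattach to a session by name and pick up where you left off.
reattach() {
    tmux attach-session -t "$1"
}

# Typical use (not executed here):
#   start_session format     # then run the long-running command
#   ... detach with Ctrl-b d, or simply lose the connection ...
#   list_sessions
#   reattach format
```

Naming the session is optional, but it makes reattaching less error-prone once more than one session is open.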
 

ledieu

I did it through the shell in the web UI. I'm currently in between computers and haven't set up SSH on my laptop yet.

Reading up on tmux now, though. Thanks for the advice. Anything is welcome, since I'm still learning a lot in the Unix and Linux world (I'm a Windows admin during the day).

Apparently I miscalculated the time needed. It's at 40% now and seems to be getting slower by the minute. I guess it formats the sectors from the outside of the platters towards the inside, so the speed starts at the maximum and drops the further along it gets. Is that assumption correct?
 

JaimieV

Guru
Joined
Oct 12, 2012
Messages
742
Yes, it's fastest-edge first for formatting. All done okay now?
 

ledieu

Yep, all working fine now. The resilver is running and will probably be done in 20-30 hours.

Thanks for the stellar help!

Not that I'll ever use it, but can you guys tell me whether FreeNAS/TrueNAS supports 520-byte sectors (in case somebody used only 520-byte-sector drives, obviously)? My colleague who's in charge of our SAN at work told me NetApp uses the larger sectors for another 8 bytes of parity (they call it data assurance, apparently). Is that something that FreeNAS/TrueNAS can handle as well? Is there any benefit in doing so?
 

HoneyBadger

My colleague who's in charge of our SAN at work told me NetApp uses the larger sectors for another 8 bytes of parity (they call it data assurance, apparently). Is that something that FreeNAS/TrueNAS can handle as well? Is there any benefit in doing so?

Also called "Data Integrity Field" or "Protection Information" - it's unnecessary here as ZFS handles its checksums and integrity information separately, without the need for the extra 8 bytes to have support all the way through the hardware/software stack.

Glad to hear it's working for you.
 

ledieu

This is my first rebuild on those 8TB drives and I'm only seeing an average write speed of 8.7 MiB/s while rebuilding a Z1 of 4 drives. Is that to be expected? I would have thought a rebuild would be more in the range of 20-40 hours. At this rate it would take a week, if I calculated correctly.
 

HoneyBadger

Are all of your other SATA drives the same Ironwolf 8T model ST8000VN0022?

Z1 rebuilds can be painful, but a week-long rebuild is the kind of thing I'd expect from an SMR drive ... you don't have any Seagate Archive drives in there, do you?
 

ledieu

No, all drives besides the Exos are IronWolf with CMR.

Is there a way to find out what the bottleneck is? Should my CPU be under any load? It's basically idling.
 

HoneyBadger

Check gstat -dp to see if you have any particularly busy or slow disks holding up the process - full system specs wouldn't hurt either, and version of FreeNAS/TrueNAS.

And just in case this is an old pool and/or you've been through a few upgrades:

If on TrueNAS:
zpool get ashift

If on FreeNAS:
zdb -C /data/zfs/zpool.cache | grep ashift

Your ashift should be "0" or "12" for your pool. If you see ashift=9 anywhere you have a problem.
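The ashift value is the base-2 logarithm of the sector size ZFS assumes for a vdev, so the numbers above translate as sketched here. `ashift_to_bytes` is a made-up helper name, not something FreeNAS or TrueNAS ships.

```shell
#!/bin/sh
# Hypothetical helper: ashift is log2 of the sector size ZFS uses for a vdev.
ashift_to_bytes() {
    echo $(( 1 << $1 ))
}

echo "ashift=9  -> $(ashift_to_bytes 9) bytes"
echo "ashift=12 -> $(ashift_to_bytes 12) bytes"
```

So ashift=12 corresponds to 4096-byte sectors and ashift=9 to 512-byte sectors. On drives with 4096-byte physical sectors, ashift=9 would presumably force read-modify-write cycles for small writes, which fits the warning above. A reported "0" from the pool property is, as far as I know, the "auto-detect" setting rather than an actual sector size.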
 

ledieu

I'm running a Xeon E3 1220 with 32GB of 1600MHz ECC RAM on a Supermicro X9SCA-F. The drives are connected through some kind of LSI 2008 with 8087 SAS breakout cables. The PSU should be a be quiet! 500W Pure Power 11 or 10.

I'm running FreeNAS-11.3-U5.

gstat is showing the same result for all drives except for the new replacement, which is at 99% busy basically all the time (some dips down to 70, but it jumps back to 99 quickly). All SATA drives are working in unison.

If I try zdb -C /data/zfs/zpool.cache, it tells me there's no such directory. Where's that file supposed to be? On the boot volume?

The pool itself is pretty old. I think I've had it since 9.10 and used it on Coral and this new build. It has seen multiple hardware changes, if that's relevant.

edit:
On the man page I saw /boot/zfs/zpool.cache. Is that the path I should try? I'm currently not at the server, so I have to rely on my sister to turn on a computer to remote into so I can try different things, but I guess I'll give this one a go.

Resilvering is at 25% now after about 37h. That would give me about 6 days to resilver the pool.
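The six-day figure follows from a simple linear extrapolation, sketched here. `total_hours` is a made-up helper, and the assumption that a resilver proceeds at a constant rate is a rough one.

```shell
#!/bin/sh
# Hypothetical helper: linear extrapolation of a resilver's total duration.
# total_hours ELAPSED_HOURS PERCENT_DONE -> estimated total hours (integer)
total_hours() {
    echo $(( $1 * 100 / $2 ))
}

total=$(total_hours 37 25)
echo "estimated total: ${total} h (about $(( total / 24 )) days and $(( total % 24 )) h)"
echo "remaining: $(( total - 37 )) h"
```

For 25% after 37 hours this gives 148 hours in total, a bit over six days, with 111 hours still to go.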
 

ledieu

Since editing doesn't bump a thread, I'll just post it here:
Resilvering is done and all seems to be clear so far. I didn't find a way to check the ashift value. Can you tell me how to do it on 11.3?
 

ledieu

Any other way I could check it?

Maybe somewhere in the GUI? Should I just try to upgrade it? If yes, how would I best go about that? There's an option in the GUI, but from what I know of FreeNAS it's best to do it from the CLI.
 

ledieu

I found this one that worked
Code:
zdb -U /data/zfs/zpool.cache | grep ashift

Both pools returned 12 as ashift values.

Should I still try to upgrade the pools? Performance is about the same as before. I get between 300 and 450 MB/s on reads and writes for 4 8TB 7200rpm drives.
 