Spares not being utilized as expected

Quietus

Cadet
Joined
Jul 7, 2020
Messages
8
After 10+ years of using FreeNAS and other variants with great results, I finally can't find an answer for this one. Last Sunday the NAS reset itself via a watchdog timer, and then it happened three more times today. I could easily replicate the problem by initiating a large data transfer. The first error I caught was a fatal trap, then a page fault, and finally a panic pointing at a pool vdev GUID that I could chase down. While this was happening I also noticed that /dev/da31 was throwing SMART errors to the point that it exceeded its failure-prediction threshold. My question is: why didn't one of my three spares kick in? And when I take /dev/da31 offline, I would definitely expect one of them to take over automatically, but they just sit there idle. Here's what's in the box:

Chassis: SSG-6048R-E1CR36N
CPU: Dual Xeon E5-2603v4
RAM: 128GB 2400MHz DDR4 ECC
SAS: AOC-S3108L-H8IR
Boot Pool: 2 x 64GB SATA DOM
RAIDZ1
vdev1: 11 x 12TB Exos SAS3
vdev2: 11 x 12TB Exos SAS3
vdev3: 11 x 12TB Exos SAS3
log: 280GB Optane 900P PCIe
cache: 2 x 250GB 970 EVO Plus PCIe
spare: 3 x 12TB Exos SAS3

Everything about this build has been great, but if I'd known that the spares wouldn't act like hot spares, I would have gone with 4 x 9-disk vdevs. When clients spec systems with hardware RAID, the hot spares kick in automatically. Don't get me wrong, I'm not advocating hardware RAID...when I have a choice I always use ZFS. But as I get older, I like having hot spares so I can postpone trips to the data center. Now it appears that I will have to de-allocate one of the spares and use it to manually replace /dev/da31. What am I missing here? Thanks in advance!
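In case it helps frame the question, the manual swap I'm expecting to do would look roughly like this (just a sketch; the pool name and gptid labels below are placeholders). I was hoping ZFS would do the equivalent of this replace on its own.

Code:
# Take the failing disk out of service, then resilver one of the defined spares in its place
zpool offline <pool> gptid/<failing-disk>
zpool replace <pool> gptid/<failing-disk> gptid/<spare-disk>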
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Please show the output of zpool status -v <name of your RAIDZ1 pool>.
 

Quietus

Cadet
Joined
Jul 7, 2020
Messages
8
Here you go. I manually placed /dev/da31 offline to keep the system from resetting itself, and then removed /dev/da35 from spares so I could manually swap them later.

pool: RaidZ
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub repaired 0 in 0 days 17:16:55 with 0 errors on Sun Jun 7 17:16:58 2020
config:

NAME STATE READ WRITE CKSUM
RaidZ DEGRADED 0 0 0
  raidz1-0 ONLINE 0 0 0
    gptid/6a240e8a-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/6b3106e3-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/6c41da4f-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/6d59e735-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/6f934cb7-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/70a9a4ab-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/71bc6e16-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/73fe9b9e-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/753028fb-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/78a5f017-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/7aef8056-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
  raidz1-1 ONLINE 0 0 0
    gptid/7c17efb7-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/7d3e3494-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/7e60c954-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/7f826c51-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/80cfeb7d-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/81f92de2-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/83249a69-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/85762b68-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/86a37df0-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/87e25dce-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/890a9fc6-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
  raidz1-2 DEGRADED 0 0 0
    gptid/8b8bd8c7-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/8cc5a48c-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/8df54f9c-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/8f3b7ab8-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/90775826-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/91add397-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/92ea1c7f-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/942f253c-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    gptid/9574cd42-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
    12895587042125083494 OFFLINE 0 0 0 was /dev/gptid/97de13dd-07bd-11ea-86a4-ac1f6baba4d8
    gptid/9928717c-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
logs
  gptid/9ad2749f-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
cache
  gptid/99fb4315-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
  gptid/9a51c69c-07bd-11ea-86a4-ac1f6baba4d8 ONLINE 0 0 0
spares
  gptid/9d28eff8-07bd-11ea-86a4-ac1f6baba4d8 AVAIL
  gptid/9e6c1cda-07bd-11ea-86a4-ac1f6baba4d8 AVAIL

errors: No known data errors
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
It's because you defined the spares at the pool level instead of the vdev level. At the pool level, the spare wouldn't become active until an entire vdev failed.
 

Quietus

Cadet
Joined
Jul 7, 2020
Messages
8
That makes them completely useless and me an idiot. Is there any way for me to move them into the individual vdevs now?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Not without destroying and rebuilding your pool, sorry.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
My apologies, my understanding of spares was mistaken. It's not possible to define a spare at the vdev level. Is your SAS card in IT mode or RAID mode?
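If you're not sure, one way to check (assuming the Broadcom/LSI sas3flash utility is available on the box) is to list the controller and look at the firmware product ID:

Code:
# The "Firmware Product ID" line in the listing indicates IT vs. IR firmware
sas3flash -list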
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Also, check if you have consistent ashift values across your pool: zdb -U /data/zfs/zpool.cache
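If the full dump is too noisy, you can narrow it down to just the sector-size lines, e.g.:

Code:
# Every vdev in the pool should report the same ashift (12 = 4K sectors)
zdb -U /data/zfs/zpool.cache | grep ashift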
 

Quietus

Cadet
Joined
Jul 7, 2020
Messages
8
Sorry, that was a typo; it's an AOC-S3008L-H8IR. The 3108 card was on a shelf with some other RAID cards. This system wasn't originally supposed to be a FreeNAS box; I'm glad the client saw the light. Here's the output from the command you mentioned, though I don't understand what you're looking for.

RaidZ:
version: 5000
name: 'RaidZ'
state: 0
txg: 7021176
pool_guid: 535622277423987795
hostid: 2817735488
hostname: 'silicon.socrates.work'
com.delphix:has_per_vdev_zaps
vdev_children: 4
vdev_tree:
type: 'root'
id: 0
guid: 535622277423987795
create_txg: 4
children[0]:
type: 'raidz'
id: 0
guid: 2489180138716287880
nparity: 1
metaslab_array: 86
metaslab_shift: 40
ashift: 12
asize: 131977847767040
is_log: 0
create_txg: 4
com.delphix:vdev_zap_top: 36
children[0]:
type: 'disk'
id: 0
guid: 12340368468327919874
path: '/dev/gptid/6a240e8a-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@1/elmdesc@Slot00/p2'
whole_disk: 1
DTL: 529
create_txg: 4
com.delphix:vdev_zap_leaf: 37
children[1]:
type: 'disk'
id: 1
guid: 14329535964405297133
path: '/dev/gptid/6b3106e3-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@2/elmdesc@Slot01/p2'
whole_disk: 1
DTL: 528
create_txg: 4
com.delphix:vdev_zap_leaf: 38
children[2]:
type: 'disk'
id: 2
guid: 11038095293395045464
path: '/dev/gptid/6c41da4f-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@3/elmdesc@Slot02/p2'
whole_disk: 1
DTL: 527
create_txg: 4
com.delphix:vdev_zap_leaf: 39
children[3]:
type: 'disk'
id: 3
guid: 2383967573139848296
path: '/dev/gptid/6d59e735-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@4/elmdesc@Slot03/p2'
whole_disk: 1
DTL: 526
create_txg: 4
com.delphix:vdev_zap_leaf: 40
children[4]:
type: 'disk'
id: 4
guid: 15567902675321512748
path: '/dev/gptid/6f934cb7-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@5/elmdesc@Slot04/p2'
whole_disk: 1
DTL: 525
create_txg: 4
com.delphix:vdev_zap_leaf: 41
children[5]:
type: 'disk'
id: 5
guid: 17929624742962370388
path: '/dev/gptid/70a9a4ab-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@6/elmdesc@Slot05/p2'
whole_disk: 1
DTL: 524
create_txg: 4
com.delphix:vdev_zap_leaf: 42
children[6]:
type: 'disk'
id: 6
guid: 10230412337035885469
path: '/dev/gptid/71bc6e16-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@7/elmdesc@Slot06/p2'
whole_disk: 1
DTL: 523
create_txg: 4
com.delphix:vdev_zap_leaf: 43
children[7]:
type: 'disk'
id: 7
guid: 319630162253237478
path: '/dev/gptid/73fe9b9e-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@8/elmdesc@Slot07/p2'
whole_disk: 1
DTL: 522
create_txg: 4
com.delphix:vdev_zap_leaf: 44
children[8]:
type: 'disk'
id: 8
guid: 13242069713259626070
path: '/dev/gptid/753028fb-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@9/elmdesc@Slot08/p2'
whole_disk: 1
DTL: 521
create_txg: 4
com.delphix:vdev_zap_leaf: 45
children[9]:
type: 'disk'
id: 9
guid: 5550657330613046957
path: '/dev/gptid/78a5f017-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@a/elmdesc@Slot09/p2'
whole_disk: 1
DTL: 520
create_txg: 4
com.delphix:vdev_zap_leaf: 46
children[10]:
type: 'disk'
id: 10
guid: 3759376905440185921
path: '/dev/gptid/7aef8056-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@b/elmdesc@Slot10/p2'
whole_disk: 1
DTL: 519
create_txg: 4
com.delphix:vdev_zap_leaf: 47
children[1]:
type: 'raidz'
id: 1
guid: 7154914320912802713
nparity: 1
metaslab_array: 82
metaslab_shift: 40
ashift: 12
asize: 131977847767040
is_log: 0
create_txg: 4
com.delphix:vdev_zap_top: 48
children[0]:
type: 'disk'
id: 0
guid: 10997952441857646740
path: '/dev/gptid/7c17efb7-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@c/elmdesc@Slot11/p2'
whole_disk: 1
DTL: 518
create_txg: 4
com.delphix:vdev_zap_leaf: 49
children[1]:
type: 'disk'
id: 1
guid: 4190831562550916777
path: '/dev/gptid/7d3e3494-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@d/elmdesc@Slot12/p2'
whole_disk: 1
DTL: 517
create_txg: 4
com.delphix:vdev_zap_leaf: 50
children[2]:
type: 'disk'
id: 2
guid: 11435296573455087415
path: '/dev/gptid/7e60c954-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@e/elmdesc@Slot13/p2'
whole_disk: 1
DTL: 516
create_txg: 4
com.delphix:vdev_zap_leaf: 51
children[3]:
type: 'disk'
id: 3
guid: 8758163334622723605
path: '/dev/gptid/7f826c51-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@f/elmdesc@Slot14/p2'
whole_disk: 1
DTL: 515
create_txg: 4
com.delphix:vdev_zap_leaf: 52
children[4]:
type: 'disk'
id: 4
guid: 1554108281224175595
path: '/dev/gptid/80cfeb7d-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@10/elmdesc@Slot15/p2'
whole_disk: 1
DTL: 514
create_txg: 4
com.delphix:vdev_zap_leaf: 53
children[5]:
type: 'disk'
id: 5
guid: 9457560316490865575
path: '/dev/gptid/81f92de2-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@11/elmdesc@Slot16/p2'
whole_disk: 1
DTL: 513
create_txg: 4
com.delphix:vdev_zap_leaf: 54
children[6]:
type: 'disk'
id: 6
guid: 4146129044088663225
path: '/dev/gptid/83249a69-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@12/elmdesc@Slot17/p2'
whole_disk: 1
DTL: 512
create_txg: 4
com.delphix:vdev_zap_leaf: 55
children[7]:
type: 'disk'
id: 7
guid: 7678557413777414035
path: '/dev/gptid/85762b68-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@13/elmdesc@Slot18/p2'
whole_disk: 1
DTL: 511
create_txg: 4
com.delphix:vdev_zap_leaf: 56
children[8]:
type: 'disk'
id: 8
guid: 7627590863544051757
path: '/dev/gptid/86a37df0-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@14/elmdesc@Slot19/p2'
whole_disk: 1
DTL: 510
create_txg: 4
com.delphix:vdev_zap_leaf: 57
children[9]:
type: 'disk'
id: 9
guid: 7617092473639876886
path: '/dev/gptid/87e25dce-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@15/elmdesc@Slot20/p2'
whole_disk: 1
DTL: 509
create_txg: 4
com.delphix:vdev_zap_leaf: 58
children[10]:
type: 'disk'
id: 10
guid: 4750580605360295653
path: '/dev/gptid/890a9fc6-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@16/elmdesc@Slot21/p2'
whole_disk: 1
DTL: 508
create_txg: 4
com.delphix:vdev_zap_leaf: 59
children[2]:
type: 'raidz'
id: 2
guid: 1522770350701994749
nparity: 1
metaslab_array: 77
metaslab_shift: 40
ashift: 12
asize: 131977847767040
is_log: 0
create_txg: 4
com.delphix:vdev_zap_top: 60
children[0]:
type: 'disk'
id: 0
guid: 8274469998241797920
path: '/dev/gptid/8b8bd8c7-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@17/elmdesc@Slot22/p2'
whole_disk: 1
DTL: 540
create_txg: 4
com.delphix:vdev_zap_leaf: 61
children[1]:
type: 'disk'
id: 1
guid: 17305851721950045858
path: '/dev/gptid/8cc5a48c-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n5003048017e619fd/type@0/slot@18/elmdesc@Slot23/p2'
whole_disk: 1
DTL: 539
create_txg: 4
com.delphix:vdev_zap_leaf: 62
children[2]:
type: 'disk'
id: 2
guid: 5208041689598955156
path: '/dev/gptid/8df54f9c-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n500304801f3f8efd/type@0/slot@1/elmdesc@Slot00/p2'
whole_disk: 1
DTL: 538
create_txg: 4
com.delphix:vdev_zap_leaf: 63
children[3]:
type: 'disk'
id: 3
guid: 16873492542238860041
path: '/dev/gptid/8f3b7ab8-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n500304801f3f8efd/type@0/slot@2/elmdesc@Slot01/p2'
whole_disk: 1
DTL: 537
create_txg: 4
com.delphix:vdev_zap_leaf: 64
children[4]:
type: 'disk'
id: 4
guid: 15193116924882371258
path: '/dev/gptid/90775826-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n500304801f3f8efd/type@0/slot@3/elmdesc@Slot02/p2'
whole_disk: 1
DTL: 536
create_txg: 4
com.delphix:vdev_zap_leaf: 65
children[5]:
type: 'disk'
id: 5
guid: 6591348339669236115
path: '/dev/gptid/91add397-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n500304801f3f8efd/type@0/slot@4/elmdesc@Slot03/p2'
whole_disk: 1
DTL: 535
create_txg: 4
com.delphix:vdev_zap_leaf: 66
children[6]:
type: 'disk'
id: 6
guid: 5966294035454654495
path: '/dev/gptid/92ea1c7f-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n500304801f3f8efd/type@0/slot@5/elmdesc@Slot04/p2'
whole_disk: 1
DTL: 534
create_txg: 4
com.delphix:vdev_zap_leaf: 67
children[7]:
type: 'disk'
id: 7
guid: 18163592454427313981
path: '/dev/gptid/942f253c-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n500304801f3f8efd/type@0/slot@6/elmdesc@Slot05/p2'
whole_disk: 1
DTL: 533
create_txg: 4
com.delphix:vdev_zap_leaf: 68
children[8]:
type: 'disk'
id: 8
guid: 14916012336257181087
path: '/dev/gptid/9574cd42-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n500304801f3f8efd/type@0/slot@7/elmdesc@Slot06/p2'
whole_disk: 1
DTL: 532
create_txg: 4
com.delphix:vdev_zap_leaf: 69
children[9]:
type: 'replacing'
id: 9
guid: 6890398197517596066
whole_disk: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 12895587042125083494
path: '/dev/gptid/97de13dd-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n500304801f3f8efd/type@0/slot@8/elmdesc@Slot07/p2'
whole_disk: 1
DTL: 531
create_txg: 4
com.delphix:vdev_zap_leaf: 70
offline: 1
children[1]:
type: 'disk'
id: 1
guid: 3717726716355821089
path: '/dev/gptid/654a8fcb-c0af-11ea-9cbe-ac1f6baba4d8'
phys_path: 'id1,enc@n500304801f3f8efd/type@0/slot@c/elmdesc@Slot11/p2'
whole_disk: 1
DTL: 585
create_txg: 4
com.delphix:vdev_zap_leaf: 584
resilver_txg: 7021171
children[10]:
type: 'disk'
id: 10
guid: 2481128861327526451
path: '/dev/gptid/9928717c-07bd-11ea-86a4-ac1f6baba4d8'
phys_path: 'id1,enc@n500304801f3f8efd/type@0/slot@9/elmdesc@Slot08/p2'
whole_disk: 1
DTL: 530
create_txg: 4
com.delphix:vdev_zap_leaf: 71
children[3]:
type: 'disk'
id: 3
guid: 15313672968754466443
path: '/dev/gptid/9ad2749f-07bd-11ea-86a4-ac1f6baba4d8'
whole_disk: 1
metaslab_array: 76
metaslab_shift: 31
ashift: 12
asize: 280060231680
is_log: 1
DTL: 541
create_txg: 4
com.delphix:vdev_zap_leaf: 72
com.delphix:vdev_zap_top: 73
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Code:
children[9]:
    type: 'replacing'
    id: 9
    guid: 6890398197517596066
    whole_disk: 0
    create_txg: 4
    children[0]:
        type: 'disk'
        id: 0
        guid: 12895587042125083494
        path: '/dev/gptid/97de13dd-07bd-11ea-86a4-ac1f6baba4d8'
        phys_path: 'id1,enc@n500304801f3f8efd/type@0/slot@8/elmdesc@Slot07/p2'
        whole_disk: 1
        DTL: 531
        create_txg: 4
        com.delphix:vdev_zap_leaf: 70
        offline: 1
    children[1]:
        type: 'disk'
        id: 1
        guid: 3717726716355821089
        path: '/dev/gptid/654a8fcb-c0af-11ea-9cbe-ac1f6baba4d8'
        phys_path: 'id1,enc@n500304801f3f8efd/type@0/slot@c/elmdesc@Slot11/p2'
        whole_disk: 1
        DTL: 585
        create_txg: 4
        com.delphix:vdev_zap_leaf: 584
        resilver_txg: 7021171


So this shows the resilver in progress, with the disk in slot 11 in the back replacing the disk in slot 7. Did you kick this off manually? The ashift value, which reflects the native sector size, is consistent across the pool at ashift=12. (If a spare has a different ashift value, that can prevent it from becoming active.)
 

Quietus

Cadet
Joined
Jul 7, 2020
Messages
8
Interesting and good to know. I'm pretty diligent about using the same drive topology in each box. Yes, I did kick it off manually. It also appears that I can add and remove pool spares at will. I was just hoping that they would act like hot spares.
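For anyone else reading along, adding and removing spares is straightforward from the command line (a sketch only; the gptid labels are placeholders, and on FreeNAS the GUI is the usual way to do this):

Code:
# Add a disk to the pool as a spare
zpool add RaidZ spare gptid/<new-spare>
# Remove a spare that isn't currently in use
zpool remove RaidZ gptid/<spare>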
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
The only factor I can think of that could interfere with the spares working is the HBA. How was this flashed? Does it have a boot ROM?
 

Quietus

Cadet
Joined
Jul 7, 2020
Messages
8
The Avago Technologies (LSI) SAS3008 is just an HBA with no RAID capabilities. I flashed it with the most recent firmware as of last November and it relays SMART info without an issue. It does have a boot ROM and I believe it's set to OS only.
 

Quietus

Cadet
Joined
Jul 7, 2020
Messages
8
Let's back up a bit and maybe you can answer a few questions for me. First off, are the spares 'hot spares', and should they kick in automatically? If so, would an error like this kick off the process of replacing and resilvering?

Jul 5 07:22:07 silicon smartd[96188]: Device: /dev/da31, SMART Failure: FAILURE PREDICTION THRESHOLD EXCEEDED: ascq=0x5
The same message has been repeating since I took the drive offline:

Jul 7 20:11:59 silicon smartd[33277]: Device: /dev/da31, SMART Failure: FAILURE PREDICTION THRESHOLD EXCEEDED: ascq=0x5

Normally, when a drive fails I just replace it. But since this issue was triggering the system to reboot, I really want to understand where the failure was. Thanks again for your assistance!
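For completeness, the full SMART report for the drive can be pulled with standard smartmontools, which is what smartd's warning above is based on:

Code:
# Full SMART report for the suspect drive, including the failure-prediction status
smartctl -a /dev/da31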
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
As for causing reboots, the only thing I can think of is that the HBA is failing and could be randomly sending excess current or voltage onto the PCIe slot in response to signals from the failing drive's port.
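It's also worth checking whether the controller logged anything around the resets (a sketch, assuming the stock FreeBSD mpr(4) driver that handles the SAS3008):

Code:
# Look for controller resets, timeouts, or command aborts from the HBA driver
dmesg | grep -i mpr
grep -i mpr /var/log/messages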
 

Quietus

Cadet
Joined
Jul 7, 2020
Messages
8
Based on the first error I captured, I was going to replace the RAM, and then the next error had me going after the HBA. The last error I got was a panic with the pool hung on a vdev, so that made me start thinking the drive was the issue. After taking the drive offline, I was able to sustain transfer speeds of around 1 GB/s over a single 10Gbps NIC without the system resetting. After testing this several times with success, I'm pretty confident that the drive was the culprit. I've never resilvered with 12TB drives, but I'm now at 30% just over 4 hours in...can't wait to start my next build with 16TB drives :)
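I'm just watching the resilver with the usual status command, which also gives an estimated completion time:

Code:
zpool status RaidZ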
 