Can't Rebuild Pool, Drives Won't Wipe

KazuyaDarklight

TL;DR: We had a mishap and are trying to rebuild a pool, but the drives won't wipe and integrate.

My system was acting up this morning: trouble connecting to shares, and the UI was sluggish and didn't want to show a lot of elements, but there were no disk- or pool-related messages in the Notification drop-down. So I rebooted. I have two pools, one SSD and one HDD. The HDD pool came up fine, but the SSD pool was completely failed with no other info; it just had a big red Disconnect button next to it. The disks were all listed in the Disks area with no errors, but their Pool column said N/A.

After some failed attempts to figure out what had happened, I tried the Disconnect option on the pool and then tried to import the disks. Still no go. So I moved on to just rebuilding the pool and re-adding the disks, with the intent to pull all our data back in from backups. The UI lets me add the disks, but when it tries to wipe and format them it errors out with the following details.

Code:
Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 367, in run
    await self.future
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 403, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 973, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 666, in do_create
    formatted_disks = await self.middleware.call('pool.format_disks', job, disks)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1241, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1198, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool_/format_disks.py", line 56, in format_disks
    await asyncio_map(format_disk, disks.items(), limit=16)
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/asyncio_.py", line 16, in asyncio_map
    return await asyncio.gather(*futures)
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/asyncio_.py", line 13, in func
    return await real_func(arg)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool_/format_disks.py", line 29, in format_disk
    await self.middleware.call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1241, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1209, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1113, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/io_thread_pool_executor.py", line 25, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/disk_/format.py", line 25, in format
    raise CallError(f'Failed to wipe disk {disk}: {job.error}')
middlewared.service_exception.CallError: [EFAULT] Failed to wipe disk da15: [EFAULT] Command gpart create -s gpt /dev/da15 failed (code 1):
gpart: Input/output error


The same error appears if I just try to wipe them in the Disks section.
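For reference, here's roughly what I can run from a shell against one of the affected disks to see whether it responds at the block level at all (using da15 from the traceback above; the device name and the choice of checks are just my guesses at what's relevant):

Code:
# Does the kernel still see the device, and does it have a partition table?
camcontrol devlist | grep da15
gpart show da15

# Small raw read test; an immediate I/O error here would point at the disk
# or controller rather than anything ZFS-level
dd if=/dev/da15 of=/dev/null bs=1m count=1

# SMART health and error log for the drive
smartctl -a /dev/da15

# Recent kernel/CAM errors mentioning the device
dmesg | grep da15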

While my immediate concern is getting the pool back up, hopefully in a trustworthy state, I would honestly love any help figuring out what actually happened, though that may be a thread all its own. I'll also acknowledge that may be harder since we gave up and removed the pool; still, I'd be happy to pull logs, etc. It would go a long way towards helping me feel like I can trust the system going forward.
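For the post-mortem, these are the logs I'm planning to pull (assuming the usual TrueNAS Core locations; let me know if there are better ones):

Code:
# Middleware log - should include the failed wipe/format jobs
tail -n 200 /var/log/middlewared.log

# Kernel messages from around the time the pool dropped; controller/CAM
# errors usually land here (driver name depends on the HBA, e.g. mps/mpr/mrsas)
tail -n 500 /var/log/messages

# Whether ZFS still sees the old pool as importable
zpool import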

Edited to fix some typos/readability.

Samuel Tai

How long were the SSDs in service, and what level of write volume did they experience? It could be you simply burned out the SSDs past their useful lifetime.
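You can read the wear level straight off the drives with smartctl; for example (the device name is just a placeholder, and the exact attribute names vary by vendor and by SAS vs. SATA):

Code:
# SAS SSDs typically report "Percentage used endurance indicator";
# SATA SSDs report attributes like Media_Wearout_Indicator or Wear_Leveling_Count
smartctl -a /dev/da0 | grep -i -e endurance -e wear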
 

KazuyaDarklight

I suppose that could be possible; I'm not quite sure what our average write volume was. Probably moderately high, but not like a database server: the pool was used for video editing. The pool has existed since mid-2019 and was grown over time, and some of the newer drives are only a year old, maybe a little less. I would have hoped for some level of warning. I was in the UI just last week and there were no errors other than the pool's total capacity running low.
 

Samuel Tai

Were the SSDs consumer grade or enterprise grade? Consumer-grade SSDs have much lower write endurance than enterprise drives. What model of SSDs did you use?
 

KazuyaDarklight

Enterprise: 16 Seagate Nytros. 45Drives currently thinks the controller gave out; they've sent a replacement and I'm installing it today. We'll see what happens.
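Once it's in, my rough plan (based on the suggestions here; exact commands may need adjusting) is to confirm the drives enumerate and read cleanly before touching ZFS again:

Code:
# Do all 16 SSDs show up behind the new controller?
camcontrol devlist

# Spot-check a raw read on one of the previously failing drives
dd if=/dev/da15 of=/dev/null bs=1m count=1

# If the old pool metadata survived, it should show as importable
zpool import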
 