Disk Errors

NugentS · Apr 20, 2021

Hi,
I am getting errors on nvme1. The problem is that, according to FreeNAS, I don't have an nvme1.
According to FreeNAS I have:
Two SMC SATADOMs on ada9 and ada8
Two Intel Optanes on nvd0 and nvd1
Two Intel SSDs on ada7 and ada0
However, /dev says:
ada9 and ada8 are present
nvd0 and nvd1 are present
ada0 and ada7 are present

Patrick M. Hausen · Apr 20, 2021

nvme1 and nvd1 are the same physical device. The former is the NVMe controller device and the latter is the block device used to actually store your data. SMART talks to the former.

NugentS · Apr 20, 2021

Damn it, that's one of the Optanes.

NugentS · Apr 20, 2021

When I run smartctl -a /dev/nvme1, I get the following:

Code:

Model Number:                       INTEL SSDPED1D280GA
Serial Number:                      XXXXXXXXXXXXXXXXXXX
Firmware Version:                   E2010325
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
NVMe Version:                       <1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          280,065,171,456 [280 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Wed Apr 21 00:30:43 2021 BST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x0006):     Wr_Unc DS_Mngmt
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    18.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        31 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    0%
Data Units Read:                    58,910 [30.1 GB]
Data Units Written:                 43,994,741 [22.5 TB]
Host Read Commands:                 3,080,964
Host Write Commands:                716,899,877
Controller Busy Time:               199
Power Cycles:                       65
Power On Hours:                     7,947
Unsafe Shutdowns:                   18
Media and Data Integrity Errors:    0
Error Information Log Entries:      21

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0         21     8       -  0xc00c      -            0     -     -
  1         20     2       -  0xc00c      -            0     -     -
  2         19     3       -  0xc00c      -            0     -     -
  3         18     7       -  0xc00c      -            0     -     -
  4         17     3       -  0xc00c      -            0     -     -
  5         16     7       -  0xc00c      -            0     -     -
  6         15     4       -  0xc00c      -            0     -     -
  7         14     9       -  0xc00c      -            0     -     -
  8         13     2       -  0xc00c      -            0     -     -
  9         12     1       -  0xc00c      -            0     -     -
 10         11     8       -  0xc00c      -            0     -     -
 11         10     2       -  0xc00c      -            0     -     -
 12          9    10       -  0xc00c      -            0     -     -
 13          8    10       -  0xc00c      -            0     -     -
 14          7     1       -  0xc00c      -            0     -     -
 15          6     3       -  0xc00c      -            0     -     -
... (48 entries not read)

Does anyone have an opinion? I am thinking (hoping) it's a cable-type issue. At 22TB written I have hardly touched this unit. The good news is that it's running as a SLOG and an L2ARC, neither of which is entirely critical, and I suspect the L2ARC is superfluous at 1.31G used.
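For context on the "22TB written" figure: NVMe reports reads and writes in data units of 1,000 × 512 bytes, rounded up, so the counters above can be converted by hand. A minimal Python sketch, assuming that convention; the function name is illustrative and not part of smartctl:

```python
# Sketch: convert NVMe "Data Units Read/Written" counters to bytes.
# Assumes the NVMe convention: one data unit = 1,000 x 512-byte blocks,
# reported rounded up, so results can slightly over-estimate the true count.

DATA_UNIT_BYTES = 1000 * 512  # 512,000 bytes per data unit


def data_units_to_bytes(units: int) -> int:
    """Convert an NVMe data-unit counter into a byte count."""
    return units * DATA_UNIT_BYTES


if __name__ == "__main__":
    # Values copied from the smartctl -a /dev/nvme1 output above.
    written = data_units_to_bytes(43_994_741)
    read = data_units_to_bytes(58_910)
    print(f"written: {written:,} bytes ({written / 1e12:.2f} TB)")
    print(f"read:    {read:,} bytes ({read / 1e9:.2f} GB)")
```

Run as a script, it reports about 22.53 TB written and 30.16 GB read, in line with the bracketed values smartctl prints.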

Kailee71 · Jul 16, 2021

Hi @NugentS,

Sorry to bring up an old thread. I have just installed two 32GB Optanes as a (mirrored) SLOG, and one of them is giving exactly the same errors:

Code:

truenas# smartctl -l error /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0          9     1       -  0xc00c      -            0     -     -
  1          8     1       -  0xc00c      -            0     -     -
  2          7     1       -  0xc00c      -            0     -     -
  3          6     1       -  0xc00c      -            0     -     -
  4          5     1       -  0xc00c      -            0     -     -
  5          4     1       -  0xc00c      -            0     -     -
  6          3     1       -  0xc00c      -            0     -     -
  7          2     1       -  0xc00c      -            0     -     -
  8          1     1       -  0xc00c      -            0     -     -

Does anybody have any info on what the 0xc00c is?

BTW: this upgrade was easily the most effective thing for my use case. I now get 300MiB/s sustained *sync* writes over SMB on a pool of two striped mirrors of WD40EFRX drives (8TB usable), and considerably more when striping the two Optanes. All for well under $100...
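On the 0xc00c question: the NVMe Error Information log stores the completion's Status Field as a 16-bit word, with the Phase Tag in bit 0 and the Status Field in bits 15:1. The Python sketch below assumes smartctl prints that word unmodified (it appears as a raw hex value in both logs above) and decodes it against generic status codes from the NVMe base specification. Only a handful of codes are tabulated, and the helper names are hypothetical.

```python
# Sketch: decode the 16-bit "Status" value from smartctl's NVMe error log.
# ASSUMPTION: the value is the raw word from the Error Information log entry
# (bit 0 = Phase Tag, bits 15:1 = Status Field), per the NVMe base spec.

from typing import NamedTuple

# A few Generic Command Status codes (Status Code Type 0).
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
    0x04: "Data Transfer Error",
    0x06: "Internal Error",
    0x07: "Command Abort Requested",
}


class NvmeStatus(NamedTuple):
    phase: int  # Phase Tag, bit 0
    sc: int     # Status Code, bits 8:1
    sct: int    # Status Code Type, bits 11:9
    crd: int    # Command Retry Delay, bits 13:12 (NVMe 1.4+, reserved earlier)
    more: int   # More, bit 14
    dnr: int    # Do Not Retry, bit 15


def decode_status(raw: int) -> NvmeStatus:
    """Split the raw 16-bit status word into its NVMe bit fields."""
    return NvmeStatus(
        phase=raw & 0x1,
        sc=(raw >> 1) & 0xFF,
        sct=(raw >> 9) & 0x7,
        crd=(raw >> 12) & 0x3,
        more=(raw >> 14) & 0x1,
        dnr=(raw >> 15) & 0x1,
    )


def describe(raw: int) -> str:
    """Human-readable summary; only generic (SCT 0) codes are named."""
    s = decode_status(raw)
    if s.sct == 0:
        name = GENERIC_STATUS.get(s.sc, f"generic status 0x{s.sc:02x}")
    else:
        name = f"SCT {s.sct}, SC 0x{s.sc:02x}"
    return f"0x{raw:04x}: {name} (More={s.more}, DNR={s.dnr})"


if __name__ == "__main__":
    print(describe(0xC00C))
```

Under that assumption, 0xc00c decodes to Status Code Type 0 (generic), Status Code 0x06 "Internal Error", with the More and Do Not Retry bits set. That names the class of failure only; with CmdId and NSID reported as "-", the log doesn't say which command hit it.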

NugentS · Jul 16, 2021

I contacted Intel support, who didn't consider it an issue but did mention that a firmware upgrade was available.
I never followed up any further.

Linked: "Optane Device with errors" (community.intel.com)

"I have a couple of Optane devices in a TrueNAS server. One of them is generating errors (see below). Does anyone have any idea what this means? From time to time I get a new warning about a new error. I am going to reseat the board and see if that's the issue. The device is acting as an L2ARC..."

NugentS · Jul 16, 2021

I also haven't had any errors since.

Kailee71 · Jul 16, 2021

Oh OK. I'll certainly see if I can update the firmware on these, and for now I'll just monitor whether more errors develop...

Thanks for the quick reply!
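For the "monitor whether more errors develop" part, one option is to poll smartctl's JSON output (the -j flag, available since smartctl 7.0) and compare the error-log counter with the last value seen. A minimal Python sketch; the JSON key names (nvme_smart_health_information_log, num_err_log_entries, firmware_version) are assumed from smartctl 7.x output and should be checked against your own, and the script name check_nvme_errors.py is hypothetical.

```python
# Sketch: report new NVMe error-log entries using smartctl's JSON output.
# ASSUMPTION: key names match smartctl 7.x "-j" output; verify them with
# `smartctl -j -a /dev/nvmeX` on your system before relying on this.

import json
import subprocess
import sys
from typing import Optional


def read_smart_json(device: str) -> dict:
    # smartctl's exit status is a bit mask; bit 6 is set whenever the device
    # error log has records, so a non-zero exit is expected (check=False).
    result = subprocess.run(
        ["smartctl", "-j", "-a", device],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout)


def error_log_count(report: dict) -> Optional[int]:
    """Counter shown as "Error Information Log Entries" in text output."""
    health = report.get("nvme_smart_health_information_log", {})
    return health.get("num_err_log_entries")


def new_errors(report: dict, last_seen: int) -> int:
    """Number of error-log entries added since last_seen (never negative)."""
    count = error_log_count(report)
    if count is None:
        raise KeyError("num_err_log_entries not found in smartctl JSON")
    return max(0, count - last_seen)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: check_nvme_errors.py DEVICE [LAST_SEEN_COUNT]")
    else:
        device = sys.argv[1]
        last_seen = int(sys.argv[2]) if len(sys.argv) > 2 else 0
        report = read_smart_json(device)
        print(f"{device}: firmware {report.get('firmware_version')}, "
              f"{new_errors(report, last_seen)} new error-log entries")
```

For example, check_nvme_errors.py /dev/nvme0 9 would report how many entries appeared beyond the nine shown in the log above, and also prints the firmware version, which is handy when checking whether an update took.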
