Disk Errors

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Hi,
I am getting errors on nvme1. The problem is that, according to FreeNAS, I don't have an nvme1.
According to FreeNAS:
I have two SMC SATADOMs on ada9 and ada8,
two Intel Optanes on nvd0 and nvd1,
and two Intel SSDs on ada7 and ada0.
However, /dev shows:
ada9 and ada8 are present
nvd0 and nvd1 are present
ada0 and ada7 are present
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
nvme1 and nvd1 are the same device. The former is the NVMe controller device and the latter is the block device used to actually store your data. SMART talks to the former.
 

NugentS

Damn it, that's one of the Optanes.
 

NugentS

When I run smartctl -a /dev/nvme1, I get the following:
Code:
Model Number:                       INTEL SSDPED1D280GA
Serial Number:                      XXXXXXXXXXXXXXXXXXX
Firmware Version:                   E2010325
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
NVMe Version:                       <1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          280,065,171,456 [280 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Wed Apr 21 00:30:43 2021 BST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x0006):     Wr_Unc DS_Mngmt
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    18.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        31 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    0%
Data Units Read:                    58,910 [30.1 GB]
Data Units Written:                 43,994,741 [22.5 TB]
Host Read Commands:                 3,080,964
Host Write Commands:                716,899,877
Controller Busy Time:               199
Power Cycles:                       65
Power On Hours:                     7,947
Unsafe Shutdowns:                   18
Media and Data Integrity Errors:    0
Error Information Log Entries:      21

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0         21     8       -  0xc00c      -            0     -     -
  1         20     2       -  0xc00c      -            0     -     -
  2         19     3       -  0xc00c      -            0     -     -
  3         18     7       -  0xc00c      -            0     -     -
  4         17     3       -  0xc00c      -            0     -     -
  5         16     7       -  0xc00c      -            0     -     -
  6         15     4       -  0xc00c      -            0     -     -
  7         14     9       -  0xc00c      -            0     -     -
  8         13     2       -  0xc00c      -            0     -     -
  9         12     1       -  0xc00c      -            0     -     -
 10         11     8       -  0xc00c      -            0     -     -
 11         10     2       -  0xc00c      -            0     -     -
 12          9    10       -  0xc00c      -            0     -     -
 13          8    10       -  0xc00c      -            0     -     -
 14          7     1       -  0xc00c      -            0     -     -
 15          6     3       -  0xc00c      -            0     -     -
... (48 entries not read)


Does anyone have an opinion? I am thinking (hoping) it's a cable-type issue. At 22 TB written I have hardly touched this unit. The good news is that it's running as a SLOG and an L2ARC, neither of which is entirely critical, and I suspect the L2ARC is superfluous with only 1.31G used.
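The "22 TB written" figure comes from the Data Units Written counter in the output above. As a small worked example, here is a Python sketch that converts that counter to bytes, assuming the NVMe definition of a data unit (1,000 units of 512 bytes, rounded up by the drive). The helper names are hypothetical. Dividing by Host Write Commands gives a rough average size per write, about 31 kB for these numbers.

```python
# Assumption: NVMe reports Data Units Read/Written in thousands of 512-byte
# units, so one counted data unit is 512 * 1000 = 512,000 bytes (rounded up).
BYTES_PER_DATA_UNIT = 512 * 1000


def data_units_to_bytes(units: int) -> int:
    """Convert an NVMe Data Units counter to (approximate) bytes."""
    return units * BYTES_PER_DATA_UNIT


def average_bytes_per_command(data_units: int, commands: int) -> float:
    """Average payload per host command, from a matching pair of counters."""
    return data_units_to_bytes(data_units) / commands


if __name__ == "__main__":
    written = data_units_to_bytes(43_994_741)   # "Data Units Written" value
    print(f"written: {written / 1e12:.1f} TB")  # decimal TB, as smartctl prints
    avg = average_bytes_per_command(43_994_741, 716_899_877)  # "Host Write Commands"
    print(f"average write: {avg / 1000:.1f} kB per host write command")
```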
 

Kailee71

Contributor
Joined
Jul 8, 2018
Messages
110
Hi @NugentS,

Sorry to bring up an old thread. I have just installed two 32 GB Optanes as a (mirrored) SLOG, and one of them is giving exactly the same errors:
Code:
truenas# smartctl -l error /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0          9     1       -  0xc00c      -            0     -     -
  1          8     1       -  0xc00c      -            0     -     -
  2          7     1       -  0xc00c      -            0     -     -
  3          6     1       -  0xc00c      -            0     -     -
  4          5     1       -  0xc00c      -            0     -     -
  5          4     1       -  0xc00c      -            0     -     -
  6          3     1       -  0xc00c      -            0     -     -
  7          2     1       -  0xc00c      -            0     -     -
  8          1     1       -  0xc00c      -            0     -     -


Does anybody have any info on what 0xc00c means?
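On the 0xc00c question, here is a minimal Python sketch that splits the raw Status value into its fields. It assumes the Status column is the raw 16-bit Status Field of the NVMe Error Information log entry (phase tag in bit 0, Status Code in bits 8:1, Status Code Type in bits 11:9, More in bit 14, Do Not Retry in bit 15), and it names only a subset of the generic status codes. decode_status and NvmeStatus are hypothetical names. Under that assumption, 0xc00c decodes to Status Code Type 0 (Generic Command Status) and Status Code 0x06 (Internal Error), with the More and Do Not Retry bits set.

```python
from dataclasses import dataclass

# Assumed layout of the 16-bit Status Field in an NVMe Error Information log
# entry (the value smartctl prints in the "Status" column, e.g. 0xc00c):
#   bit 0      Phase Tag
#   bits 8:1   Status Code (SC)
#   bits 11:9  Status Code Type (SCT)
#   bits 13:12 Command Retry Delay (NVMe 1.4+, reserved before that)
#   bit 14     More (M)
#   bit 15     Do Not Retry (DNR)

# Status Code Types; only the values listed here get a name.
SCT_NAMES = {
    0: "Generic Command Status",
    1: "Command Specific Status",
    2: "Media and Data Integrity Errors",
    7: "Vendor Specific",
}

# A subset of the Generic Command Status codes (SCT 0).
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
    0x03: "Command ID Conflict",
    0x04: "Data Transfer Error",
    0x05: "Commands Aborted due to Power Loss Notification",
    0x06: "Internal Error",
    0x07: "Command Abort Requested",
}


@dataclass
class NvmeStatus:  # hypothetical helper type
    phase: int
    sc: int
    sct: int
    crd: int
    more: bool
    dnr: bool

    def describe(self) -> str:
        sct_name = SCT_NAMES.get(self.sct, f"SCT {self.sct}")
        sc_name = f"SC 0x{self.sc:02x}"
        if self.sct == 0:
            sc_name = GENERIC_STATUS.get(self.sc, sc_name)
        return f"{sct_name}: {sc_name} (DNR={int(self.dnr)}, More={int(self.more)})"


def decode_status(raw: int) -> NvmeStatus:
    """Split a raw 16-bit error-log Status value into its fields."""
    return NvmeStatus(
        phase=raw & 0x1,
        sc=(raw >> 1) & 0xFF,
        sct=(raw >> 9) & 0x7,
        crd=(raw >> 12) & 0x3,
        more=bool((raw >> 14) & 0x1),
        dnr=bool((raw >> 15) & 0x1),
    )


if __name__ == "__main__":
    print(decode_status(0xC00C).describe())
```

Running it prints "Generic Command Status: Internal Error (DNR=1, More=1)". The decode only says what the drive recorded; it does not by itself indicate whether the drive is at risk.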

BTW: this upgrade was easily the most effective thing for my use case. I now get 300 MiB/s sustained *sync* writes over SMB on a pool striped across two mirrored pairs of WD40EFRX drives (8TiB usable), and considerably more when striping the two Optanes. All for well under $100...
 

NugentS

I contacted Intel support, who didn't consider it an issue but did mention that a firmware upgrade was available.
I never followed up any further.
 

NugentS

I also haven't had any errors since.
 

Kailee71

Oh, OK. I'll definitely see if I can update the firmware on these, and for now I'll just monitor whether more errors develop...

Thanks for the quick reply!
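For the monitoring part, here is a minimal Python sketch, assuming you periodically save the text output of smartctl -a for the Optane (for example from a cron job) and keep the previous error count somewhere. read_counter and new_error_entries are hypothetical helpers; they match the counter labels exactly as they appear in the smartctl output posted in this thread.

```python
import re


def read_counter(smartctl_text: str, label: str) -> int:
    """Return the value of a 'Label:   1,234' line from smartctl -a text output.

    Hypothetical helper: assumes the label starts a line and the value is a
    plain (possibly comma-grouped) integer, as in the NVMe SMART/Health
    Information section.
    """
    pattern = re.compile(rf"^{re.escape(label)}:[ \t]+([\d,]+)[ \t]*$", re.MULTILINE)
    match = pattern.search(smartctl_text)
    if match is None:
        raise ValueError(f"label not found in smartctl output: {label!r}")
    return int(match.group(1).replace(",", ""))


def new_error_entries(previous_count: int, smartctl_text: str) -> int:
    """How many error-log entries were added since the previously saved count."""
    current = read_counter(smartctl_text, "Error Information Log Entries")
    return max(0, current - previous_count)


if __name__ == "__main__":
    # Lines in the same format as the smartctl output earlier in the thread.
    sample = (
        "Media and Data Integrity Errors:    0\n"
        "Error Information Log Entries:      21\n"
    )
    print(read_counter(sample, "Media and Data Integrity Errors"))
    print(new_error_entries(9, sample))
```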
 