My First Attempt at Building a NAS

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Badblocks. It appears to keep on running until terminated.
No, it runs through four passes--IIRC, it writes 0xaa, then 0x55, then 0xff, then 0x00. Then it terminates. If you want to stop it early, Ctrl-C will do it.
 

ImNotNASty

Dabbler
Joined
Dec 21, 2022
Messages
25
No, it runs through four passes--IIRC, it writes 0xaa, then 0x55, then 0xff, then 0x00. Then it terminates. If you want to stop it early, Ctrl-C will do it.
Thanks - I'll keep watching it, but I'd swear it's completed more than one iteration. I know that there is a switch you can set to specify the number of iterations, but I only used: badblocks -b 4098 -ws /dev/adaX
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Badblocks runs 4 different test patterns (I had thought 5, but I could be wrong), and it will terminate once it's complete. Yes, it takes a long time with large hard drives. Imagine 18TB drives having to be tested. Yikes!

Yup, I thought there was a #A5 pattern. I was wrong.
 

ImNotNASty

It's showing 0xaa and 0x55 as complete for all drives. It just completed 0xff and rolled back into 0xff again, so it's running multiple, sequential iterations of the same test before rolling over to the next variant. How many iterations, I'm not sure. So it's running the 3rd variant of four (0xaa, 0x55, 0xff, 0x00). Once that's done, we'll see if it rolls into the 0xff again or into the last variant 0x00. Yeah it's taking a long time. The good news is no errors - so far. Fingers crossed.

Is there anything more to analyzing badblocks data than reading the error count?

Note to self - be glad that I'll likely never need a bank of 18TB disks.
 

joeschmuck

Is there anything more to analyzing badblocks data than reading the error count?
Nope, not really. If it reports an error then you have a problem spot on the drive.

It just completed 0xff and rolled back into 0xff again, so it's running multiple, sequential iterations of the same test before rolling over to the next variant.
I don't recall it doing that when I ran it. And you didn't specify -p val to change the number of passes from 1 to val. I'm not sure if it would retest if it found a questionable area.

badblocks -b 4098 -ws /dev/adaX
Why did you use a block size of 4098? It would normally be a multiple of 512, for example 4096, which would be good on an Advanced Format drive (4K sectors). Maybe you typed it wrong here and you did enter 4096. On the next drive you could possibly make it a little faster by adding -c 128 to the line to test more blocks at the same time.

Good luck, hope it all goes without a hitch.
 

joeschmuck

Note to self - be glad that I'll likely never need a bank of 18TB disks.
I know I won't. Four 6TB drives is way more than I need, but I like having the free space. You never know when you'll need to back up a few extra computers.
 

ImNotNASty

Why did you use a block size of 4098? It normally would be a multiple of 512, for example 4096 which would be good on an Advanced Format drive (4K blocks). Maybe you typed it wrong and you did enter 4096.
Yep, a typo. It should have been 4096.
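
For anyone following along, here is a minimal Python sketch of the block-size point. It only illustrates the "multiple of 512" rule of thumb and how much data one batch covers when -b and -c are combined; the function names are made up for this example and are not part of badblocks.

```python
SECTOR_BYTES = 512  # traditional logical sector size used as the rule of thumb


def check_block_size(block_size: int) -> bool:
    """Return True if block_size is a positive whole multiple of 512 bytes."""
    return block_size > 0 and block_size % SECTOR_BYTES == 0


def batch_bytes(block_size: int, blocks_at_once: int) -> int:
    """Bytes covered per batch: the -b value multiplied by the -c value."""
    return block_size * blocks_at_once


print(check_block_size(4098))   # False: the typo from the command line
print(check_block_size(4096))   # True: matches a 4K-sector drive
print(batch_bytes(4096, 128))   # 524288 bytes (512 KiB) per batch with -c 128
```

With -b 4096 and -c 128, each batch covers 512 KiB, which is where the possible speed-up comes from.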
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134

ImNotNASty

I didn't realize that it wrote in one pass and then read in a second pass. It's not verbose and it's a new exercise to me.
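
A rough Python sketch of the write-then-read behaviour, for illustration only. It is not how badblocks is actually implemented: an in-memory buffer stands in for the disk, and the function name and logging are assumptions made for this example. The pattern order is the one mentioned earlier in the thread.

```python
import io

PATTERNS = (0xAA, 0x55, 0xFF, 0x00)  # pattern order as described in this thread


def write_verify(device, size, block_size=4096, log=print):
    """Write each pattern over the whole device, then read it back and count mismatches."""
    bad_blocks = 0
    blocks = size // block_size
    for pattern in PATTERNS:
        block = bytes([pattern]) * block_size

        # Phase 1: fill the entire device with the current pattern.
        log(f"pattern 0x{pattern:02x}: writing")
        device.seek(0)
        for _ in range(blocks):
            device.write(block)

        # Phase 2: read everything back and compare against the pattern.
        log(f"pattern 0x{pattern:02x}: reading and comparing")
        device.seek(0)
        for _ in range(blocks):
            if device.read(block_size) != block:
                bad_blocks += 1
    return bad_blocks


fake_disk = io.BytesIO(bytes(16 * 4096))  # 64 KiB stand-in for a drive
phases = []
print(write_verify(fake_disk, 16 * 4096, log=phases.append))  # 0 mismatches
print(len(phases))  # 8 phases: a write and a read for each of the 4 patterns
```

Each pattern shows up twice in the progress output, once for the write and once for the read-back, which is why it can look like the same test is repeating.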
 

ImNotNASty

Badblocks took about 70 hours to complete. All 5 disks reported error counts of (0/0/0). Next I ran smartctl -t long on all 5 disks. After that, I ran smartctl -a and screenshotted the output for all 5 drives (attached). I'm seeing errors on ada0 & ada4 - the other 3 look ok to me, but I'm out of my depth here and don't really understand what I'm looking at.

I ran Memtest86 for 24 hours. It did multiple passes of my 32GB of memory. CPU temp got up to 45 C. Memtest reported no errors.

CPUStresstest is running now. It's been running for half a day: 1.5e+15 FP ops so far.
 

Attachments

  • Hard Drive Test Data.pdf
    3.5 MB

joeschmuck

ada0: UDMA_CRC_Errors (ID 199) is 3362. Monitor this; if it continues to increase at all, you likely have a data cable issue.
ada1: No issues.
ada2: Your Power On Hours is 251 and your Load Cycle Count is 120, and Power Cycle Count is 28. It looks like you are sleeping this drive. Be aware that head loading and unloading and spinning up a drive frequently can cause premature failure. It's not an issue if you are okay with it.
ada3: No issues.
ada4: Same issue as ada0.

UDMA_CRC_Errors will never reset to zero; the count is stored forever. All you can do is make sure it does not increase. It is typically caused by data being corrupted in transit between the drive and the controller. Check your data cables first if the value continues to increment. If it does not increment, leave it alone, because it's working fine.

There is a link in my signature to a Hard Drive Troubleshooting Guide. It has some good information in it to explain what values are important to look at.
 

ImNotNASty

Thank you for looking at this and for your input. I read your HDD TS guide and made a copy for reference. The same with the HDD Burn-In Guide. These drives are "new" to me, so I can't speak to their history. You can see from the hours count that they've logged significant hours.

The SATA cables I'm using are the ones that came with the MoBo. I have ordered new ones as the ones I'm using are too long and some have 90° connectors. Straight connectors will be better for the geometry of this build. Once I change out those cables, it will be time to configure the NAS. More reading, but I'm enjoying learning something about how all of this works.
 

joeschmuck

I have ordered new ones as the ones I'm using are too long and some have 90° connectors.
Just remember to monitor the UDMA_CRC_Errors value. While it will never return to zero, it's important to see whether it's increasing. If it's not increasing, then all is good.

Good Luck!
 

ImNotNASty

Can I monitor by rerunning smartctl -a, or do I need to run smartctl -t long first?
 

Etorix

-a is enough to read the parameters, but it is good practice to have scheduled long SMART tests on a weekly to monthly basis anyway.
 

joeschmuck

I agree. My personal preference is to run a daily short test and a weekly long test. Keep in mind that SMART, at best, was designed to give about 24 hours' notice of a failure, and it's not perfect. To monitor, though, smartctl -a is all you need to read the UDMA_CRC_Errors RAW value.
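
As a hedged sketch of that kind of monitoring: the Python below pulls the raw value for attribute ID 199 out of smartctl's attribute table and compares it with an earlier reading. The sample text is illustrative (only the 3362 raw value comes from this thread), the function names are made up for the example, and the parsing assumes the usual ten-column table layout with the raw value last.

```python
from typing import Optional

# Illustrative attribute-table excerpt; only the raw value 3362 is from this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3362
"""


def crc_error_count(smart_text: str) -> Optional[int]:
    """Return the raw value of attribute 199, or None if the row is missing."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Assumed layout: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_increased(previous: int, current: int) -> bool:
    """The absolute count never resets, so only growth since the last check matters."""
    return current > previous


baseline = crc_error_count(SAMPLE)
print(baseline)                         # 3362
print(crc_increased(baseline, 3362))    # False: unchanged, nothing to do
print(crc_increased(baseline, 3370))    # True: time to check the data cable
```

Saving the baseline somewhere and rerunning the comparison after each scheduled test is enough to see whether the count is still climbing.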

I'm not trying to peddle this to you, but there is a link in my signature for the Multi-Report Script. Give it a look. It will generate a report and send it to your email. It makes tracking this kind of thing easier.
 

ImNotNASty

Just remember, monitor the UDMA_CRC_Errors value, while it will never return to zero, it's important to monitor it to see if it's increasing. If it's not increasing then all is good.

Good Luck!
Thanks again. I do have SMART tests set up to run on a schedule. The new, shorter SATA III cables arrived today and have been installed. The old ones were long enough and crammed in enough that they may have been creating a little torque on the connectors. We'll see if I continue to accrue communications errors.

I've set up my pool "TANK": 5 x 6TB drives in RAID-Z2, so ~16TB of storage. My datasets are set up along with Samba shares and ACLs. Plex is installed and the libraries are mounted. So far, it's going well. I still have a lot of content to load in.
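
A back-of-the-envelope Python sketch of where the ~16TB figure most likely comes from, assuming the drives are 6 TB in decimal units and the reported size is in binary units (TiB). It ignores ZFS metadata, padding and reserved space, so the number the system actually shows will differ somewhat; the function name is made up for this example.

```python
TB = 10**12   # decimal terabyte, as printed on drive labels
TIB = 2**40   # binary tebibyte, as many tools report capacity


def raidz_usable_bytes(drive_count: int, drive_bytes: int, parity_drives: int) -> int:
    """Rough usable space: total capacity minus the parity drives' worth of space."""
    if drive_count <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drive_count - parity_drives) * drive_bytes


usable = raidz_usable_bytes(drive_count=5, drive_bytes=6 * TB, parity_drives=2)
print(usable / TB)              # 18.0 decimal TB before any overhead
print(round(usable / TIB, 2))   # 16.37, i.e. roughly the ~16TB quoted above
```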
 