My First Attempt at Building a NAS

danb35 · Jan 20, 2023

ImNotNASty said:
Badblocks. It appears to keep on running until terminated.

No, it runs through four passes--IIRC, it writes 0xaa, then 0x55, then 0xff, then 0x00. Then it terminates. If you want to stop it early, Ctrl-C will do it.

ImNotNASty · Jan 20, 2023

danb35 said:
No, it runs through four passes--IIRC, it writes 0xaa, then 0x55, then 0xff, then 0x00. Then it terminates. If you want to stop it early, Ctrl-C will do it.

Thanks - I'll keep watching it, but I'd swear it's completed more than one iteration. I know that there is a switch you can set to specify the number of iterations, but I only used: badblocks -b 4098 -ws /dev/adaX

joeschmuck · Jan 20, 2023

Badblocks runs 4 different test patterns (I thought it was 5, but I was wrong), and it will terminate once they're complete. Yes, it takes a long time with large hard drives. Imagine having to test 18TB drives. Yikes!

Yup, I thought there was a #A5 pattern. I was wrong.

ImNotNASty · Jan 20, 2023

It's showing 0xaa and 0x55 as complete for all drives. It just completed 0xff and rolled back into 0xff again, so it's running multiple, sequential iterations of the same test before rolling over to the next variant. How many iterations, I'm not sure. So it's running the 3rd variant of four (0xaa, 0x55, 0xff, 0x00). Once that's done, we'll see if it rolls into the 0xff again or into the last variant 0x00. Yeah it's taking a long time. The good news is no errors - so far. Fingers crossed.

Is there anything more to analyzing badblocks data than reading the error count?

Note to self - be glad that I'll likely never need a bank of 18TB disks.

joeschmuck · Jan 20, 2023

ImNotNASty said:
Is there anything more to analyzing badblocks data than reading the error count?

Nope, not really. If it reports an error then you have a problem spot on the drive.

ImNotNASty said:
It just completed 0xff and rolled back into 0xff again, so it's running multiple, sequential iterations of the same test before rolling over to the next variant.

I don't recall it doing that when I ran it. And you didn't specify -p val to change the number of passes from 1 to val. I'm not sure if it would retest if it found a questionable area.

ImNotNASty said:
badblocks -b 4098 -ws /dev/adaX

Why did you use a block size of 4098? It would normally be a multiple of 512, for example 4096, which would be good on an Advanced Format drive (4K blocks). Maybe you typed it wrong here and you did enter 4096. On the next drive you could possibly make it a little faster by adding -c 128 to the line to test more blocks at the same time.

Good luck, hope it all goes without a hitch.

joeschmuck · Jan 20, 2023

ImNotNASty said:
Note to self - be glad that I'll likely never need a bank of 18TB disks.

I know I won't. Four 6TB drives is way more than I need, but I like having the free space; you never know when you'll need to back up a few extra computers.

ImNotNASty · Jan 20, 2023

joeschmuck said:
Nope, not really. If it reports an error then you have a problem spot on the drive.

I don't recall it doing that when I ran it. And you didn't specify -p val to change the number of passes from 1 to val. I'm not sure if it would retest if it found a questionable area.

Why did you use a block size of 4098? It would normally be a multiple of 512, for example 4096, which would be good on an Advanced Format drive (4K blocks). Maybe you typed it wrong here and you did enter 4096. On the next drive you could possibly make it a little faster by adding -c 128 to the line to test more blocks at the same time.

Good luck, hope it all goes without a hitch.

Yep, a typo. It should have been 4096.

ImNotNASty · Jan 22, 2023

Badblocks took ~70 hours to complete for 6TB disks. No errors reported.
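As a rough sanity check on that run time, the arithmetic can be sketched as below. It assumes decimal terabytes and that each of the four patterns costs one full write pass plus one full read-back pass, which is how badblocks -w behaves; the function name is just for illustration.

```python
# Rough average throughput for a full `badblocks -w` run.
# Assumptions: decimal terabytes, and every pattern costs one full
# write pass plus one full read-and-compare pass over the disk.
DISK_BYTES = 6 * 10**12      # 6 TB drive
PATTERNS = 4                 # 0xaa, 0x55, 0xff, 0x00
PASSES_PER_PATTERN = 2       # write, then read back
ELAPSED_HOURS = 70           # run time reported in the thread


def average_throughput_mb_s(disk_bytes, patterns, passes_per_pattern, hours):
    """Average sustained rate in MB/s over the whole run."""
    total_bytes = disk_bytes * patterns * passes_per_pattern
    return total_bytes / (hours * 3600) / 10**6


rate = average_throughput_mb_s(DISK_BYTES, PATTERNS, PASSES_PER_PATTERN, ELAPSED_HOURS)
print(f"average throughput: {rate:.0f} MB/s")
```

Under those assumptions the run works out to roughly 190 MB/s averaged over the whole surface, which is plausible for a 6TB spinning disk.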

Etorix · Jan 22, 2023

joeschmuck said:
I don't recall it doing that when I ran it.

It doesn't, but I think that @ImNotNASty takes the writing part and the reading part as "two iterations of the same test".

ImNotNASty · Jan 22, 2023

I didn't realize that it wrote in one pass and then read in a second pass. It's not verbose and it's a new exercise to me.
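For anyone else puzzled by the same thing, here is a toy model of the write-then-read structure, run against an in-memory buffer instead of a disk. It is only a sketch of the idea, not the actual badblocks implementation; the function name and the 4096-byte block size are assumptions for illustration.

```python
# Toy model of a destructive write-mode surface test on an in-memory "disk".
# This is NOT badblocks itself, just an illustration of its two phases.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def write_mode_test(disk: bytearray, block_size: int = 4096):
    """Return (phase log, number of blocks that failed read-back)."""
    log = []
    bad_blocks = 0
    for pattern in PATTERNS:
        expected = bytes([pattern]) * block_size

        # Phase 1: fill the whole device with the pattern.
        log.append(f"write 0x{pattern:02x}")
        for offset in range(0, len(disk), block_size):
            disk[offset:offset + block_size] = expected[:len(disk) - offset]

        # Phase 2: read every block back and compare with the pattern.
        log.append(f"read 0x{pattern:02x}")
        for offset in range(0, len(disk), block_size):
            if bytes(disk[offset:offset + block_size]) != expected[:len(disk) - offset]:
                bad_blocks += 1
    return log, bad_blocks


log, bad = write_mode_test(bytearray(4 * 4096))
print(log)
print("bad blocks:", bad)
```

The log shows eight phases for four patterns, which is why the progress display appears to run each pattern twice.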

ImNotNASty · Jan 27, 2023

Badblocks took about 70 hours to complete. All 5 disks reported error counts of (0/0/0). Next I ran smartctl -t long on all 5 disks. After that, I ran smartctl -a and screenshotted the output for all 5 drives (attached). I'm seeing errors on ada0 & ada4 - the other 3 look ok to me, but I'm out of my depth here and don't really understand what I'm looking at.

I ran Memtest86 for 24 hours. It did multiple passes of my 32GB of memory. CPU temp got up to 45 C. Memtest reported no errors.

CPUStresstest is running now. It's been running for half a day. 1.5 e+15 FP ops.

joeschmuck · Jan 28, 2023

ada0: UDMA_CRC_Errors (ID 199) is 3362. Monitor this, if it continues to increase at all then you likely have a data cable issue.
ada1: No issues.
ada2: Your Power On Hours is 251 and your Load Cycle Count is 120, and Power Cycle Count is 28. It looks like you are sleeping this drive. Be aware that head loading and unloading and spinning up a drive frequently can cause premature failure. It's not an issue if you are okay with it.
ada3: No issues.
ada4: Same issue as ada0.

UDMA_CRC_Errors will never reset to zero; the count is stored forever. All you can do is ensure it does not increase. It typically indicates data corruption in transit between the drive and the controller. Check your data cables first if the value continues to increment. If it does not increment then leave it alone because it's working fine.

There is a link in my signature to a Hard Drive Troubleshooting Guide. It has some good information in it to explain what values are important to look at.

ImNotNASty · Jan 30, 2023

joeschmuck said:
ada0: UDMA_CRC_Errors (ID 199) is 3362. Monitor this, if it continues to increase at all then you likely have a data cable issue.
ada1: No issues.
ada2: Your Power On Hours is 251 and your Load Cycle Count is 120, and Power Cycle Count is 28. It looks like you are sleeping this drive. Be aware that head loading and unloading and spinning up a drive frequently can cause premature failure. It's not an issue if you are okay with it.
ada3: No issues.
ada4: Same issue as ada0.

UDMA_CRC_Errors will never reset to zero; the count is stored forever. All you can do is ensure it does not increase. It typically indicates data corruption in transit between the drive and the controller. Check your data cables first if the value continues to increment. If it does not increment then leave it alone because it's working fine.

There is a link in my signature to a Hard Drive Troubleshooting Guide. It has some good information in it to explain what values are important to look at.

Thank you for looking at this and for your input. I read your HDD TS guide and made a copy for reference. The same with the HDD Burn-In Guide. These drives are "new" to me, so I can't speak to their history. You can see from the hours count that they've logged significant hours.

The SATA cables I'm using are the ones that came with the MoBo. I have ordered new ones as the ones I'm using are too long and some have 90° connectors. Straight connectors will be better for the geometry of this build. Once I change out those cables, it will be time to configure the NAS. More reading, but I'm enjoying learning something about how all of this works.

joeschmuck · Jan 31, 2023

ImNotNASty said:
I have ordered new ones as the ones I'm using are too long and some have 90° connectors.

Just remember to monitor the UDMA_CRC_Errors value. It will never return to zero, but what matters is whether it's increasing. If it's not increasing then all is good.

Good Luck!

ImNotNASty · Jan 31, 2023

Can I monitor by rerunning smartctl -a, or do I need to run smartctl -t long first?

Etorix · Jan 31, 2023

-a is enough to read the parameters, but it is good practice to have scheduled long SMART tests on a weekly to monthly basis anyway.

joeschmuck · Feb 1, 2023

I agree and my personal preference is to run a daily short test and a weekly long test. You need to understand that SMART at best was designed to give a person a 24 hour notification of failure and it's not perfect. But to monitor you would use the -a to read the UDMA_CRC_Errors RAW value.

I'm not trying to peddle this on you, but there is a link in my signature for the Multi-Report Script. Give it a look. It will generate a report and send it to your email. It makes tracking this kind of thing easier.
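For a quick manual check without any script, the comparison described above could be sketched like this. It assumes the usual ten-column attribute table that smartctl -a prints for ATA drives; the sample row is illustrative (only the raw value 3362 comes from this thread), and the helper names are hypothetical.

```python
# Minimal sketch: pull the raw value of SMART attribute 199 out of saved
# `smartctl -a` text and compare it with an earlier reading.
# SAMPLE is an illustrative row, not real output from these drives.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3362
"""


def crc_error_count(smartctl_output: str):
    """Return the raw value of attribute 199, or None if it is not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is column 10.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_errors_increased(baseline: int, current: int) -> bool:
    """True if the counter has grown since the baseline reading."""
    return current > baseline


baseline = 3362                      # value noted before swapping cables
current = crc_error_count(SAMPLE)    # value from a later smartctl -a run
print("increased:", crc_errors_increased(baseline, current))
```

Saving the output of smartctl -a to a file after each check and feeding it to a function like this is enough to tell whether the count has moved since the cable change.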

ImNotNASty · Feb 1, 2023

joeschmuck said:
Just remember, monitor the UDMA_CRC_Errors value, while it will never return to zero, it's important to monitor it to see if it's increasing. If it's not increasing then all is good.

Good Luck!

Thanks again. I do have SMART tests set up to run on a schedule. The new, shorter SATA III cables arrived today and have been installed. The old ones were long enough and crammed in enough that they may have been creating a little torque on the connectors. We'll see if I continue to accrue communications errors.

I've set up my pool "TANK" - 5 x 6TB drives in RAID-Z2, so ~16TB of storage. My datasets are set up along with Samba shares and ACL. Plex is installed and the libraries are mounted. So far, it's going good. I still have a lot of content to load in.
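The ~16TB figure can be checked with a back-of-the-envelope calculation. This sketch assumes RAID-Z2 gives up two disks' worth of space to parity and ignores ZFS metadata, padding, and reserved space, so real usable capacity will be somewhat lower; the function name is just for illustration.

```python
# Back-of-the-envelope usable capacity for a RAID-Z vdev.
# Assumption: parity costs exactly `parity` disks of space; ZFS metadata,
# padding, and reserved space are ignored, so real figures are lower.
def raidz_usable_bytes(disks: int, disk_bytes: int, parity: int) -> int:
    return (disks - parity) * disk_bytes


usable = raidz_usable_bytes(disks=5, disk_bytes=6 * 10**12, parity=2)
usable_tib = usable / 2**40   # operating systems often label TiB as "TB"

print(f"{usable / 10**12:.0f} TB = {usable_tib:.1f} TiB before ZFS overhead")
```

Three data disks of 6 TB give 18 TB decimal, or about 16.4 TiB, which matches the roughly 16TB shown for the pool.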
