
Bad batch of Toshiba N300 8TB (256MB cache), might be motherboard failure

Jasse Jansson

Member
Joined
Mar 19, 2017
Messages
71
This is about my recent upgrade from 4TB WD Reds to Toshiba N300 8TB hard drives with 256MB cache.

I now have a third disk failing SMART tests within one month, with this in my mailbox:

Code:
FreeNAS @ amain.local

New alerts:
* Device: /dev/ada4, Failed SMART usage Attribute: 7 Seek_Error_Rate..

Current alerts:
* Device: /dev/ada4, FAILED SMART self-check. BACK UP DATA NOW!.
* Device: /dev/ada4, Failed SMART usage Attribute: 7 Seek_Error_Rate..


All of these were made in July 2020.
Could this be a bad batch from Toshiba?

I have two that I put in as replacements in February and they work fine. Those are the 128MB cache model.
Did Toshiba make some other change to these drives that messes things up?
 

Jasse Jansson

I missed some information in my first post.

The failing Toshibas with 256MB cache are: HDWG180UZSVA

The server is based on a Supermicro X11SSH-F motherboard, an i3 CPU and 32GB of ECC RAM.

The first two failed drives were ada3, this third one is ada4, and the SATA cables are of the locking kind.
 

HoneyBadger

Mushroom! Mushroom!
Joined
Feb 6, 2014
Messages
3,391
What is the raw value for SEEK ERROR RATE?

Some vendors (Seagate, for example) report a 48-bit RAW value, where the first 16 bits are the "number of errors" and the last 32 bits are the "number of seeks", so it's possible that the drive is fine.
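
As a rough illustration of that split, here is a minimal Python sketch. It assumes the Seagate-style layout described above (upper 16 bits for errors, lower 32 bits for seeks); the function name and the sample number are made up for the example and do not come from any drive in this thread.

```python
from typing import Tuple


def split_seek_error_raw(raw: int) -> Tuple[int, int]:
    """Split a 48-bit raw value into (errors, seeks).

    Assumes the upper 16 bits count errors and the lower 32 bits
    count seeks, as described for Seagate-style reporting.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    errors = raw >> 32          # top 16 bits
    seeks = raw & 0xFFFFFFFF    # bottom 32 bits
    return errors, seeks


# Hypothetical raw value: looks large, but decodes to zero errors.
errors, seeks = split_seek_error_raw(123456789)
print(f"errors={errors} seeks={seeks}")
```

If a drive uses this layout, a large raw number on its own says little; only the upper 16 bits would indicate actual seek errors.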
 

Jasse Jansson

I'm not at home until next week but I'll check it then.

But the first disk I returned had a permanent SMART failure.

Last week I searched my collection of stuff in the basement and managed to build a computer that I installed FreeBSD on, just for testing this.
I did some amateurish tests with smartctl, and an old 4TB WD Red disk had no errors at all when I ran the long test.
The second Toshiba gave up on the long test with 90% to go and just sat like that for at least 12 hours.

I can't be the only one who runs these drives in a NAS.
 

HoneyBadger

You could be the lucky one who's unearthing a firmware issue or something of the sort. Are you using an HBA or direct SATA?
 

Jasse Jansson

I'm booting from a SATA M.2 drive and have a RAIDZ1 pool on the motherboard's 8 SATA ports.
This setup has worked fine for 2 years with a mix of 3TB and 4TB WD Reds.

BTW, define lucky...
 

mswarren

Newbie
Joined
Sep 30, 2020
Messages
3
I've just built my first FreeNAS box using 5x 8TB Toshiba N300 drives, running FreeNAS-11.3-U4.1.

All are reporting Seek_Error_Rate failures, but the SMART report looks OK to me:

Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       7966
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   001   001   050    Pre-fail  Always   FAILING_NOW 0

I don't see how this is a fail when worst is 001 and threshold is 050?
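
For context, the smartctl documentation describes an attribute as failed when its normalized VALUE is less than or equal to THRESH, so lower numbers are worse and the RAW_VALUE is not what triggers the alert. A small Python sketch of that rule applied to the rows above follows; the helper name is only for illustration, and treating a threshold of 000 as "never fails" is an assumption made here.

```python
def attribute_failed(value: int, thresh: int) -> bool:
    """Return True when a normalized SMART value is at or below its threshold.

    Assumption: a threshold of 0 is treated as "this attribute never fails".
    """
    return thresh > 0 and value <= thresh


# Normalized VALUE and THRESH taken from the report above.
print(attribute_failed(value=1, thresh=50))    # Seek_Error_Rate row
print(attribute_failed(value=100, thresh=50))  # Raw_Read_Error_Rate row
```

Read that way, a VALUE of 001 against a THRESH of 050 is flagged as failing even though the raw value is 0.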

Did you manage to solve this, or have you RMA’d the drives ?
 

Jasse Jansson

I have been at my cabin in the countryside for 2 weeks now and might stay for another one, so no RMA yet.
I'll RMA both HDDs when I get back home.
One has already been RMA'd, and the one in the server mailed me this after this week's SMART long test:

Code:
New alerts:
* Device: /dev/ada4, Self-Test Log error count increased from 0 to 1.

Current alerts:
* Device: /dev/ada4, FAILED SMART self-check. BACK UP DATA NOW!.
* Device: /dev/ada4, Failed SMART usage Attribute: 7 Seek_Error_Rate..
* Device: /dev/ada4, Self-Test Log error count increased from 0 to 1.


The error count increase message can't be good.
If I get my money back (I'm sure I will), I think I'll buy Seagate drives instead.
It's still strange that the Toshibas bought in February work just fine. Maybe it's a bad batch.
 

Jasse Jansson

RMA'd the hard drive and bought a Seagate IronWolf 8TB instead of the Toshiba N300 drives.
Replaced ada3 and resilvered.
Now it doesn't want to recognize the new disk.
Well, it's a PC, so I shut the server down and restarted it.
Now ada7 is totally gone, but the pool works in a degraded state.
I only have 7 disks now.
I have all the data on my other 2 servers so everything is fine.

Can this be a motherboard failure ?
Better check to see if I still have valid warranty on it.
 

_gid

Newbie
Joined
Mar 30, 2021
Messages
1
I replaced four WD Red drives in my RAID-Z2 home NAS box (Core i7.. Gigabyte mainboard, IIRC) with Toshiba N300 8TB drives from Amazon UK this month.

Three were from November 2020 (serial /^Y040.....AUG$/) and one was from, I think, July 2020 (serial /^70K0.....AUG$/). The first of the November drives started failing with Seek_Error_Rate problems within a day. Amazon support laughingly told me to turn off SMART error checking to solve the problem. I just did a straight Replacement return instead.

The replacement was another November drive. It failed too, as have the other two November drives. The July drive is okay and unfortunately the return period for that drive has just expired... otherwise I'd replace it too just in case.

I didn't do any proper testing, but it's mighty weird that the July drive is fine while all the November drives are making FreeNAS rather unhappy.

I'm in the slow process of replacing the drives one-by-one with WD Red Plus 8TB drives, and sending the Toshes back one-by-one for refund in the WD packaging! I'll keep an eye on it, and if that July drive starts flagging errors too, I'll cut my losses and replace that with a WD Red as well.
 

Jasse Jansson

Well, I gave in to the group pressure in this forum, bought an LSI card with the right firmware from eBay, and have had no problems since then.
I suspect some strangeness in the way the SATA ports are handled, but I have no evidence to back it up.
One thing that supports my strangeness theory is that a Seagate drive also gave me the same SMART error.
I also replaced one of my FreeNAS servers with a Synology NAS (DS1821+), which runs fine as my main server, with a FreeNAS server handling weekly backups.
 

spitfire

Member
Joined
May 25, 2012
Messages
41
I bought 4 of these back in February 2020. They are still working well.
I ordered another one in March (and received one manufactured in SEP-2020), and from the start it was emitting a clicking noise.
I rewired everything, changed the data cable, even the power supply. It turned out the value of S.M.A.R.T. attribute 220 Disk_Shift was insanely high (it was 0 on all four of my previous drives):
Code:
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       101974023

After a few days it developed a S.M.A.R.T. error related to attribute 7 Seek_Error_Rate, at which point I replaced it.
I received another one, manufactured in January 2021: same story (even higher 220 Disk_Shift, 7 Seek_Error_Rate followed soon after, and ZFS also started detecting checksum errors). Then I replaced it again and got another one made in February 2021: same story (even higher 220 Disk_Shift than the previous ones, ZFS started detecting checksum errors, but no 7 Seek_Error_Rate yet).
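
For anyone wanting to compare this attribute across several drives, here is a minimal Python sketch that parses one row of the attribute table into named fields. It assumes the usual ten-column layout shown in this thread (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the function name is hypothetical, and whether a non-zero Disk_Shift really indicates a fault is an observation from this thread, not a documented rule.

```python
def parse_attribute_row(line: str) -> dict:
    """Parse one row of a SMART attribute table into a dict.

    Assumes ten whitespace-separated columns; RAW_VALUE is kept as a
    string because some drives append extra text to it.
    """
    fields = line.split(None, 9)
    if len(fields) != 10:
        raise ValueError(f"unexpected attribute row: {line!r}")
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "raw": fields[9].strip(),
    }


# Row copied from the output above.
row = parse_attribute_row(
    "220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       101974023"
)
raw_shift = int(row["raw"].split()[0])
print(row["name"], row["value"], raw_shift)
```

Note that the normalized VALUE stays at 100 here while the raw number is large, so only a comparison of raw values against known-good drives shows the difference.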
 

Jasse Jansson

Are your disks connected to SATA ports or an LSI card (SAS)?
 

spitfire

Are your disks connected to SATA ports or an LSI card (SAS)?
SATA ports. Like I said, 4 other disks (same model, made before FEB2020) are working fine connected to the same controller. I've tried connecting the new ones to another controller and changing the SATA cable, power cables and power supply. I'm definitely sure the problem is the drives themselves, and it seems like the high Disk_Shift (which may point to an "electromechanical problem of the disk") is the root cause of the issue.
I've also seen a few topics here and there pointing at X300 and N300 drives having problems with Seek_Error_Rate in recent months.
 

Jasse Jansson

Would be really interesting to know if this behaviour pops up if the disks are connected to a SAS controller.
 

li_gangyi

Newbie
Joined
Jun 10, 2021
Messages
1
Just registered an account to post here.
I experienced the same thing on one of the two 8TB N300s I bought.

After 8 hours of uptime and writing about 50% of capacity in total, one drive's seek error rate went below the threshold (I think the lowest I saw was 19).
I never did see the RAW value move away from 0; it was always the VALUE that changed.

Both drives were set up as a mirror (so they should see the same workload).

Model HDWG180
FW 0603

The only difference between the two was that one (the one reporting seek errors) was hooked up through an ASMedia ASM1061 (that's just the way my motherboard works: 2 native SATA ports, 2 through a PCIe-to-SATA controller).

What's interesting is that when I moved the "bad" drive to another PC to wipe it before returning it, the seek error rate slowly crept back up to 100. I completed the wipe with it at 100, but it was still failing the short SMART test (unknown error). I never did try the long test on it. It still had 0 reallocated or pending sectors, and it wiped at a reasonable speed. I returned the drive and am waiting for another to show up.

Meanwhile I've run short and long SMART tests, as well as a badblocks test, on the remaining drive, and it has been working OK (worst seek error rate 95).
 