Bad batch of Toshiba N300 8TB 256MB cache, might be motherboard failure

Jasse Jansson · Sep 22, 2020

This is about my recent upgrade from 4TB WD Reds to Toshiba N300 8TB hard drives with 256MB cache.

I now have a third disk failing SMART tests within one month, with this in my mailbox:

Code:

FreeNAS @ amain.local

New alerts:
* Device: /dev/ada4, Failed SMART usage Attribute: 7 Seek_Error_Rate..

Current alerts:
* Device: /dev/ada4, FAILED SMART self-check. BACK UP DATA NOW!.
* Device: /dev/ada4, Failed SMART usage Attribute: 7 Seek_Error_Rate..

All of these were made in July 2020.
Could this be a bad batch from Toshiba?

I have two that I replaced in February and they work fine. Those are the 128MB cache model.
Did Toshiba make some other changes to these drives that mess things up?

Jasse Jansson · Sep 22, 2020

Missed some information in my first post.

The failing Toshibas with 256MB cache are model HDWG180UZSVA.

The server is based on a Supermicro X11SSH-F motherboard, with an i3 CPU and 32GB of ECC RAM.

The first two failed drives were both ada3, this third one is ada4, and the SATA cables are of the locking kind.

HoneyBadger · Sep 22, 2020

What is the raw value for SEEK ERROR RATE?

Some vendors (Seagate) report a 48-bit RAW value, where the first 16 bits are "number of errors" and the last 32 bits are "number of seeks", so it's possible that the drive is fine.
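
For anyone who wants to check a raw value by hand, here is a minimal Python sketch of the split described above. It assumes the Seagate-style layout (error count in the upper 16 bits, seek count in the lower 32 bits); whether Toshiba drives use the same layout is not established in this thread. The function name and the sample numbers are made up for illustration.

```python
def split_seek_error_raw(raw: int) -> tuple:
    """Split a 48-bit Seek_Error_Rate raw value into (errors, seeks).

    Assumes the layout described above: the upper 16 bits hold the
    error count and the lower 32 bits hold the seek count.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    errors = raw >> 32          # upper 16 bits
    seeks = raw & 0xFFFFFFFF    # lower 32 bits
    return errors, seeks


# Hypothetical raw value: a large number that is really 0 errors.
print(split_seek_error_raw(123456789))         # (0, 123456789)
# Hypothetical raw value: 2 errors over 1000 seeks.
print(split_seek_error_raw((2 << 32) | 1000))  # (2, 1000)
```

A raw value that looks alarmingly large can therefore still decode to zero errors, which is why the raw number alone does not say much under this layout.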

Jasse Jansson · Sep 22, 2020

I'm not at home until next week but I'll check it then.

But the first disk I returned had a permanent SMART failure.

Last week I searched my collection of stuff in the basement and managed to build a computer that I installed FreeBSD on, just for testing this.
I did some amateurish tests with smartctl, and an old 4TB WD Red disk had no errors at all when I ran the LONG test.
The second Toshiba gave up on the LONG test with 90% to go and just sat like that for at least 12 hours.

I can't be the only one who runs these drives in a NAS.

HoneyBadger · Sep 22, 2020

You could be the lucky one who's unearthing a firmware issue or something of the sort. Are you using an HBA or direct SATA?

Jasse Jansson · Sep 22, 2020

I'm booting from a SATA M.2 drive and have a RAIDZ1 pool on the motherboard's 8 SATA ports.
This setup has worked fine for 2 years with a mix of 3TB and 4TB WD Reds.

BTW, define lucky...

mswarren · Sep 30, 2020

I've just built my first FreeNAS box using 5x 8TB Toshiba N300 drives - FreeNAS-11.3-U4.1

All are reporting Seek_Error_Rate failures, but the SMART report looks OK to me:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       7966
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   001   001   050    Pre-fail  Always   FAILING_NOW 0

I don't see how this counts as a fail when worst is 001 and threshold is 050?

Did you manage to solve this, or have you RMA'd the drives?
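
A note on reading the Seek_Error_Rate row above: the normalized VALUE counts down from a healthy figure, and smartctl reports an attribute as failed when VALUE is at or below THRESH. So 001 against a threshold of 050 is a fail, even though the raw value is 0. Here is a minimal Python sketch of that comparison. The helper names are made up for illustration, and the parsing assumes the standard ten-column attribute table shown above.

```python
def parse_attribute_row(row: str) -> dict:
    """Parse one row of the SMART attribute table into a dict."""
    fields = row.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),    # normalized VALUE column
        "worst": int(fields[4]),    # WORST column
        "thresh": int(fields[5]),   # THRESH column
        "when_failed": fields[8],
        "raw": " ".join(fields[9:]),  # raw value may contain spaces
    }


def is_failing(attr: dict) -> bool:
    """Simplified rule: failed when VALUE is at or below THRESH."""
    return attr["value"] <= attr["thresh"]


row = "7 Seek_Error_Rate 0x000b 001 001 050 Pre-fail Always FAILING_NOW 0"
attr = parse_attribute_row(row)
print(attr["name"], attr["value"], attr["thresh"], is_failing(attr))
```

Running the same check on the Reallocated_Sector_Ct row (VALUE 100, THRESH 050) gives False, which matches the empty WHEN_FAILED column there.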

Jasse Jansson · Oct 1, 2020

I have been at my cabin in the countryside for 2 weeks now and might stay for another one, so no RMA yet.
I'll RMA both HDDs when I get back home.
I have one previously RMA'd and one in the server that mailed me this after this week's SMART LONG test:

Code:

New alerts:
* Device: /dev/ada4, Self-Test Log error count increased from 0 to 1.

Current alerts:
* Device: /dev/ada4, FAILED SMART self-check. BACK UP DATA NOW!.
* Device: /dev/ada4, Failed SMART usage Attribute: 7 Seek_Error_Rate..
* Device: /dev/ada4, Self-Test Log error count increased from 0 to 1.

The error count increase message can't be good.
If I get my money back (I'm sure I will), then I think I'll buy Seagate drives instead.
It's still strange that the Toshibas bought in February work mighty fine. Maybe a bad batch.

Jasse Jansson · Oct 12, 2020

RMA'd the hard drive and bought a Seagate IronWolf 8TB instead of the Toshiba N300 drives.
Replaced ada3 and resilvered.
Now it doesn't want to recognize the new disk.
Well, it's a PC so I shut it down and restarted the server.
Now ada7 is totally gone but the pool works in a degraded state.
I only have 7 disks now.
I have all the data on my other 2 servers so everything is fine.

Could this be a motherboard failure?
I'd better check whether it's still under warranty.

_gid · Mar 30, 2021

I replaced four WD Red drives in my RAID-Z2 home NAS box (Core i7, Gigabyte mainboard, IIRC) with Toshiba N300 8TB drives from Amazon UK this month.

Three were from November 2020 (serial /^Y040.....AUG$/) and one was from, I think, July 2020 (serial /^70K0.....AUG$/). The first of the November drives started failing with Seek_Error_Rate problems within a day. Amazon support laughingly told me to turn off SMART error checking to solve the problem. I just did a straight Replacement return instead.

The replacement was another November drive. It failed too, as have the other two November drives. The July drive is okay and unfortunately the return period for that drive has just expired... otherwise I'd replace it too just in case.

I didn't do any proper testing, but it's mighty weird that the July drive is fine while all the November drives are making FreeNAS rather unhappy.

I'm in the slow process of replacing the drives one-by-one with WD Red Plus 8TB drives, and sending the Toshes back one-by-one for refund in the WD packaging! I'll keep an eye on it, and if that July drive starts flagging errors too, I'll cut my losses and replace that with a WD Red as well.

Jasse Jansson · Mar 30, 2021

Well, I gave in to the group pressure in this forum and bought an LSI card with the right firmware from eBay, and I have had no problems since then.
I suspect some strangeness in the way the SATA ports are handled, but I have no evidence to back it up.
One thing that supports my strangeness theory is that a Seagate drive also gave me the same SMART error.
I also replaced one of my FreeNAS servers with a Synology NAS (DS1821+), and it runs fine as my main server, with a FreeNAS server that handles weekly backups.

spitfire · Apr 5, 2021

I bought 4 of these back in February 2020. They are still working well.
I ordered another one in March (and received one manufactured in SEP-2020), and from the start it was emitting a clicking noise.
I rewired everything, changed the data cable, even the power supply. It turned out the value of S.M.A.R.T. attribute 220 Disk_Shift was insanely high (it was 0 on all four of my previous drives):

Code:

220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       101974023

After a few days it developed a S.M.A.R.T. error related to the 7 Seek_Error_Rate attribute, at which point I replaced it.
I received another one, manufactured in January 2021, and it was the same story (an even higher 220 Disk_Shift, with 7 Seek_Error_Rate following soon after, and ZFS also started detecting checksum errors). Then I replaced it again and got another one made in February 2021: same story (an even higher 220 Disk_Shift than the previous ones, and ZFS started detecting checksum errors, but no 7 Seek_Error_Rate yet).

Jasse Jansson · Apr 5, 2021

Are your disks connected to SATA ports or an LSI card (SAS)?

spitfire · Apr 5, 2021

Jasse Jansson said:
Are your disks connected to SATA ports or an LSI card (SAS)?

SATA ports. Like I said, 4 other disks (same model, made before Feb 2020) are working fine, connected to the same controller. I've tried connecting the new ones to another controller and changing the SATA cable, power cables and power supply. I'm definitely sure the drives themselves are the problem, and it seems like the high Disk_Shift, which may point to "electromechanical problems of the disk", is the root cause of the issue.
I've also seen a few topics here and there pointing at X300 and N300 drives having problems with Seek_Error_Rate in recent months.

Jasse Jansson · Apr 10, 2021

Would be really interesting to know if this behaviour pops up if the disks are connected to a SAS controller.

spitfire · Apr 10, 2021

Jasse Jansson said:
Would be really interesting to know if this behaviour pops up if the disks are connected to a SAS controller.

I have another batch (4 drives, bought in Feb 2020) of the same model working on the same system/controller, no issues

li_gangyi · Jun 10, 2021

Just registered an account to post here.
I experienced the same thing on one of the 2 N300 8TB drives I bought.

After 8 hours of uptime and writing about 50% of capacity in total, one drive's Seek_Error_Rate went below the threshold (I think the minimum I saw was 19).
I never did see the RAW value move away from 0; it was always the VALUE that changed.

Both drives were set up in a mirror (so they should see the same workload).

Model HDWG180
FW 0603

The only difference between the 2 was that one (the one reporting seek error rates) was hooked up through an ASMedia ASM1061 (just the way my motherboard works: 2 native SATA ports, 2 through a PCIe-SATA controller).

What's interesting is that when I moved the "bad" drive to another PC to wipe it before returning it, the seek error rate slowly crept back to 100. I completed the wipe with it at 100, but it was still failing the short SMART test (unknown error). I never did try the long test on it. It still had 0 reallocated or pending sectors and it wiped at a reasonable speed. I returned the drive and am waiting for another to show up.

Meanwhile I've done short and long SMART tests, as well as a badblocks test, on the remaining drive and it has been working OK (worst seek error rate 95).

0xE1 · Apr 30, 2022

I have already RMA'd 3 out of the 6 8TB N300 Gold (HDWG480) disks that I bought, and now the new replacement disk is already showing 0x868 reallocated sectors and has 0 read/write performance (it doesn't actually write to disk; it fills the cache and that's it, 0 reads/writes to the platters).
