Port multipliers and large pools

Status
Not open for further replies.

RichR

Explorer
Joined
Oct 20, 2011
Messages
77
Hello,

So we've actually bought/built a Backblaze pod. So far, so good; we're just getting adjusted to FreeNAS and ZFS. 45 x 3TB drives, 3 volumes of 14 drives + 1 warm spare.

Two immediate issues....

From BSD posts dating back to 2009, it seemed that port multipliers were not a problem. I'm not fluent in reading the logs, but this does not look good....


Nov 15 08:01:33 freenas kernel: siisch2: Timeout on slot 30
Nov 15 08:01:33 freenas kernel: siisch2: siis_timeout is 00040000 ss 7ffffc00 rs 7ffffc00 es 00000000 sts 800c2000 serr 00000000
Nov 15 08:01:33 freenas kernel: siisch2: ... waiting for slots 3ffffc00
Nov 15 08:01:33 freenas kernel: siisch2: Timeout on slot 28
Nov 15 08:01:33 freenas kernel: siisch2: siis_timeout is 00040000 ss 7ffffc00 rs 7ffffc00 es 00000000 sts 800c2000 serr 00000000
Nov 15 08:01:33 freenas kernel: siisch2: ... waiting for slots 2ffffc00
Nov 15 08:01:33 freenas kernel: siisch2: Timeout on slot 27
Nov 15 08:01:33 freenas kernel: siisch2: siis_timeout is 00040000 ss 7ffffc00 rs 7ffffc00 es 00000000 sts 800c2000 serr 00000000
Nov 15 08:01:33 freenas kernel: siisch2: ... waiting for slots 27fffc00
.... and many more

What are the next questions I should ask regarding the above messages?

Also, I read in the ZFS best practices guide about limiting pools to 4-9 drives. Why? What would/could be the problem with 14-drive (+ the warm spare) pools?

More later, and thank you.....

Rich
 

louisk

Patron
Joined
Aug 10, 2011
Messages
441
I'm not familiar with port multipliers, but I can address the drive pool issue.

When ZFS rebuilds (resilvers) a drive in a vdev, it has to read from all the other drives in that vdev, so the more drives in the vdev, the longer the rebuild takes. There have been cases in the past where, with too many drives in a vdev, the rebuild effectively takes forever and never finishes. The solution is to break things down into smaller vdevs (you get better performance that way too, since the random IO performance of a raidz vdev is roughly that of a single spindle). If you want to use raidz, I'd suggest that each vdev have a number of spindles that is a power of 2 (4 or 8). For example, you could have five 8-spindle raidz vdevs in the pool, plus 5 spares (roughly one per raidz on average); see the sketch below.
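To make that concrete, here's a minimal sketch of that layout. The pool name ("tank") and FreeBSD-style device names (da0-da44) are assumptions for illustration, not your actual devices:

# Hypothetical layout: five 8-drive raidz vdevs plus five pool-wide
# spares. Device names da0-da44 and the pool name "tank" are made up.
zpool create tank \
    raidz da0 da1 da2 da3 da4 da5 da6 da7 \
    raidz da8 da9 da10 da11 da12 da13 da14 da15 \
    raidz da16 da17 da18 da19 da20 da21 da22 da23 \
    raidz da24 da25 da26 da27 da28 da29 da30 da31 \
    raidz da32 da33 da34 da35 da36 da37 da38 da39 \
    spare da40 da41 da42 da43 da44

Note that ZFS spares are shared pool-wide, so the five spares can cover a failure in any of the vdevs.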
 

Milhouse

Guru
Joined
Jun 1, 2011
Messages
564
Also, I read in the ZFS best practices guide about limiting pools to 4-9 drives. Why? What would/could be the problem with 14-drive (+ the warm spare) pools?

Not pools, but vdevs. A pool can consist of multiple vdevs (groups of physical disks), and as louisk explains, having multiple vdevs can increase your overall IOPS, since ZFS can in essence spread the IO across the vdevs. More vdevs, each with fewer disks, can result in a better-performing system.
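You can actually watch that spreading happen: zpool iostat can break the statistics down per vdev (assuming a hypothetical pool named "tank"):

# -v shows per-vdev statistics; the trailing 1 refreshes every second
zpool iostat -v tank 1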

In addition, are your current 14+1 vdevs using RAIDZ1 or RAIDZ2? Hopefully it's the latter, because even if you have a warm spare, the risk of losing a second disk increases during the resilver, and with only RAIDZ1 protection you'd lose the entire vdev, and with it the entire pool, even if the pool is made up of multiple vdevs.
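For what it's worth, swapping in a spare and watching the resilver looks something like this (a sketch only; the pool name "tank" and the device names are hypothetical):

# Replace failed drive da7 with spare da44, then watch progress
zpool replace tank da7 da44
zpool status tank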

For a single vdev with up to 6 drives I'd recommend RAIDZ1, and for up to 10 drives RAIDZ2. Since you have so many drives, I'd suggest being a bit more creative about how you partition your disks into vdevs to reduce the risk of simultaneous disk failure; one possible RAIDZ2 layout is sketched below.
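As one hedged example (not a recommendation for your specific hardware), your 45 drives could be arranged as four 10-drive RAIDZ2 vdevs plus five spares; the pool name and device names are again hypothetical:

# Hypothetical RAIDZ2 alternative: four 10-drive raidz2 vdevs, each
# able to survive two simultaneous drive failures, plus five spares
zpool create tank \
    raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 \
    raidz2 da10 da11 da12 da13 da14 da15 da16 da17 da18 da19 \
    raidz2 da20 da21 da22 da23 da24 da25 da26 da27 da28 da29 \
    raidz2 da30 da31 da32 da33 da34 da35 da36 da37 da38 da39 \
    spare da40 da41 da42 da43 da44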

The five 8-drive RAIDZ1 vdevs (plus roughly a spare each) suggested by louisk might be pushing it for RAIDZ1, IMHO, but it's something to think about.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It'd have been better had you outlined the specific hardware you're using. Is it the actual Backblaze-suggested parts (and if so, which version? IIRC they differ a bit between v1 and v2)? It's possible that you're having a controller compatibility issue; alternatively, if you can identify from the logs a particular channel or port that's experiencing trouble, address that.
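One way to start mapping those siisch2 messages back to physical drives (a sketch; the exact output depends on your hardware):

# List attached disks with their bus/controller attachment points
camcontrol devlist
# Pull out everything the kernel has logged about the complaining channel
dmesg | grep siisch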

It's uncommon to run into a build with a lot of drives that actually works 100% at first, especially if you're using a "whitebox" solution. My suggestion is to not rush the task. Take it one step at a time: take one port, one group of drives, and test the heck out of it. My preferred method is to beat the $&#@^$& out of the drives with a bunch of dd commands to all ports simultaneously, watch "iostat daX daY daZ etc 1" for all the drives, and look for unexpected/unexplained blips, kernel messages, and any other problems. If you cannot get one port working right, find out why: try a different controller, check Google for problem reports, etc. If you care at all about your data, you're going to want to make sure that each pod of drives can survive several hours of individual testing without any burps; when you've worked through each pod, bang on the whole assembly for maybe a week or so.
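A rough sketch of that kind of torture test, assuming a hypothetical group of nine drives da0-da8 hanging off one port (reading with dd is non-destructive; writing would destroy data):

# Hammer every drive in the group with sequential reads, in parallel
for d in da0 da1 da2 da3 da4 da5 da6 da7 da8; do
    dd if=/dev/$d of=/dev/null bs=1m &
done
# In another terminal, watch for stalls, error spikes, or drives that
# fall far behind their siblings; the trailing 1 = one-second samples
iostat da0 da1 da2 da3 da4 da5 da6 da7 da8 1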

I once built a giant machine with 72GB drives: 8 shelves of 9 drives each, 72 drives x 72GB = roughly 5TB of juicy storage back in 2000... I know that's not impressive *now*, but it was big back then (and it took up a 40U rack!). It was finicky. It did work great once several marginal drives were located and eradicated, and that was back in the days when marginal electronics could really wreak havoc on a shared SCSI bus, but it can take some work to find and correct those problems. These days we're mostly lucky to have non-shared-bus drive topologies, but thorough testing and debugging while on the bench are lessons worth remembering even today.
 