SIEM POC Speed Required.


Nomad

Contributor
Joined
Oct 14, 2013
Messages
125
Hey Everyone,

I have a very old EMC VNX5300 which just isn't cutting it in today's security world for speed. Firewall reports are taking 12+ hours for a single day's data.

I'm looking to build a POC (proof of concept) with ZFS on pure SSD for event reporting speed.

The dataset I'll be working with is the following:
  • 3TB of compressed log files, or about 22TB raw (they are stored compressed by the SIEM; no way around it)
  • 96,475,061,187 events in total; avg 1,000 EPS (events/sec)
  • 1 year of data.
  • If it matters I can get the file size per day (rough per-day estimate sketched below).
Processing will be handled by the database server (whatever the query is; "users who accessed freenas.com", for example), and FreeNAS will serve the data over a 10Gb network.
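For a rough sense of the per-day volume, a quick sketch assuming the year's logs are spread evenly (these are estimates from the totals above, not the actual file sizes):

```python
# Rough per-day volume, assuming even distribution across the year.
RAW_TB, COMPRESSED_TB, DAYS = 22, 3, 365

print(f"raw/day:        {RAW_TB * 1024 / DAYS:.1f} GB")         # ~61.7 GB
print(f"compressed/day: {COMPRESSED_TB * 1024 / DAYS:.1f} GB")  # ~8.4 GB
```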

Keeping in mind the size of the dataset I'll be working with, can anyone give me recommendations on building out a POC?

With a pure-SSD environment will I need a ZIL (SLOG) or L2ARC? I've never used them before, but from what I understand I won't.

Looking at a Xeon E3-1200 v5 (1225), 16GB RAM, and 6 x Samsung PM863 MZ-7LM3T8E 2.5" 3.84TB (to be expanded to 12 if the POC is accepted).

RAID card? No RAID card? Which board: X10? X11? Something else? I'm hoping the speed is going to be 10x faster, but I have no way to tell until it's built.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I think you're way off.

The E3-1225 v5 is a workstation processor, which means it has integrated video that's totally useless here. 16GB of RAM is relatively low; the rule of thumb for serious use is 1GB of RAM per 1TB of space. 12 * 4TB = 48TB, so 64GB of RAM would be a better choice.

You can manage that with the E3, but just barely. You might be better served going with a heavy hitter such as the E5-1650 v3 and 4 x 16GB RDIMMs on a board such as the X10SRH-CF, in an SC216BE1C-R920LPB chassis. That gives you up to 26 2.5" drives in a platform that should run something like $2800. Add a Chelsio 10G card for ... what are they, about $600, I want to say ... and some SATA DOMs, and you've got a $3500 NAS chassis that is super expandable and super flexible.

Then, instead of the PM863 4TB SSD, which is a two-THOUSAND-dollar part, maybe look at something else. I'm not sure where this part selection is coming from. Are you actually storing only the 3TB of compressed data? Because you could, and probably should, go with smaller SSDs in that case.

How did you arrive at the 96B events? 1000 events/sec * 86400 sec/day * 365 days/year ~= 32B.

So I'm unclear on how much data you want the filer to store; knowing how much it writes per day would be useful as well. With a 24-drive system there may be less expensive SSD options that are adequate to the task. For example, the 850 EVO 2TB has an endurance rating of 300TBW but only costs around $600/ea. 24 of those plus 2 warm spares would be about $15K and would give you a mirror-vdev pool of 24TB. It's unclear where the performance falloff from fragmentation might start, but even if we apply the 50% rule and say only 12TB of that is usable, that's a lot of storage for 3TB. :) Your 12 PM863s would be $25K in comparison.
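To make the back-of-the-envelope math explicit, a quick sketch; the prices and sizes are the rough street figures above, not vendor quotes:

```python
# Sanity-checking the numbers in this thread.
eps, sec_per_day, days = 1000, 86400, 365
print(f"events/year at 1000 EPS: {eps * sec_per_day * days / 1e9:.1f}B")  # ~31.5B, not 96B

pool_tb = 12 * 4                                 # 12 x 4TB proposal
print(f"rule-of-thumb RAM for {pool_tb}TB: {pool_tb}GB -> round up to 64GB")

drives, size_tb, price_usd = 24, 2, 600          # 850 EVO 2TB, mirrored
usable_tb = drives // 2 * size_tb                # mirrors halve raw capacity
print(f"mirror pool: {usable_tb}TB usable, ~{usable_tb // 2}TB at 50% free")
print(f"cost: ${(drives + 2) * price_usd:,} including 2 warm spares")  # ~$15.6K
```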

I should also note that this is all somewhat hypothetical. We're not seeing people build SSD pools this big, though I do have a small SSD pool on our E5-1650 based VM filer, and it's reasonably nice.
 

Nomad

Contributor
Joined
Oct 14, 2013
Messages
125
Hey jgreco,

For starters, thank you so much for taking the time to respond; any "objections" I bring up are merely me trying to understand and learn.

The POC is just for the 3TB compressed. 6x4TB Z2 yields 16TB usable, which is where I came up with 16GB of RAM.
If we get a GoLive we are looking at 506,323GB (~506TB) raw or 43TB compressed, which currently isn't enough, but if we only keep a year on the SSD it will be.
There are two sites, so I was going to split the drives up: 12x4TB = 40TB usable at each site. I'm guessing you're saying it would be OK to do 24x2TB in Z2? I just worry about that many disks in one pool, even with Z3.
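For reference, the capacity math I'm using (rough, ignoring ZFS overhead and TB-vs-TiB rounding):

```python
# Rough RAIDZ2 usable capacity: two drives' worth of parity per vdev.
def raidz2_usable_tb(drives, size_tb):
    return (drives - 2) * size_tb

print(raidz2_usable_tb(6, 4))   # 16 (TB) -> 16GB RAM via the 1GB/TB rule
print(raidz2_usable_tb(12, 4))  # 40 (TB) per site
```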

It's really 1,028.19 events/sec and 3 years of data, hence ~96B.
GoLive totals are as follows (current), and I'd like to double that for future growth:
Events: 1.12747E+12
EPS: 12K
506TB raw / 43TB compressed

Price isn't a factor compared to the nearly $1M that EMC wants for a pure-SSD array. Even if we bought 52 of the 4TB drives at $2K each, it would still be cheaper.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I have no idea what a "GoLive" is. Googling that in relation to firewalls doesn't suggest much.

I also have no idea what sort of workload this is. Is it merely storing file data sequentially and you're actually searching it linearly, or is it storing a database and you're working with that?

RAIDZ2 is probably fine for sequentially stored data, but going too wide is a bad idea. Mirrors are better for database-style access with lots of random reads, writes, and updates. In both cases, be aware that ZFS performance degrades as the pool fills. This is worse with HDD than with SSD, but it is still a factor for SSD. Plan to leave a significant amount of pool space free (50% to start), and then, if performance remains stable and pleasant as time passes, you can experiment with reducing that.
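To put numbers on that headroom rule, a quick sketch; the free fraction is a starting point to experiment from, not gospel:

```python
# Usable pool space needed if you plan to keep a fraction of it free.
def pool_needed_tb(data_tb, free_fraction=0.50):
    return data_tb / (1 - free_fraction)

print(pool_needed_tb(3))   # POC: 3TB compressed -> 6TB usable
print(pool_needed_tb(43))  # "GoLive": 43TB compressed -> 86TB usable
```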

For a POC, you would want to experiment with 6-disk RAIDZ2 and 12-disk RAIDZ2 to see what's acceptable. Note, again, that this is only applicable if your firewall or whatever is merely writing sequential files. All bets are off for database-style access, and RAIDZ2 will probably be miserable for that.
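If it helps, here's a crude way to compare the two layouts once they're built (a sketch; fio or even dd would be the more usual tools):

```python
# Crude sequential-read check. Read a file much larger than RAM so the
# ARC doesn't flatter the result.
import sys
import time

CHUNK = 1 << 20  # 1 MiB reads, roughly what a linear log scan does

def seq_read_mb_s(path):
    total = 0
    start = time.monotonic()
    with open(path, "rb") as fh:
        while True:
            buf = fh.read(CHUNK)
            if not buf:
                break
            total += len(buf)
    return total / (1 << 20) / (time.monotonic() - start)

if __name__ == "__main__":
    print(f"{seq_read_mb_s(sys.argv[1]):.0f} MB/s")
```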

Again, you're in unknown territory here. We've got a similar filer here with HDD and a small SSD pool. It's reasonably pleasant to use and would, I'm pretty sure, be totally awesome all-SSD, but I just can't justify funding two dozen massive SSDs to find out.
 

Nomad

Contributor
Joined
Oct 14, 2013
Messages
125
This isn't a database like you would think of one.

Log source -> collector/compressor -> index to the database server, compressed logs to the NAS.

The database server uses the metadata and index to go to day X in log file Y, requests it from the NAS, reads it, and pulls out what it needs.
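In rough Python terms, the access pattern looks like this (the index contents and paths are made up for illustration; the real index lives on the database server):

```python
import gzip
from datetime import date

# index: day -> compressed log file on the NAS share
INDEX = {date(2016, 5, 1): "/mnt/siem/logs/fw-2016-05-01.log.gz"}

def events_for_day(day, match):
    """Stream one day's compressed log off the NAS, yielding matching lines."""
    with gzip.open(INDEX[day], "rt") as fh:  # one long sequential read
        for line in fh:
            if match(line):
                yield line

# e.g. the "users who accessed freenas.com" query from the first post
hits = list(events_for_day(date(2016, 5, 1), lambda e: "freenas.com" in e))
```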

GoLive means that after the proof of concept is complete, if it's accepted, we build out the full system.

Think I should hit up someone from iXsystems to see if they can recommend something.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I don't care if it's a database "like I would think of one" (especially as I would think of one very differently due to the kinds of work I've done, most of which involve doing very specialized non-database-y databases). What matters is whether or not the access patterns are long sequential linear accesses, or constant short little writes here and there (constant short little reads can be accommodated under both mirrors and RAIDZ).
 

Nomad

Contributor
Joined
Oct 14, 2013
Messages
125
Well, I hope my explanation of the database was clear. If anyone knows EMC, I can pull the stats from the current VNX.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
So I think the biggest unknown is the query profile. I mean, 12K EPS (at ~1KB/event, which is probably generous) is 12MB/s, right? That shouldn't be that hard, and the expansion is then just adding multiple ~10-drive vdevs to get the capacity you need. The real question is the read/query profile and what it will do to the array, performance-wise. Any chance you've got the IOPS required under ingest only, as well as under operator usage?
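Spelled out, with that ~1KB/event assumption:

```python
# Ingest math from above, with the assumed ~1KB/event
eps, bytes_per_event = 12_000, 1024
print(f"{eps * bytes_per_event / 1e6:.1f} MB/s sustained write")  # ~12.3 MB/s
```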
 