Older firmware for LSI 9305-16i

Octopuss

Patron
Joined
Jan 4, 2019
Messages
461
What do you mean by the virtual disks in practical terms? What storage exactly are you talking about? Like the data one stores on the NAS, I guess?
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
the grinch is the VM master. I tried it once, on esxi, with my backup server....I hated it, though part of that was because it was an experimental esxi server, so any time I needed to reboot it, I had to stop all my replications (which at the time took about 8 clicks in the interface for each one).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
What do you mean by the virtual disks in practical terms?

A virtual disk is some sort of disk abstraction that is stored by your hypervisor somehow on real physical storage; on ESXi, for example, that would be a ".vmdk" file (or set of files) usually stored on a VMFS6 or NFS filesystem.

The problem comes in because of this:

Let's say you have a basic hypervisor with an HDD in it, and you build a virtual disk on it for a FreeBSD VM. You then install a FAMP stack on it to act as a web server. Years later, the disk gives out with a massive run of disk errors. You wake up to find your website down and the VM stalled with a bunch of "Read error" messages. This has two subforks: one is where the mpt driver is timing out with the hack that mav@ did some years back, and the other is where the VM is actually stalled by the hypervisor, which can especially happen when something bad happens like running out of disk space on the datastore. In both cases, the VM functionally stops working. Most people, upon hearing this, say "Of course."

But now think about ZFS. You have a mirror or RAIDZ vdev, made up of some vmdk's. One day an underlying datastore blows its brains out somehow. Disk full, disk errors, whatever. What happens to the NAS VM? If you don't say "it stalls of course" then you need to re-read the paragraph above. Because the NAS is "just another VM" and will stall "just like any other VM". There are a few ways to mitigate this, such as by using a datastore that has redundancy, but really it is best just to give TrueNAS the raw access to the disk controller that it really wants and needs.
 

Octopuss

Patron
Joined
Jan 4, 2019
Messages
461
That makes sense, but even with my borderline zero knowledge about this stuff, I somehow feel like using virtual disks for NAS storage is a really bad idea. After all, that's why we use HBAs and physical disks, isn't it?
I am basically commenting on the 2nd half of your post.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
that's why we use HBAs and physical disks
yes, but some of the problems with virtualisation will still be problems as long as you have that abstraction layer present.

for example, sometimes it seems like using TrueNAS for VM storage is a good idea (which it usually is), and someone will try to put their TrueNAS VM's storage on their virtualized TrueNAS....which is a circular dependency; the TrueNAS needs to be online before the TrueNAS can be online, but the TrueNAS isn't online, so the TrueNAS cannot be brought online...


going back to the original questions, knowing how YOUR virtualization is set up can at least let someone, like jgreco, verify that it should work as is, or tell you where the problems are and what you need to fix to have it working correctly. this is basically the same as with hardware, except now you have added a house of cards on top of the hardware holding up your VM.

normally I would note that RAIDz1 is less than ideal, but it looks like you have it with SSDs which should be far less of a problem. you will still have no redundancy for a resilver, but that resilver should be pretty fast. also, 2TB SSDs :cool:
 

Octopuss

Patron
Joined
Jan 4, 2019
Messages
461
What's wrong with RAIDz1? I made it like that, because the chance of more than one disk dying at once is close to zero, or at least I believe so.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
What's wrong with RAIDz1? I made it like that, because the chance of more than one disk dying at once is close to zero, or at least I believe so.

So what happens when one of your disks dies, and then in the process of repairing it, you inadvertently yank the wrong disk, or a cable is finicky, or there's a disk read error on one of the other disks? With RAIDZ1, your redundancy is lost when the one disk fails, and any other problems are potentially pool-killers. Will they? Who knows. But it is much safer to retain the redundancy property.
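To make the capacity/redundancy tradeoff concrete, here is a small back-of-the-envelope sketch (the disk count and size are made-up example numbers, not from this thread; it also ignores ZFS metadata and padding overhead):

```shell
#!/bin/bash
# Rough usable-capacity math for a single RAIDZ vdev.
# Hypothetical example: 6 disks of 2 TB each.
disks=6
size_tb=2

for parity in 1 2 3; do
    # Each RAIDZ level reserves 'parity' disks' worth of space,
    # and tolerates that many simultaneous disk failures.
    usable=$(( (disks - parity) * size_tb ))
    echo "RAIDZ${parity}: ${usable} TB usable, survives ${parity} disk failure(s)"
done
```

With these example numbers, RAIDZ1 gives 10 TB usable but a single dead disk leaves zero remaining redundancy, while RAIDZ2 gives 8 TB and still has one disk's worth of redundancy left during a resilver.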
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
More than RAIDZ1 being bad, think of it as not being enough for the typical crowd around here who cares about their data and prefers RAIDZ2.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
More than RAIDZ1 being bad, think of it as not being enough for the typical crowd around here who cares about their data and prefers RAIDZ2.
it's not bad, it's just the wrong tool for the job most of the time, especially since many people use raidz as their backup plan, so when the raidz fails, poof! data is gone.

if you have a backup, and are aware of the risks? sure.
as a scratch pool for something like video compression, authoring, etc? sure.
on SSDs, where the rebuild time is dramatically faster? probably fine. should still have a backup of anything important though.
as the only copy of valuable data on spinner disks? no.
 

Octopuss

Patron
Joined
Jan 4, 2019
Messages
461
Funnily enough, the person who sold me the HBA and whom I returned it to has just told me it doesn't detect any drives for him either, so it might not have been a cable problem after all.
 

Joined
Jun 15, 2022
Messages
674
What's wrong with RAIDz1? I made it like that, because the chance of more than one disk dying at once is close to zero, or at least I believe so.

After getting a for-test-purposes TrueNAS system working, I bought more drives to set up RAIDz3. After you've been through the "lost data" wringer enough times (and if there's enough data), Z3 is cheap insurance.

If all your disks are about the same age, they'll die "about the same time." Errors on an unused sector aren't detected until you try to read from that sector, like during a RAID rebuild, which is also hard on the disks and heats them up beyond what they've previously seen, making previously unknown errors show up. smartctl -t long may not find errors that are just under the surface...I had a drive test "OK" with only a few remapped sectors, but a badblocks -wp 2 run found 18,000+ errors on the drive (and it's still climbing). I *think* they may have been correctable at this point; I'm not sure and will look when the test completes. That's probably an extreme example, but it happens, hence most people (from what I read, anyway) settling on Z2, and in my case Z3.
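For reference, the long self-test is started with smartctl -t long /dev/sdX and the results read later with smartctl -a. Since nobody can run that against real hardware in a post, here is a sketch that parses a canned, hypothetical attribute line of the kind smartctl -a prints (the device name, attribute values, and sector count below are all made up for illustration):

```shell
#!/bin/bash
# Start a SMART long self-test, then later pull the reallocated-sector count
# out of the report. The actual commands would be:
#   smartctl -t long /dev/sda    # kicks off the test; it runs on the drive itself
#   smartctl -a /dev/sda         # read attributes/results once it finishes

# Hypothetical attribute line as it appears in 'smartctl -a' output:
sample="  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       18000"

# The raw value is the last whitespace-separated field on the line.
reallocated=$(echo "$sample" | awk '/Reallocated_Sector_Ct/ {print $NF}')
echo "Reallocated sectors: $reallocated"
```

The point of the thread stands: a drive can pass the self-test with a small reallocated count and still fall apart under a full-surface badblocks write pass.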

Anyway, as has been mentioned we are here to help, some patience is appreciated because this isn't home-gamer "it's close enough" land, it's more like The Perfectionist Zone with a good slathering of reality. (IMHO)
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
Errors on an unused sector aren't detected until you're trying to read from that sector, like during a RAID rebuild,
this is why ZFS scrubs exist. they read and verify data periodically. RAID does not have this.

heats them up beyond what they've previously seen
this is not really likely. any scrub will use the disk about the same. the difference is a scrub isn't trying to rebuild from parity, merely checking that checksums match.
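(on TrueNAS the scrub schedule is set from the web UI; on a plain ZFS box it is commonly just a cron entry. a minimal sketch, assuming a pool named "tank" — the pool name and schedule are placeholders:)

```
# crontab fragment: scrub pool "tank" at 03:00 on the 1st and 15th of each month
0 3 1,15 * * /sbin/zpool scrub tank
```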
The Perfectionist Zone with a good slathering of reality
Perfect Realism(TM)
 
Joined
Jun 15, 2022
Messages
674
Exactly right on drive scrubbing finding issues before something like a drive rebuild is needed, which is a bad time to find corrupt data.
[drive heating] this is not really likely. any scrub will use the disk about the same. the difference is a scrub isn't trying to rebuild from parity, merely checking that checksums match.
I wrote a script to log drive temperatures because:
A.) I had a fan go out and drive temps went from 37C to 48C, which pushed one drive that was already on the verge of failing into throwing errors in spectacular fashion (found with badblocks -n).
B.) Some systems are located in unstable environments with large temperature fluctuations.

Logging showed the more a drive is used the more it heats up, which is pretty dramatic with SSDs.
Code:
#!/bin/bash
# Record S.M.A.R.T. drive temperatures to a timestamped log file.

outfile="/mnt/log/$(date +%F_%H%M%S)_drvtemp.txt"

echo "S.M.A.R.T. temperature information for drives:"
for drive in /dev/sd?; do
    # -x/--xall prints everything, including the current temperature line.
    echo "$drive : $(smartctl --xall "$drive" | grep 'Current' | grep 'Temperature')" | tee --append "$outfile"
done
echo "Log file saved. Done."
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
I had a fan go out and drive temps went from 37C to 48C
large temperature fluctuations
drive is used the more it heats up
yes, drives heat up. my point was that reading drives for a scrub, reading drives for a resilver, and writing to drives are all going to heat the drive up about the same, counter to your claim that reading a drive for a resilver somehow heats it up more than it has ever seen, which is just not true.
heats them up beyond what they've previously seen
if you have poor cooling, obviously everything is going to be hot, but the amount of heat will be relatively consistent (unless you lose a fan, but that changes the cooling profile, not the heat the disk generates)


additionally, SSDs and HDDs are very different. SSDs heat up more, but as they have no physically moving parts, this doesn't matter as much; plus, they tend to read and write so much faster that the whole performance profile is very different. RAIDz1, for example, is generally much less risky on SSDs, since the rebuild is fast due to both smaller drive sizes and dramatically faster reads/writes.
 

Octopuss

Patron
Joined
Jan 4, 2019
Messages
461
New card arrived and it works like a charm. Doesn't even need molex power connectors to power the SSDs up.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Let us know how it goes! It's unusual to have such major problems with an HBA and I think you'll enjoy your new one.
 

Octopuss

Patron
Joined
Jan 4, 2019
Messages
461
Well, it just works. Like the original card was supposed to, heh.
 