SATA hot plug support on MSI E350IA-E44


jnitis

Dabbler
Joined
Aug 12, 2012
Messages
12
Greetings folks,

Three questions, all directly related.

Question 1: I have an MSI E350IA-E44 motherboard in a Lian Li PC-Q25 case running Seagate NAS drives all connected to the backplane included in the case and am having an issue with SATA hot plug support. I have the SATA ports set to AHCI in the BIOS.

What seems to happen is when I attempt to add/remove a drive in one bay one of the other drives gets detached even though the new drive is recognized. The motherboard has 4 SATA 6Gbps ports and in the example that follows I had drives in bays 0, 2, and 3 (herein bays 0-3 refer to SATA ports 0-3 as well) all working fine. I then inserted a fresh drive into bay 1 and this is what I saw:

Code:
[root@freenas] ~# dmesg
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST3000DM001-9YN166 CC4H> s/n Z1F0KKMH detached
ada3 at ahcich1 bus 0 scbus1 target 0 lun 0
ada3: <ST3000VN000-1H4167 SC43> ATA-9 SATA 3.x device
ada3: Serial Number Z3101Y7C
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada3: Previously was known as ad6


So it detached my perfectly fine ada0 and then added the new ada3 drive. After this, camcontrol devlist, smartctl, and the geom commands all showed all 4 drives as fine, but zpool status showed ada0 as REMOVED. I tried offlining/onlining ada0 to no avail; no matter what I did it wouldn't come back online.
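For anyone who wants to spot this kind of event without reading the whole log, here is a minimal Python sketch that pulls detach events out of dmesg text. The line formats are assumed from the output pasted above; the function name and regular expressions are illustrative only, and other FreeBSD versions may word these messages differently.

```python
import re

# Assumed format, taken from the dmesg output above:
#   ada0: <ST3000DM001-9YN166 CC4H> s/n Z1F0KKMH detached
DETACHED = re.compile(
    r"^(?P<dev>[a-z]+\d+): <(?P<model>[^>]*)> s/n (?P<serial>\S+) detached$"
)
# Assumed format, taken from the dmesg output above:
#   ada3 at ahcich1 bus 0 scbus1 target 0 lun 0
AT_CHANNEL = re.compile(r"^(?P<dev>[a-z]+\d+) at (?P<chan>\w+) bus \d+")


def detached_devices(dmesg_text):
    """Return a list of (device, channel, serial) for each detach event."""
    last_channel = {}  # most recent channel reported for each device
    events = []
    for raw in dmesg_text.splitlines():
        line = raw.strip()
        m = AT_CHANNEL.match(line)
        if m:
            last_channel[m.group("dev")] = m.group("chan")
            continue
        m = DETACHED.match(line)
        if m:
            dev = m.group("dev")
            events.append((dev, last_channel.get(dev), m.group("serial")))
    return events


# Lines copied from the output in this post.
SAMPLE = """\
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST3000DM001-9YN166 CC4H> s/n Z1F0KKMH detached
ada3 at ahcich1 bus 0 scbus1 target 0 lun 0
ada3: <ST3000VN000-1H4167 SC43> ATA-9 SATA 3.x device
"""

if __name__ == "__main__":
    for dev, chan, serial in detached_devices(SAMPLE):
        print(f"{dev} on {chan} (s/n {serial}) detached")
```

Piping the real dmesg output into something like this after each insertion would show right away whether a drive other than the one being handled dropped off.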

I then tried re-seating ada0 (remember, this is bay 0 and was working just fine prior to insertion of a fresh drive into bay 1) and received this:

Code:
cam_periph_alloc: attempt to re-allocate valid device ada0 rejected flags 0x118 refcount 2
adaasync: Unable to attach to new device due to status 0x6


I did some reading and found it may simply mean ada0 was in use by a process (perhaps ZFS itself), so I offlined the drive (so it showed OFFLINE instead of REMOVED) and then tried re-seating the drive again. This time not only did the above error repeat, but another one of my previously working drives detached!

At that point my zpool with a single RAIDZ2 vdev went offline since it was already down to 3 disks at the start of all this, then the working-fine ada0 went into REMOVED during the first drive insertion above so I was down to two drives, and then upon re-seat another working drive failed leaving the pool/vdev with a single working drive so yeah, duh, it failed.
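The arithmetic behind that failure can be sketched in a few lines of Python. This is only an illustration of the parity rule (a RAIDZ2 vdev tolerates two missing disks); the function name and the timeline labels are made up for the example and are not taken from any ZFS tool.

```python
# Number of missing disks each RAIDZ level can tolerate.
RAIDZ_PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def vdev_survives(level, missing_disks):
    """True while the vdev has lost no more disks than its parity count."""
    return missing_disks <= RAIDZ_PARITY[level]


# Timeline described above for the 4-disk RAIDZ2 vdev:
# (event, total disks missing from the vdev after the event)
TIMELINE = [
    ("start: one bay already empty", 1),
    ("ada0 goes REMOVED on hot plug", 2),
    ("another drive detaches on re-seat", 3),
]

if __name__ == "__main__":
    for event, missing in TIMELINE:
        state = "available" if vdev_survives("raidz2", missing) else "FAILED"
        print(f"{event}: {missing} missing, pool {state}")
```

The pool hangs on through the second loss and only goes away on the third, which is the sequence described above.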

So next I rebooted (both cold and warm starts) and tried checking what was showing up in BIOS. No matter what I did I couldn't get any drive to show up in BIOS in bay 1 (the original bay I inserted a fresh drive into above). I tried multiple drives in bay 1 and none of them worked but they worked fine in bay 0.

Finally I decided to try completely disconnecting the ATX power supply (remember I already tried both cold and warm restarts) and wiggling the SATA data cables and then finally drives were recognized in bay 1 again. When I booted up all three of my working drives came online and I was able to start the resilvering process with the fourth brand new drive in bay 1.

My gut feeling is something somewhere in my HW build doesn't support hot plug SATA properly, but it seems odd that the new drive is recognized when it's inserted but at times (not every time, though) upon a drive insertion in one bay another drive will get detached (but still show up in OS commands as working just fine, except ZFS will refuse to use it). The drives support hot plug, the case backplane supports hot plug (it's just a simple backplane offering a physical connection mechanism, no electronics involved), and the chipset (AMD Hudson M1 aka A50M FCH) supports hot plug. The only thing I can think of is that the motherboard specification doesn't list hot plug support so I'm wondering if it's possible to have a chipset that supports hot plug but a motherboard that doesn't implement it. From the AMD manual in the description of the pin-outs:

Note: Each port has a pin (SATA_IS) for sensing the status of the external interlock switch. If the motherboard implements SATA interlock switches, it should connect the statuses of the switches to those pins. The FCH can sense such statuses and, when they change, generate a PME or interrupt. Normally, an inter-lock switch is required for supporting hot plug.

What do you guys think?

Question 2: An amazing note: any drive I insert via hot plug gets recognized as a 600MB/sec drive whereas if you refer to my other recent thread the very same drives get recognized as 300MB/sec drives upon a cold/warm boot and stay that way forever. Is this indicative of an issue with FreeBSD or my HW do you think?

Question 3: Also, here's a related question. If I have drives in bays 0-3 they show up as ada0-3 in FreeNAS. If I shut down, remove say the drive from bay 1, then boot into FreeNAS again instead of the drives showing up as ada0 (then skipping ada1) ada2, ada3 they show up as: ada0, ada1, ada2. Is this normal? I guess the GUID and/or some other ZFS drive identifier works around this but just wanted to ask in case it's related to my issue above.

Thanks, folks.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Q1: AMD based desktop boards often have SATA controllers added that are far from ideal for a server. It sounds like you are seeing first hand that they just aren't fit for servers. For your own data's sake you should consider something more appropriate for a server. If you read our stickies you'd see that the only hardware we recommend is actual server grade. Otherwise you risk potentially pool-killing corruption due to a crappy SATA controller.

Q2: No way to know. I never recommend AMD based systems because of #1 above and the fact that they statistically are more likely to have problems with FreeBSD/FreeNAS than an Intel solution. It could be hardware, it might not be. I will say I've never heard someone complain about this before, so you're definitely in the minority. Of course, you are already in the minority by virtue of using desktop hardware and the fact you went with AMD (either one puts you in the minority by a long shot).

Q3: Yep. The disks are assigned device names by the order they are detected. Empty ports are empty.. so they have no device.
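A rough Python sketch of that naming behaviour, for illustration only. It assumes disks are detected in port order at boot and that a hot-plugged disk takes the lowest free unit number; that assumption is consistent with the dmesg earlier in the thread (the drive inserted into bay 1 came up as ada3), but it is a simplification and not a description of the actual FreeBSD probe code. The function names are hypothetical.

```python
def names_at_boot(occupied_ports, prefix="ada"):
    """Map each occupied port to a device name, numbered in detection order.

    Assumption: detection order follows port order.
    """
    return {port: f"{prefix}{unit}"
            for unit, port in enumerate(sorted(occupied_ports))}


def name_for_hot_plug(existing, prefix="ada"):
    """Name for a disk inserted after boot: lowest unit number not in use.

    Assumption: units are never renumbered while the system is running.
    """
    used = {int(name[len(prefix):]) for name in existing.values()}
    unit = 0
    while unit in used:
        unit += 1
    return f"{prefix}{unit}"


if __name__ == "__main__":
    boot = names_at_boot([0, 2, 3])   # bay 1 empty at boot
    print(boot)                       # port -> device name
    print(name_for_hot_plug(boot))    # name given to a disk added in bay 1
```

Because the adaN names shift like this, ZFS finds pool members by the labels written on the disks themselves rather than by device name.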
 

jnitis

Dabbler
Joined
Aug 12, 2012
Messages
12
Thanks for the reply, CJ.

I'd like to note that I performed extensive research on what HW to purchase, including here in these forums, in 2011 when I first built my FreeNAS box. At the time the E350/E450 CPUs were hot on the market and were selling like hotcakes in all manner of equipment, yes mostly aimed at the consumer due to the price/performance ratio. That being said many here in these forums and elsewhere on the 'net highly recommended E350/E450 based motherboards as an ideal solution for FreeNAS (among other appliance uses): just enough CPU power to come close to saturating a gigabit link with samba (and easily saturating it with NFS/FTP/etc.), tiny/embedded design (most designs were mini-ITX, including this one), and a very reasonable cost. Furthermore the Southbridge, including the SATA controllers, that (AFAIK) is always included on motherboards with E350/E450 designs is also made by AMD and fully supported by FreeBSD. You're correct in that sometimes motherboard manufacturers add additional SATA ports with less-than-ideal SATA solutions, but that's not the case here.

I don't see how being in a minority would in any way affect the function of the hardware at hand nor attempts at troubleshooting to isolate root cause. That's like saying the Toyota Camry is #1 in sales in the US so the #2 Honda Accord (or any other automobile) being in the minority will face issues operating as designed, to specification, on public roads under normal use as specified by the manufacturer. If you'd like to continue down that road (no pun intended) please provide links to hard data supporting your claims from an industry accepted independent and unbiased party.

I also don't see how maintaining a strong distinction between "consumer grade" and "enterprise grade" comes into play here as long as one is asking the HW to function as designed and advertised by the manufacturer. We're not talking IPMI, separate lights-out management consoles, or anything of the like here. I'm asking "chipset supports hot plug, mobo product website doesn't mention it, could this be an issue?" while providing detailed data on the failure states I saw to hopefully assist someone in addressing the root cause and obtaining a definitive answer.

Now, leaving the brand and class of HW out of the picture what can I do to further troubleshoot my Q1&2?

Thank you for answering Q3 concisely.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Back in 2011 (which was before my time here) there was a lot less knowledge and experience on the forums. All walks of life and all kinds of hardware were used back then. It hasn't really been until the last year or so that we've seen ECC RAM become something that deserved serious attention (just look at the date of my ECC vs non-ECC RAM thread). This has been a learning process for many of us. I didn't even show up here in the forums until 2012, and there wasn't anyone that was a daily reader and writer for the forums.

As for AMD's "fully supported" on FreeBSD, I disagree. 10 years ago AMD really was giving Intel a run for their money. AMD started investing heavily in software development for *nix. But, in the last few years AMD has had to cut costs. The best way is to do layoffs. Their support for *nix has dropped significantly in the last 2 years. I think last year or the year before they laid off 1/2 of their developers. When you lay off 1/2 of your developers that were handling open source projects, that's not good for said projects. I can't tell you how many AMD systems won't even complete a bootup sequence without a kernel panic(and sometimes not even getting to loading the kernel!) or similar problem. It's a problem that's plaguing more and more users every day. I'm all about people buying the cheapest that will get their job done. But, AMD is proving to be harder and harder to shop for without having to turn around and buy a whole Intel system just to get a working FreeNAS box. You can't tell me you are saving money when you build an AMD system only to find out you have to go buy an Intel just so it will boot. (I do realize this isn't your problem, I'm just explaining why I disagree.) Look at it this way, Intel spends 10 times as much on R&D as AMD does every year. It's a little hard for AMD to compete when Intel spends as much money every year on R&D as the entire value of the AMD company. Intel has almost doubled their R&D expenses in the last 5 years, AMD's has been a slow and steady decline.

Another problem with AMD systems is their market. Most people go with AMD because they want something cheaper than Intel. Well, you get exactly what you pay for when it comes to electronics. Intel has a fairly strict QA process, especially in relation to other manufacturers. AMD hardware is typically cheaper when sold to motherboard companies, who take that AMD hardware and add on a bunch of other relatively cheap hardware to a board. Then they build a BIOS (being mindful of cost too) and dump that thing on the market. Now you are stuck with a motherboard that was made from relatively cheap parts from manufacturers that are out to make the cheapest components, that probably didn't spend as much on QA as some of the more expensive manufacturers, and that may or may not pay to have open source drivers provided, but you were potentially suckered in with the cheap price. Well, now you own it and you are stuck with those odds. After all, nobody buys an ultra-expensive AMD motherboard so they can couple it with an AMD processor and bask in their glory as their system underperforms an equivalently priced Intel based system. You buy AMD *because* you are saving money and the performance you are getting with that cheaper price tag makes it a cost effective choice.

Now, I'm not an AMD fan myself. Every AMD system I've ever owned has been a disgrace, with the last one I used for less than 50 hours before putting it in a corner and never using it again. But, I don't want to see AMD fail either. I think AMD needs to compete with Intel. Without competition Intel won't have the motivation to continue to innovate.

Being in the minority is not about hardware function. What it does do though is put you on an island that only a few people will be on, so support will be scant. I don't consider your Toyota vs Honda analogy to be entirely accurate when Intel is just dominating AMD like they are. I'd almost call Intel a monopoly. AMD really isn't providing much of a challenge for Intel, and with AMD's layoffs last year some people called it "an executive gutting a company to maximize short-term profits before he leaves with his loot". One thing you just don't do in AMD's situation is start laying off developers and engineers, and that's *exactly* what AMD did.

As for your hardware functioning as designed, that's a combination of both software and hardware. Without both good hardware and good software you'll have a very unsatisfying experience. If the drivers are total crap, you can expect the hardware to perform poorly or not at all. Look at Realtek for a perfect example. Realtek NICs can work pretty well on Windows, so clearly the hardware isn't total crap (yes, it's not great, but it's clearly reliable). But look at the percentage of people that can't get it to connect to a LAN, can't get it to stay connected for 30 seconds, have bouts of random lost packets, etc. that all go away the second they get an Intel NIC. Clearly software is the problem there. But, Realtek really doesn't do crap for the *nix OS development, so you get some hacked version that works for some people, but not for many (or most).

Lastly, hotswap requires both your hardware and its drivers to properly support hotswap. We've seen plenty of hardware that has no hotswap support in FreeBSD drivers at all. So despite the fact that the manual says it supports hotswap, it's more like "hardware supports hotswap on Windows only". You may or may not be in that case. You are kind of on that "island" I mentioned earlier, so I don't have any experience that will give you a solid answer. I will say that I am truly freaked that one drive being unplugged offlines a second disk. That's totally, TOTALLY unacceptable for proper server operation. That is something that should be freaking you out to the extreme. In essence, if one disk fails and drops off your server, a second disk may be offlined as a result. That's just not acceptable in any definition of a "reliable file server". I'd be scared for my zpool in your shoes. I think it's something that definitely requires attention. The easiest solution in my opinion is to go and get an M1015 controller and then you can totally avoid any SATA problems from the motherboard.
 

jnitis

Dabbler
Joined
Aug 12, 2012
Messages
12
Woah, I almost thought I was reading an old thread from comp.sys.amd.advocacy. :)

My focus here is to resolve my issue in the best way possible which means some combination of cost effective, lowest investment in time and pain, and general simplicity. I have a mind that naturally does not rest until a problem is resolved properly by getting at the absolute root cause and smells nonsense from a mile away and in this case my issue has piqued both of those traits.

Now before I move on to updating everyone on my most recent travails I'd like to state that I believe I understand and respect your opinion much better now that you've shared some background and your thought processes behind it. The reason I took issue with it and bothered to address it at the expense of this thread going wildly off-topic is many-fold but primarily for two reasons.

The first reason is like 99% of the rest of the Internet I read a lot more than I post and as such in the past I've come across what I perceive as an unfair strong anti-AMD stance from a rather influential poster on these forums. I didn't feel compelled to enter into an exchange as such when just browsing but when you replied to my issue re-iterating that stance I felt compelled to shed some light on what I feel is a strong counterpoint from my perspective.

The second reason is from my perspective I feel a "it's the HW you bought stupid, deal with it" (which I'll allow is my perception, but might not have been your intent) isn't very helpful especially when someone has already invested in the build and the build may have been working just fine for quite literally years. You're absolutely entitled to your opinions and there's no way I'm going to invest enough time on these forums to act as a full counterweight to your opinions but my ask (and it's just an ask, feel free to totally ignore it) is that as an influential member of this community you try to be as objective as possible when it comes to issues of HW branding and classification when helping people.

Now on to the issue at hand. I opened a case with MSI to which they quickly replied and they provided me with an internal BIOS release that appears to have resolved the issue of other HDs going offline when one is removed. So that's fixed, yay. (Why they don't post this publicly I don't know, I will strongly encourage them to.) That being said I believe some issues still remain. Whether they are HW or SW related (or both) is unknown. They are as follows.

Issue #1: removal of the drive is detected by the AHCI driver properly and reported in dmesg however insertion of a new drive is not automatically recognized by the driver nor does it get reported in dmesg. That being said what's very interesting is that it's recognized by every other disk related tool I know of in FreeBSD including camcontrol devlist, geom/glabel/gpart (which appear to be the same executable at least in FreeNAS, don't know if they're using a busybox sort of thing that comes with mini-FreeBSD or what), and smartctl.

Issue #2: if the same fresh drive is then removed and re-inserted I receive the same error I did before. Note that this error repeats if the process is repeated. At this point in time the disk commands get into a funky state and basically break.

Code:
cam_periph_alloc: attempt to re-allocate valid device ada0 rejected flags 0x118 refcount 2
adaasync: Unable to attach to new device due to status 0x6


Issue #3: is a repeat of Q2 in my OP: basically any drive inserted after the system is booted (ie hot plugged) is recognized at the full 600MB/sec transfer rate and even stays recognized as such upon successive warm and cold reboots. If the ATX power supply is completely disconnected for a total-cold-start the drive gets recognized as 300MB/sec again upon boot just like the rest of the drives.
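To compare negotiated speeds across drives after each boot, something like the following Python sketch could be run against the dmesg text. The 600MB/s line format is copied from the output in my first post; the 300MB/s sample line is modelled on it and is an assumption, as are the function names.

```python
import re

# Assumed format, from the dmesg output earlier in the thread:
#   ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
SPEED = re.compile(
    r"^(?P<dev>[a-z]+\d+): (?P<rate>\d+(?:\.\d+)?)MB/s transfers "
    r"\((?P<mode>[^,)]+)"
)


def link_speeds(dmesg_text):
    """Return {device: (rate in MB/s, link mode)}; the last report wins."""
    speeds = {}
    for raw in dmesg_text.splitlines():
        m = SPEED.match(raw.strip())
        if m:
            speeds[m.group("dev")] = (float(m.group("rate")), m.group("mode"))
    return speeds


def slower_than_best(speeds):
    """Devices that negotiated a lower rate than the fastest device."""
    if not speeds:
        return []
    best = max(rate for rate, _ in speeds.values())
    return sorted(dev for dev, (rate, _) in speeds.items() if rate < best)


# The ada0 line is illustrative (assumed wording); the ada3 line is real.
SAMPLE = """\
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
"""

if __name__ == "__main__":
    speeds = link_speeds(SAMPLE)
    print(speeds)
    print("slower than best:", slower_than_best(speeds))
```

Keeping this output from each kind of restart (warm, cold, power fully disconnected) would make the 300 vs. 600 pattern easy to show to MSI or to a driver developer.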

Question #1: Assuming a drive is zpool offline'd and then subsequently removed from the bay and then a new fresh drive inserted into the bay what is the process to have the drive added to the zpool from the command line? If I simply try a "zpool online ada0" it's not recognized. If, in the GUI, I follow the steps exactly from the manual it didn't show up in the list of drives to replace. Is this because the AHCI driver didn't recognize the new drive or is it something else? Can a FreeNAS developer weigh in or can someone refer me to the exact piece of code that the FreeNAS GUI runs to check which drives are available for replacing? When you click on storage, then click on your pool and select volume status you can see the GUI is running some code in the background that appears to be doing several things because it takes 10-15s to display anything. I'd like to know exactly what it's doing.

Question #2 part 1: is directly related to Q1a. After a reboot the drive still didn't show up in the list of disks to be replaced in the FreeNAS GUI. It showed up in the regular dmesg output as being detected upon boot (again, strangely at 600MB/sec vs. the rest of the drives which are detected at 300MB/sec yet are identical), it showed up in camcontrol devlist, in smartctl, and even in geom disk list. It *didn't* show up in glabel list until I did a manual glabel of the disk and then zpool let me do a zpool online ada0 and of course zpool replace. Again, what's going on here?

Question #2 part 2: (Before anyone jumps on my case for doing it via the CLI I think if you read the thread you can see the GUI was giving me issues [most likely due to HW and/or SW not working as it should] and I was left with no choice but to get "closer to the metal" by digging around in the CLI.) So yes, now I have a disk in my pool not labeled by its GPTID. Please let me know if this will be an issue long-term and I can remove and wipe it and start over from the GUI with a total-cold-start. Or if there's a zpool relabel command that'd be easier; either way I don't mind the 6hr resilver.

Question #3 part 1: But here's what's odd in the current state with the new disk that was hot plugged (and the system still hasn't had a total-cold-start). In addition to camcontrol devlist and smartctl, which show the new disk after a reboot (and by now the 6hr resilver has completed and everything is fine otherwise):
  1. geom disk list shows the disk
  2. glabel list shows the disk
  3. glabel status does not show the disk (but shows the other disks just fine)
  4. gpart list shows the disk
  5. gpart status shows the disks
  6. gpart show does not show the disk (but shows the other disks just fine)
What the heck's going on here?

Question #3 part 2: An interesting note: even though I did the glabel/zpool commands from the CLI the fresh new disk was still partitioned exactly as the others with the FreeNAS standard 2GB swap partition as the first 2GB then the rest is a ZFS partition. So it seems FreeNAS code ran at some point here? Does anyone have any ideas?

I think more investigation needs to be done by some combination of: FreeNAS devs, the author(s) of the AHCI driver for this Southbridge chip, and MSI. I'd like to ultimately resolve all of these issues so that I have a 100% working hotplug system with no anomalies. Any help pointing me in the right direction would be greatly appreciated (especially how to get in touch with the FreeBSD maintainer responsible for the Southbridge chip AHCI driver). Thank you.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah. After I spent 45 minutes writing that I was like "zomg.. that's a wall of text.. hopefully he doesn't fall asleep and actually reads it all".

Before I answer your questions, keep this in mind... hotswap is not the same as hotplug. It sounds like your hardware may support hotplug, but not hotswap. Keep reading though.. because it may not support anything.

I realize you probably don't know me very well, but I don't go castrating someone/something/somecompany unless I've got ample evidence. On the flip-side, I don't support something because I have no reason not to support it. I support something because I have ample evidence that it actually *does work* and is superior to other options. In short, I want solid irrefutable evidence (or lots and lots of less solid evidence) before I form an opinion for/against something.

Now to your questions:

1. Do not use the CLI to add a drive to a pool. Do it through the WebGUI just like you are supposed to. There are ways to do it from the CLI, but it changes from version to version so the only reliable way is to use the WebGUI. If this isn't satisfactory, you are welcome to use FreeBSD. After all, people choose FreeNAS over FreeBSD for the GUI. If the GUI isn't being used, you are taking major risks that FreeNAS' config database will be out of sync with your true config. The two rules for using the CLI:
  • If the WebGUI does what you want (aka it's in the manual) you'd be wise to follow the manual. Doing anything else is risky at best.
  • If you *have* to do things from the CLI, be very very cautious what you do. Looking around at directories, smart data, etc is typically okay. But changing things... not okay at all.
If camcontrol doesn't show that it's being given a /dev/adax or /dev/dax it will not be available for you to use. That sounds like something isn't right with the relationship between your driver and your hardware. My guess is the hotswap versus hotplug distinction I mentioned above.

I'm not sure what code it's doing, but if you really want to know you should show up in IRC between 8pm and midnight in the USA and you'll probably find a developer on.

2-1. Not sure, but you scare me with the fact that you were doing zpool commands from the CLI. That's a recipe to start a chain reaction that will fubar FreeNAS' config and you may never be able to get FreeNAS to work properly again. As for the 300MB/sec versus 600MB/sec, I don't know. Seems a little odd to me. Overall though, unless you have SSDs that 300MB/sec won't be a bottleneck for your disk.

2-2. If your CLI stuff has somehow bastardized FreeNAS' expectations for your config, you may be past the point of fixing it. A resilver isn't the answer. Defaulting FreeNAS and redoing all of the settings by hand with a new config is likely the only solution. We see this a lot because people jump to the CLI. I realize you are saying you had to go to the CLI to get it "working". But generally, if FreeNAS decides not to let you do something it probably knows better or it has a bug. To be honest, I'm not aware of any bug where FreeNAS doesn't let you do something that it *should* be allowing you to do. So I have to think that your use of the CLI to accomplish tasks is ill-conceived.

3-1. My first thought.. something in your hardware isn't properly letting the device make the rounds and become available to all aspects of your system. My second thought is you did something in the CLI before at some point in the past and FreeNAS' config is out of sync with reality. Now FreeNAS is tripping up and can't recover because your actual config and the config FreeNAS thinks you have don't match.

3-2. There's a possibility that your hardware doesn't support hotswap and/or hotplug with FreeBSD, and what you are doing just isn't supported, so when a disk disappears and comes back the cached information is being reused because the system is unaware of the disk changing. This also tends to make me think that you should never try to do hotswap/hotplugging of disks on that hardware. Personally, as a rule, I never do hotswap/hotplug. There are too many people with too many problems. I've tested it with my hardware and it works. But all it takes is a driver update to break it, and when I pull a drive under load things go bonkers. It's just safer to always pull a disk with the system off.... and I'm *all* about doing safer rather than faster.
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
Well, cyberjock is an Intel fan and he's a prisoner of Intel fanaticism.
So you might want to think a few times before you take his information as the only option.. ^_^

It happens with Intel chips too. Look at the Supermicro X9SRL-F as an example: it has the same shit going on that you currently have. (It even happened to me on a genuine Intel board; Intel showed me its true nature by losing expensive data, and that's why I moved to FreeNAS after that.) So don't think, like I did, that you are automagically safe when buying Intel.

https://forums.freebsd.org/viewtopic.php?&t=43238

As for your problem, I tried for xxx hours on Intel and did not get it working, so just forget hot swap. If it's not working, it's not working. Have you heard that BSD/Unix can be pretty touchy about the hardware it accepts?

So that's the end of that Intel vs AMD.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Oh, buying Intel doesn't make you safe, and I'd never try to make that argument. My whole argument between Intel and AMD surrounds compatibility. Read http://forums.freenas.org/index.php...-support-on-msi-e350ia-e44.20445/#post-116945 to see my reasoning behind it. I'm not a fanatic for Intel.

If you show up and ask if an Intel brand motherboard is a good buy, I'll tell you no. Plenty of people have had weird bizarre issues with Intel brand boards. So before you go labeling me, Starpulkka, you might want to understand where I'm coming from. I'm using actual objective evidence to support my opinion and not some fanboy attitude that Intel is some company that can do no wrong and is the "bestest" at everything in the whole wide world. They aren't. I've personally had problems with Intel boards going back to 2004. Here it is 2014 and plenty of people here have had completely unexplainable errors that when googled are strictly found on Intel brand boards.
 

jnitis

Dabbler
Joined
Aug 12, 2012
Messages
12
So, an update to the previously discussed:

Q1: Again, my personal view, but in my research I've seen you jump to the "don't use the CLI, noob" line of thought and accusation far too often. I tried to head it off in the wording of my question (more so in Q2P2) but you still "went there." I really don't want to go off-track on the whys and wherefores but my personal ask to you (I realize this is the second one I'm making) is to try to stick to the facts at hand (i.e., what was the failure state of the GUI, what commands did you run, and what was the output; a session log would be ideal, which I in fact kept in this case). An instant blacklist of the poster's question at the mere mention of the CLI or an "I'm washing my hands of your question the second you mention you touched the CLI" line of thought is far too presumptuous and unhelpful. In this case a pointer to the block of code run (more specifically the FreeBSD commands run) when someone clicks the menus as I described above would be helpful so that I can personally further debug.

A further thought is that based on your fear of the CLI it sounds like FreeNAS is very inflexible in this sense. It absolutely shouldn't be.

Q2P1: Remains an issue with the HW or the FreeBSD driver as far as I can tell. MSI is working with me on the HW side, I will need to look into getting someone from the FreeBSD driver side to look into it.

Q2P2: In any case the pool works fine with the disk added via the CLI. The only thing missing was the GELI encryption mentioned here which I don't use anyway. Nonetheless for consistency's sake via the GUI I offlined the drive then replaced it (with itself) and that seems to have resolved the (apparently inconsequential) inconsistency there.

Q3P1: Same as Q2P1.

Q3P2: Unknown, and I would have loved to have an in-depth answer on this, but it is negated by the action taken in Q2P2 above.

New development:

FreeNAS box dropped offline this afternoon. The share was gone, upon ssh login it would hang after password entry, and the console was hung as well. Usually this means something is waiting on I/O somewhere (most likely the disk subsystem in this case). I could see on the console that all 4 drives had detached themselves at seemingly the same time which was what obviously led to the system being in its current state. I rebooted (warm) and the system came back up fine but the GUI wasn't accessible despite the system being fully available via ssh. After a cold boot (recall earlier this was somehow key) the system started up fine.

I'm convinced this is related to the hotplug/swap issue underlying this thread as the system hadn't been cold booted since the last hotplug/swap. An interesting note is that one of the issues was "resolved" by this particular cold start: all drives are now recognized at 600MB/sec, and this is not the first cold start since updating the firmware as requested by MSI. Totally bizarre. There's no telling if this will stick long term but fingers crossed.

In summary I agree: hotplug/hotswap while supposedly supported by the motherboard/chipset (as listed in topic title), drive (Seagate NAS series), and software (FreeBSD 9.2 STABLE), appears to not be well tested and it's best to simply shut down the system and replace the drive. The three parties mentioned herein could save their customers/users copious amounts of time by either removing the relevant portion of documentation claiming support for this feature or adding a strong disclaimer in said documentation.

All that being said I personally like to see issues resolved through root cause analysis so I will persist. I feel like I'm sounding like a broken record but again I'll state the next step to resolution of the hotplug/swap issue still remains the same: to interconnect MSI and the FreeBSD driver developer responsible for this portion of the code. I will attempt to locate said developer now that I have a ticket open with MSI. (I was initially hoping someone here would know the dev or have a quick URL to a table of all devs for the various drivers that I could refer to.)

Hope this is helpful to someone in the future.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
1. FreeNAS is nothing more than a database, scripts, and a webpage. The database settings and scripts handle FreeNAS on bootup and ensure everything is "set" properly. Anything, and I mean ANYTHING you do in the CLI that could make the FreeNAS database not match your *actual* setup can be a major disaster. The database *must*, in no uncertain terms, always be in sync with your actual setup. Failure to do this can lead to lost data, corrupted data, a non-booting server, etc. So using the CLI for all but the most basic of tasks is just begging to lose your data. So why on God's green earth would I ever recommend to a noobie to do anything from the CLI that doesn't involve some kind of one-on-one supervision to ensure they don't break this simple fact? I'd call it downright irresponsible if I told you anything less than 'don't use the CLI'. Not to mention, if you don't want to use the WebGUI there is a perfectly capable OS that will do anything and everything you want. It's called FreeBSD. So yes, it absolutely should be inflexible... by design.

2. Ok. You should go to the FreeBSD forums/IRC for that kind of support. The FreeNAS developers only worry about grabbing the FreeBSD drivers and adding them to FreeNAS itself. They aren't actively involved in driver writing. That error is only inconsequential until it isn't. We've had people lose pools over that "inconsequential" mistake. Which is why I'm so adamant that people don't do it from the CLI. Feel free to see #1.

I'm not sure if it's related to your hotswap/hotplug issue or not. You have no way of knowing if the hotswap/hotplug issue is a symptom or a cause. Until you can identify which it is, you can't point fingers. But you definitely have a problem, and it's more than likely hardware related. I will say that I've seen people disconnect whole pools while using FreeNAS and it didn't lock up the system. The pool was kicked offline, but you could still SSH in and use the box to the extent of expected conditions.

While I applaud your follow-through with the problem I'm not sure if I should say 'good job' or wonder why you are wasting your time on that platform. Time is worth something, and it seems to me like the far easier path is to replace the hardware rather than try to fix it. Especially considering how old and slow that platform already is. Not that I'm judging your time to be a waste or not. It's your time; if you want to spend it on this, spend it watching paint dry, or solving world hunger, that's totally your prerogative.

There will come a time that the CPU will not be able to perform up to your satisfaction. I'm expecting 9.2.2 to increase workload compared to what we are used to. That may put that CPU over the top.
 

jnitis

Dabbler
Joined
Aug 12, 2012
Messages
12
I totally disagree on the CLI issue. FreeNAS is a product (sold/supported commercially as well) and as such it's unacceptable to have a product be so inflexible that it can't recover from CLI commands which are related to its core functionality that aren't executed through the GUI (or vice-versa). If that is indeed the case opening a shell shouldn't be offered as an option in the GUI, you shouldn't be able to login on console, the whole thing should be locked down. Of course that's an absurd idea which is why it isn't the case as FreeNAS is shipped today.

Perhaps I'm mistaken but I can't think of any product that has such a gaping flaw shipping today (or ever, for that matter). Examples that spring to mind: HP iLO management consoles, IBM management consoles, NetApp's Data ONTAP, VMware's ESX, heck even my consumer grade WiFi router... the list goes on and on. (Can you imagine a support call to NetApp: "Hi, I just changed a user's quota and it's not taking effect." "Sir, did you do that from the CLI or the GUI?" "The CLI." "OK, I'm sorry, actually you're not supposed to have access to that other than to type the `date' command, as a result we're going to have to send an engineer on-site to tear down and completely reconfigure your Filer. We'll need three business days downtime.")

I don't know the intricacies of what FreeNAS keeps in a database, but aside from internal GUI-related items (remembering what items you have expanded/collapsed, for example) a database really isn't required for core system functionality: the database should be the regular files and commands already well established in the OS. You want to edit crontabs by hand in the CLI? Great, go ahead, and the next time someone opens the list of crontabs in the GUI it will simply pull in the data from the regular OS crontab files. Adding a database layer on top of that serves no purpose and I'd be surprised if the FreeNAS devs actually did that. Maybe you can explain more about which precise database elements you're referring to and where they're used. There's actually already a level of protection involved: the root (boot) fs containing all of FreeNAS itself is mounted read-only by default. The only bits writable are your pools and any FreeBSD partitions that must be writable by design (/tmp, /var, etc.).
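To illustrate what I mean by pulling the data straight from the OS files, here is a rough Python sketch of a crontab reader a GUI could call each time it displays the list. This is purely illustrative: the function name and the returned structure are my own assumptions, not anything FreeNAS actually does, and it only handles the plain five-field schedule format (it skips comments, variable assignments, and shortcuts like @daily).

```python
def parse_crontab(text):
    """Parse user-crontab text into a list of dicts.

    Only plain 'min hour dom month dow command' lines are returned.
    Blank lines, comments, and lines with fewer than six fields
    (e.g. VAR=value assignments or @daily shortcuts) are skipped.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split into at most six parts so the command keeps its own spaces.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        minute, hour, dom, month, dow, command = fields
        entries.append({
            "minute": minute,
            "hour": hour,
            "day_of_month": dom,
            "month": month,
            "day_of_week": dow,
            "command": command,
        })
    return entries


# Hypothetical crontab contents, made up for this example.
SAMPLE_CRONTAB = """\
# nightly job
SHELL=/bin/sh
0 3 * * * /usr/local/bin/backup.sh --full
"""

if __name__ == "__main__":
    for entry in parse_crontab(SAMPLE_CRONTAB):
        print(entry)
```

With something like this the file on disk is the single source of truth, so a hand edit from the CLI shows up in the GUI the next time the list is opened.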

Even if a database was required there are several methods to keep the database in sync with the real-life system and this is an issue that's for the developers to solve, but two options come to my mind immediately. 1) place a simple wrapper around every single CLI command that would break the GUI (eg, disk subsystem or ZFS related) that automatically updates the database after each run of said command. 2) preferably: remove the general inflexibility of the FreeNAS code and have the GUI interact seamlessly with the OS in a fluid manner such that changes can be made in either the GUI or the CLI and FreeNAS will happily live on. This is how every other similarly operating product that comes to my mind works. As you say the core value in FreeNAS is the management GUI and as such it should be robust in its core role.
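Option 1 could look something like the Python sketch below. Everything here is hypothetical: the table layout, the function names, and the pool name "tank" are made up for illustration and are not the real FreeNAS schema. The wrapper runs the real command, and only if it succeeds does it refresh the database from the actual state of the system. The demo uses a fake runner so nothing touches a real pool.

```python
import sqlite3
import subprocess
from types import SimpleNamespace


def make_db():
    """Create an in-memory database with a made-up table of pool members."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pool_disks (pool TEXT, disk TEXT, PRIMARY KEY (pool, disk))"
    )
    return conn


def sync_pool_disks(conn, pool, disks):
    """Replace the recorded member list for one pool in a single transaction."""
    with conn:
        conn.execute("DELETE FROM pool_disks WHERE pool = ?", (pool,))
        conn.executemany(
            "INSERT INTO pool_disks VALUES (?, ?)", [(pool, d) for d in disks]
        )


def wrapped(conn, argv, pool, list_disks, runner=subprocess.run):
    """Run argv, then re-sync the database if the command succeeded.

    list_disks(pool) must return the disks actually in the pool; how it
    finds that out (e.g. by parsing zpool status) is left out of this sketch.
    """
    result = runner(argv)
    if result.returncode == 0:
        sync_pool_disks(conn, pool, list_disks(pool))
    return result.returncode


def fake_runner(argv):
    """Stand-in for subprocess.run so the demo never touches a real pool."""
    return SimpleNamespace(returncode=0)


if __name__ == "__main__":
    conn = make_db()
    sync_pool_disks(conn, "tank", ["ada0", "ada1", "ada2"])
    wrapped(
        conn,
        ["zpool", "replace", "tank", "ada0", "ada3"],
        "tank",
        lambda pool: ["ada1", "ada2", "ada3"],  # pretend post-replace state
        runner=fake_runner,
    )
    print(conn.execute("SELECT disk FROM pool_disks ORDER BY disk").fetchall())
```

The point of syncing from the observed state, rather than from the command's arguments, is that the database ends up matching reality even when the command does something the wrapper didn't anticipate.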

I've opened a thread on the FreeBSD forums here if anyone's interested.

The CPU I'm using might have originally shipped in 2011 but the actions being performed by it haven't changed, thus I would view any lessening of performance by new software as unnecessary bloat. If anything it should be faster and more refined as long as the tasks it's performing are the same. How do you explain the CPU choices by the consumer grade NAS appliances out there being in many cases less capable than an E350? As I've said before I'm very pleased the box can pull 80MB/sec just fine over CIFS (its primary duty) and saturate the wire with other protocols with more optimized support in FreeBSD (NFS, for example). Besides, NAS hardware isn't something you want to upgrade very often; it should have a very long lifecycle.
 

ser_rhaegar

Patron
Joined
Feb 2, 2014
Messages
358
They have an API if you want to do some things without the GUI. The API will keep FreeNAS informed of the changes you make but I don't believe it is comprehensive.

Also a lot of the devices you mentioned have limited CLI abilities (IBM ASM/IMM) or have some functions only available in the GUI or CLI but not both. Not every system is 100% manageable by the CLI or the GUI. Many are but not all.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The CLI does have specific uses. When it comes time to troubleshoot, there's stuff you can do from the CLI to look and see what is going on. For 99% of users, there's no need to go to the CLI.

I'm not here to defend our practice. I'm only here to tell you how it is engineered. Because of this, I strictly recommend people not go to the CLI except when there is no alternative. If that's not acceptable for your use case, you are welcome to use another product. But to think that this product should work the way you want for a particular reason is trying to fit a square peg in a round hole. We all know how that ends...
 