Re-Evaluating TrueNAS from the Historical Perspective

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Copied from my Reddit post

I should have titled this A Fanboi's Overly Emotional response to recent releases...

TL;DR: I love TrueNAS but I am concerned about the future.

I've been using FreeNAS since 2014. At that time, I was nothing more than a nerd looking for a place to store his movie collection. Since then, I have become an IT manager for a very large school district. I've used FreeNAS in a variety of ways both personally and professionally, but I have only ever been a consumer of this technology. I am grateful for all of the hard work and effort that has gone into making TrueNAS a stable and reliable product, and even more grateful for the fact that it's entirely free and open source.

However, I am a bit concerned about several recent releases and trends from the iXsystems team over the past couple of years, and I am worried that the developers are repeating the mistakes of their collective past. Whoever is reading this has no reason to listen to anything I have to say, but I feel motivated to say it anyway.

A few years back, iXsystems CTO Jordan Hubbard (ex-Director of UNIX Technologies at Apple, co-founder of the FreeBSD project) dedicated a substantial amount of resources to the development of what was then called FreeNAS 10 and became known as FreeNAS Corral. During late 2016 and early 2017 several betas and RCs became available. It introduced Docker support for the first time and a host of features that home users were excited about. iXsystems released this platform on March 15th of 2017. Jordan later wrote a 1-week sitrep giving his perspective on the release. It seems there were certainly bugs, and there was substantial backlash on various aspects of the release. There also appears to have been a factional division inside the company, rallying around current SVP of Engineering, Kris Moore. This division arose from the fact that there were already two development teams, one focusing on FreeNAS 9.x and the other on Corral. It seems that Kris's faction won out.

Less than a month later, an announcement was made:

The Announcements section has an important announcement about the future of FreeNAS Corral. tl;dr - It has no future, but don't worry - its major features are all going to make their way to FreeNAS 9.10
And Jordan Hubbard announced he was leaving the company.

The iXsystems executive team and the developers under Kris Moore's leadership went on to undo the damage done by the Corral release. Based on Jordan Hubbard's announcement, he must have known that, from his perspective, they were about to "throw the baby out with the bath water". They focused on introducing new and exciting features into the old codebase without compromising on stability, performance or security. As a casual home user I was disappointed, but as a technology professional I was happy to see that cooler heads seemed to have prevailed. After this, the release of FreeNAS 11 came fairly quickly, replacing both the legacy and Corral UIs with what is still the basis of the UI today. Continued development rolled several Corral features into the 11.x codebase.

Now that we are done rehashing ancient information, let us focus on what has transpired since. In the more recent past, we got single-pane-of-glass management with TrueCommand. Work on the ZFS codebase was merged into the product. Then we got news that iXsystems was shaking things up again. It seems that while they had merged much of the development teams that were dedicated to Corral vs 9.x when they developed 11, they had diverged again, this time supporting two different code bases with TrueNAS Enterprise on one side and FreeNAS on the other. Kris Moore made the decision to resolve that and simplify their development by merging the code bases for TrueNAS 12, in March of 2020. But that decision was either very short-lived or an outright lie.

In June of 2020, Kris announced TrueNAS SCALE. While TrueNAS 12 and TrueNAS 13 were fantastic, stable and feature-rich releases, SCALE seemed to be an ambitious attempt to capture the enthusiasm that Corral once garnered. Now iXsystems was going to be a player in not just storage, but also in the hyperconverged, highly available and highly scalable platform model.

I cannot deny that the potential for SCALE is astronomical. I am currently running it in my home environment and have migrated many of my servers and services off of ESXi. But, like Corral, it feels so completely unfinished, unstable and rushed out the door. Certainly, if a rewrite of the FreeNAS codebase was the goal, SCALE is what Corral should have been 5 years ago in its design and principles. For about 18 months, SCALE was under development through various code reviews and milestones. In October of 2021 iXsystems released a roadmap outlining expected releases following a schedule based on internal projections. In February of 2022, Kris announced that they were going to release the first "GA" version of SCALE, codenamed Angelfish.

I do not deny that the Angelfish release works fairly well, albeit with several quirks, and with updates and hotpatches that have been released since. But even so, its release was months premature, with one of its key features, "Clustered SMB", not being officially released until August 2nd 2022. Even then, its current implementation is arguably not very useful, and steers users toward poor defaults. Now that we are on SCALE 22.02.2, it is supposed to be considered "Suitable for higher uptime deployments" and has gone through several QA cycles. However, we can still see it's not up to snuff on performance, and even iXsystems' own roadmap doesn't expect most of the truly differentiating features to be available until codename "Bluefin" is released.

Why all of the hype? Why all of the rushed releases? I want to be able to use TrueNAS SCALE in lieu of Proxmox, XCP-ng, or even ESXi for workloads that actually matter. All of the hype makes it seem like it can do that, but it's simply not ready to. SCALE Angelfish should not be considered a production-grade release, and in and of itself should be considered a beta of Bluefin. I am not a developer, and perhaps that parlance is incorrect for what you are doing. As a sysadmin and a long-time user and supporter of this community, I am concerned. Marketing SCALE as a stable product when it is nowhere near feature complete is a mistake that is damaging the credibility of the brand, just as Corral did not even 5 years before it. All of the headway you've made over the past half decade into actual enterprises and real customers is meaningless if you lose their trust with poor marketing.
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Do you have any reason to believe that a significant number of customers share your feelings about this?

Reading between the lines, it sounds like you just ran into some anecdotal quirks for the way you want to use SCALE and are translating that into a much broader issue...

My experience has been the opposite. SCALE has been just as stable for me as Core was (which I've used for years). I also saw that benchmark video about performance. But, just in practical use, I haven't noticed a performance problem/drop-off that reflects those numbers.

I'm a bit reluctant to respond to the whole Corral history just because I don't think the two situations are comparable. I liked Corral and felt like it deserved more commitment/support from iX to match what many in the community had dedicated to it. In other words, in my opinion, ditching it unexpectedly was the miscalculation.

That said, it wasn't the end of the world. And since then the product has been advanced well past where all of those old versions were. Core is FreeBSD and Scale is Linux. It's a good idea and an impressive feat that the team has been able to utilize the work put into Core on a Linux version and offer users the pros/cons of each. All in a relatively short amount of time.

I see nothing wrong with the pace or state of the releases.
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
Man, there is a lot here, so for the sake of brevity I will try to answer a few of your questions :)

Now that we are done rehashing ancient information, let us focus on what has transpired since. In the more recent past, we got single-pane-of-glass management with TrueCommand. Work on the ZFS codebase was merged into the product. Then we got news that iXsystems was shaking things up again. It seems that while they had merged much of the development teams that were dedicated to Corral vs 9.x when they developed 11, they had diverged again, this time supporting two different code bases with TrueNAS Enterprise on one side and FreeNAS on the other. Kris Moore made the decision to resolve that and simplify their development by merging the code bases for TrueNAS 12, in March of 2020. But that decision was either very short-lived or an outright lie.

I've trimmed some of the ancient history part, but it is fun to discuss if you catch me another day :)

Long story short, yes, we unified FreeNAS + TrueNAS into a single code base of "TrueNAS". That decision was critical to allowing us to expand again later into SCALE. No way we could have taken that on while still maintaining two code bases. We are back to supporting two divergent (but similar) code bases, but that should be expected when you release two TrueNAS products based on two different operating systems.



In June of 2020, Kris announced TrueNAS SCALE. While TrueNAS 12 and TrueNAS 13 were fantastic, stable and feature-rich releases, SCALE seemed to be an ambitious attempt to capture the enthusiasm that Corral once garnered. Now iXsystems was going to be a player in not just storage, but also in the hyperconverged, highly available and highly scalable platform model.

I cannot deny that the potential for SCALE is astronomical. I am currently running it in my home environment and have migrated many of my servers and services off of ESXi. But, like Corral, it feels so completely unfinished, unstable and rushed out the door. Certainly, if a rewrite of the FreeNAS codebase was the goal, SCALE is what Corral should have been 5 years ago in its design and principles. For about 18 months, SCALE was under development through various code reviews and milestones. In October of 2021 iXsystems released a roadmap outlining expected releases following a schedule based on internal projections. In February of 2022, Kris announced that they were going to release the first "GA" version of SCALE, codenamed Angelfish.

SCALE is ambitious, no doubt and also where we are seeing most of the excitement these days. Nothing wrong with that.

I'd push back hard against comparing it to Corral though. I was there for both, and it's a night and day difference. We have the adoption numbers and bug data to back that up. I started using it as my primary NAS back in the BETA1 phase, something I never could do with Corral even post-release :)


I do not deny that the Angelfish release works fairly well, albeit with several quirks, and with updates and hotpatches that have been released since. But even so, its release was months premature, with one of its key features, "Clustered SMB", not being officially released until August 2nd 2022. Even then, its current implementation is arguably not very useful, and steers users toward poor defaults. Now that we are on SCALE 22.02.2, it is supposed to be considered "Suitable for higher uptime deployments" and has gone through several QA cycles. However, we can still see it's not up to snuff on performance, and even iXsystems' own roadmap doesn't expect most of the truly differentiating features to be available until codename "Bluefin" is released.
I'd push back against that as well. Angelfish was always intended to be focused on initially porting the "CORE" functionality to its new Linux base, while laying the groundwork for new things like clustering, Linux containers, etc. Performance has incrementally improved with new updates, and Bluefin is where we start putting SCALE through more of its enterprise optimization. However, for most typical community use-cases, you'll find SCALE performance is on par with CORE today. At the higher end, it's mixed: better in some cases, worse in others, which we are well aware of and are working towards resolving.

As for clustering specifically, we've even labeled that as early / experimental in the TrueCommand UI, with much more coming as it moves out of the "Experimental" phase. But focus has to stay on single-node issues first, since that foundation has to be rock solid before you introduce another complex layer like clustering into the mix.

Why all of the hype? Why all of the rushed releases? I want to be able to use TrueNAS SCALE in lieu of Proxmox, XCP-ng, or even ESXi for workloads that actually matter. All of the hype makes it seem like it can do that, but it's simply not ready to. SCALE Angelfish should not be considered a production-grade release, and in and of itself should be considered a beta of Bluefin. I am not a developer, and perhaps that parlance is incorrect for what you are doing. As a sysadmin and a long-time user and supporter of this community, I am concerned. Marketing SCALE as a stable product when it is nowhere near feature complete is a mistake that is damaging the credibility of the brand, just as Corral did not even 5 years before it. All of the headway you've made over the past half decade into actual enterprises and real customers is meaningless if you lose their trust with poor marketing.

We have the data showing it is in use in "production" all over, but it depends on what your use-case is at this particular point in time. If you are looking for Apps (Linux Container/Docker), KVM, Linux, Improved Hardware support and what I'd call typical SMB/NFS/iSCSI usage, you are in perfectly good shape with SCALE today. Clustering is still marked as early / experimental as it should be. It's not designed to replace ESXI either, never was intended to :P

If you are seeing specific edge cases that are broken or deal breakers for your needs, by all means, please open bug tickets so we can review and address them.

Again, this is an Open Source project, and we have a mantra of "Release Early, Release Often". This is how projects evolve, find issues, get feedback, etc.

That all said, I do very much appreciate the feedback and will pass it back along that we need to be more cautious with how we market SCALE since we don't want to be giving bad impressions to our customers and community users.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
My take: There is no "Scale Problem" in the vein of Corral. It's just the usual, troubled expectations management around releases. Part of this is that -U0s are infamously buggy, but a big part is users diving in cluelessly and carelessly.
Clustering is still marked as early / experimental as it should be.
Gluster is a finicky beast, capable, yet fragile and oddly limited. I don't envy your task, but look forward to the results.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
ctdbd manages public IP addresses for the SMB cluster nodes; cf. https://wiki.samba.org/index.php/Adding_public_IP_addresses#Public_addresses_file

Sample public IPs for SMB cluster (in our CI pipeline):
Code:
root@nodea[~]# midclt call ctdb.public.ips.query | jq
[
  {
    "id": 0,
    "pnn": 0,
    "configured_ips": {
      "10.238.238.216": {
        "enabled": true,
        "public_ip": "10.238.238.216",
        "interface_name": "enp1s0"
      },
      "10.238.238.217": {
        "enabled": true,
        "public_ip": "10.238.238.217",
        "interface_name": "enp1s0"
      },
      "10.238.238.218": {
        "enabled": true,
        "public_ip": "10.238.238.218",
        "interface_name": "enp1s0"
      }
    },
    "active_ips": {
      "10.238.238.218": [
        {
          "name": "enp1s0",
          "active": true,
          "available": true
        }
      ]
    }
  },
  {
    "id": 1,
    "pnn": 1,
    "configured_ips": {
      "10.238.238.216": {
        "enabled": true,
        "public_ip": "10.238.238.216",
        "interface_name": "enp1s0"
      },
      "10.238.238.217": {
        "enabled": true,
        "public_ip": "10.238.238.217",
        "interface_name": "enp1s0"
      },
      "10.238.238.218": {
        "enabled": true,
        "public_ip": "10.238.238.218",
        "interface_name": "enp1s0"
      }
    },
    "active_ips": {
      "10.238.238.216": [
        {
          "name": "enp1s0",
          "active": true,
          "available": true
        }
      ]
    }
  },
  {
    "id": 2,
    "pnn": 2,
    "configured_ips": {
      "10.238.238.216": {
        "enabled": true,
        "public_ip": "10.238.238.216",
        "interface_name": "enp1s0"
      },
      "10.238.238.217": {
        "enabled": true,
        "public_ip": "10.238.238.217",
        "interface_name": "enp1s0"
      },
      "10.238.238.218": {
        "enabled": true,
        "public_ip": "10.238.238.218",
        "interface_name": "enp1s0"
      }
    },
    "active_ips": {
      "10.238.238.217": [
        {
          "name": "enp1s0",
          "active": true,
          "available": true
        }
      ]
    }
  }
]
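
To read output like this at a glance, a small script can invert it into an IP-to-node map. The following is only a sketch: it assumes the JSON shape shown in the sample above (`pnn`, `configured_ips`, `active_ips`), and it assumes that an entry under `active_ips` means that node currently hosts the address. The helper names (`sample_node`, `ip_owners`, `unhosted_or_duplicated`) are made up for illustration and are not part of the TrueNAS middleware.

```python
# Sketch only: the field names come from the sample output above; the helper
# names and the interpretation of "active_ips" are assumptions.

PUBLIC_IPS = ["10.238.238.216", "10.238.238.217", "10.238.238.218"]


def sample_node(pnn, hosted_ip):
    """Rebuild one node entry with the same shape as the sample output."""
    return {
        "id": pnn,
        "pnn": pnn,
        "configured_ips": {
            ip: {"enabled": True, "public_ip": ip, "interface_name": "enp1s0"}
            for ip in PUBLIC_IPS
        },
        "active_ips": {
            hosted_ip: [{"name": "enp1s0", "active": True, "available": True}]
        },
    }


def ip_owners(nodes):
    """Map every configured public IP to the node numbers where it is active."""
    owners = {}
    for node in nodes:
        for ip in node["configured_ips"]:
            owners.setdefault(ip, [])
        for ip, interfaces in node["active_ips"].items():
            if any(iface["active"] for iface in interfaces):
                owners.setdefault(ip, []).append(node["pnn"])
    return owners


def unhosted_or_duplicated(owners):
    """Return the IPs that are not active on exactly one node."""
    return {ip: pnns for ip, pnns in owners.items() if len(pnns) != 1}


# Same assignment as in the sample: node 0 hosts .218, node 1 .216, node 2 .217.
nodes = [
    sample_node(0, "10.238.238.218"),
    sample_node(1, "10.238.238.216"),
    sample_node(2, "10.238.238.217"),
]
owners = ip_owners(nodes)
for ip, pnns in sorted(owners.items()):
    print(ip, "-> node(s)", pnns)
print("problems:", unhosted_or_duplicated(owners))
```

On a real system the list of nodes would come from `json.loads()` applied to the output of the `midclt` call shown above, rather than from `sample_node`.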
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I'd push back hard against comparing it to Corral though.
It does seem in many ways that SCALE is--or is intended to be--what FN10 (I'm sorry, "Corral" was always a stupid name) was trying to be. But you're still publishing CORE releases with FN10-class showstopping bugs (i.e., disk replacement didn't work in 13.0; you didn't deem that worthy of pulling the release, and it took eight weeks to ship a fix). I share Nick's concerns, and they aren't limited to SCALE. And when your own guidance, three releases into the 13 train, is that it still isn't ready for "general use" (let alone "conservative users"), that does nothing to quiet those concerns.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Hi Kris,
First and foremost I wanted to say that I appreciate the time you took to write your response. I would also like to say that I hope you didn't take any of my thoughts or criticisms as a personal attack on you, as they were not intended that way. Re-reading what I wrote as quoted in your reply, some of my post sounded harsher than I had intended.

I've trimmed some of the ancient history part, but it is fun to discuss if you catch me another day :)

I'm sure your perspective as the boots on the ground during what happened would be fascinating lol.

Long story short, yes, we unified FreeNAS + TrueNAS into a single code base of "TrueNAS". That decision was critical to allowing us to expand again later into SCALE. No way we could have taken that on while still maintaining two code bases. We are back to supporting two divergent (but similar) code bases, but that should be expected when you release two TrueNAS products based on two different operating systems.

I don't doubt that from a technical standpoint you did what you said you did. Just from my perspective as an outsider looking in, you had put out two press releases in a very short period of time that seemingly contradicted each other: one saying you were simplifying your workflow and focusing on making it easier to move forward, and another saying you were taking on a massive new project which would seemingly take an exponential increase in development hours.

If you (and I am using the term "you" more as "iXsystems" than you personally) had known a press release for SCALE was imminent at the time of the unification post, it would certainly have been less confusing from a consumer standpoint to merge the two when you were ready. I think it was just the timing that put me off, but that's just my opinion.

SCALE is ambitious, no doubt and also where we are seeing most of the excitement these days. Nothing wrong with that.

I'd push back hard against comparing it to Corral though. I was there for both, and it's a night and day difference. We have the adoption numbers and bug data to back that up. I started using it as my primary NAS back in the BETA1 phase, something I never could do with Corral even post-release :)

I certainly am not trying to compare the build quality and stability of SCALE to Corral; I am merely trying to draw a parallel between those events and the events of SCALE as seen from the outside. Jordan was on these forums all the time trumpeting how awesome 10 was going to be. I mean, he started talking about it when he came on board in 2014, and that's when I really started engaging in this community (RIP cyberjock). Then we got 9.10, which was a clever marketing spin, and it satiated us all for a while. But, ultimately we got what we got. It was 3 years later, on version 2.0 of the original plan, and it sucked. It seemed like we were waiting for Duke Nukem...

Conversely, with SCALE we got an announcement that we were getting the most ambitious product yet. A truly next-generation platform, still with the same grass-roots open source secret sauce. This time we got a full-blown release roadmap with carefully plotted timetables and milestones. It was frankly quite exciting. Meanwhile, before it was even out of BETA, iXsystems released the R-Series platforms as something budget-friendly and specifically designed for SCALE. Then when "Angelfish" was released, all mentions of "BETA" in the name were gone, but the promised features that make the platform so exciting aren't available until "Bluefin".

In both situations, expectations were set very high, and the excitement leading up to the respective launches gained a lot of media attention in certain very nerdy circles: Storage Review, ServeTheHome, TekSyndicate/Level1Techs, and Lawrence Systems, to name a few. That is the only comparison I have been trying to make here: in both cases, expectations seem to have been set higher than they should have been.
I'd push back against that as well. Angelfish was always intended to be focused on initially porting the "CORE" functionality to its new Linux base, while laying the groundwork for new things like clustering, Linux containers, etc. Performance has incrementally improved with new updates, and Bluefin is where we start putting SCALE through more of its enterprise optimization. However, for most typical community use-cases, you'll find SCALE performance is on par with CORE today. At the higher end, it's mixed: better in some cases, worse in others, which we are well aware of and are working towards resolving.

As for clustering specifically, we've even labeled that as early / experimental in the TrueCommand UI, with much more coming as it moves out of the "Experimental" phase. But focus has to stay on single-node issues first, since that foundation has to be rock solid before you introduce another complex layer like clustering into the mix.

If that is and was the plan, why is Angelfish itself not considered an early access release, a technical preview or a BETA? While it is feature complete with respect to the milestones you set, it is certainly not feature complete with respect to what you've marketed the project as. I think Patrick over at STH summed this up pretty well in his article where he compares SCALE to Proxmox:

If you don't want people to be comparing SCALE to what else is out there, you shouldn't call it a "release"...

We have the data showing it is in use in "production" all over, but it depends on what your use-case is at this particular point in time. If you are looking for Apps (Linux Container/Docker), KVM, Linux, Improved Hardware support and what I'd call typical SMB/NFS/iSCSI usage, you are in perfectly good shape with SCALE today. Clustering is still marked as early / experimental as it should be. It's not designed to replace ESXI either, never was intended to :P

NFS and iSCSI performance have really been the bread-and-butter of FreeNAS since the beginning. The fact that it could do both block and file storage at the same time is what differentiated it from name-your-nas-provider-here for so long. To release SCALE with its CORE (pun intended ;P) functionality in a well-known degraded state and to define it as "stable" is a bit disingenuous. I've not seen an official response to the Phoronix testing done by Tom Lawrence, but if there is one I'd be interested to see it.

It's not designed to replace ESXI either, never was intended to :P

When you use the words

Scale-out
Converged
Active-active
Linux containers
Easy-to-manage

It's hard not to think of vSAN with Photon, or Proxmox with an easy button. Your platform screams HCI, which is a drum VMware has been beating for years now. Just sayin...

If you are seeing specific edge cases that are broken or deal breakers for your needs, by all means, please open bug tickets so we can review and address them.

Again, this is an Open Source project, and we have a mantra of "Release Early, Release Often". This is how projects evolve, find issues, get feedback, etc.

I have run nightlies on machines, and been here and in Jira and Redmine complaining about problems. I get it. But let's call a spade a spade. Angelfish is a technical preview. That's ALL I'm saying.

That all said, I do very much appreciate the feedback and will pass it back along that we need to be more cautious with how we market SCALE since we don't want to be giving bad impressions to our customers and community users.
I appreciate your response, truly.

Gluster is a finicky beast, capable, yet fragile and oddly limited. I don't envy your task, but look forward to the results.
I don't envy his task either, plus he has idiots like me complaining :P
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
It does seem in many ways that SCALE is--or is intended to be--what FN10 (I'm sorry, "Corral" was always a stupid name) was trying to be. But you're still publishing CORE releases with FN10-class showstopping bugs (i.e., disk replacement didn't work in 13.0; you didn't deem that worthy of pulling the release, and it took eight weeks to ship a fix). I share Nick's concerns, and they aren't limited to SCALE. And when your own guidance, three releases into the 13 train, is that it still isn't ready for "general use" (let alone "conservative users"), that does nothing to quiet those concerns.

Fair enough. That was an annoying miss for 13.0 on our part, but one for which we issued a work-around almost immediately using the CLI:


If we couldn't do that work-around, we would have done a hot-fix much more quickly. But I'll take that feedback and we're going to work harder to ensure no big misses like that again.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
we're going to work harder to ensure no big misses like that again.
You've been saying that since you 86'd FN10. And to be fair, the situation with 13.0 wasn't what it was with FN10--in the latter case, there was a deliberate decision to ship a product without working disk replacement (which is about as central to storage functionality as it gets) in the GUI; in 13.0 I presume it was inadvertent. But some of us have been around long enough to remember that fiasco, and I don't think I'm the only one who doesn't feel that the community ever got an accounting of what series of decisions culminated in "ship that dumpster fire," nor (more importantly) "what we're going to do to ensure something like that never happens again." I understand reluctance to throw people under the bus, which would pretty much be a necessary part of the first, and probably would be necessary in the second--but without that, all we have is "trust us, we'll do better." And every time something like this happens (and this is hardly the first time since FN10), there seems to be less reason to trust that you'll do better.

I feel like I've been griping about iX a lot lately, and I don't really want to be doing that, but this is a problem. It's a big problem, you (plural) don't seem to recognize it as a problem, and it doesn't seem to be getting better. Instead, the response out of iX looks a lot like:
[image]
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
The OpenZFS team (probably going back to Sun Microsystems) has a set of automated test tools for ZFS. So if you think you have fixed a bug, and can categorize it, you run the new code through the automated tests to see if you introduced new bugs. New features are basically not accepted unless they come with both test suites and documentation showing the new feature's usage.

Perhaps more automated testing for TrueNAS should be considered?

This would hopefully catch things like the disk replacement problem. Plus, if an automated test fails because of a change in parameters, that is also a clue that the documentation needs to be updated to match the new behavior. And yes, I know there is a world of difference between automated command-line-level testing and Web GUI testing.


One issue with the Release That Shall Not Be Named is that it used a GUI toolkit that was being deprecated upstream. It was something of a mistake by iXsystems to start a new release without being sure of its upstream code's longevity. So if the Un-namable release had continued, it would have had to be rewritten anyway.

P.S. I like that we have an Un-namable release. A nice inside joke...
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
Hi Kris,
First and foremost I wanted to say that I appreciate the time you took to write your response. I would also like to say that I hope you didn't take any of my thoughts or criticisms as a personal attack on you, as they were not intended that way. Re-reading what I wrote as quoted in your reply, some of my post sounded harsher than I had intended.



I'm sure your perspective as the boots on the ground during what happened would be fascinating lol.



I don't doubt that from a technical standpoint you did what you said you did. Just from my perspective as an outsider looking in, you had put out two press releases in a very short period of time that seemingly contradicted each other: one saying you were simplifying your workflow and focusing on making it easier to move forward, and another saying you were taking on a massive new project which would seemingly take an exponential increase in development hours.

If you (and I am using the term "you" more as "iXsystems" than you personally) had known a press release for SCALE was imminent at the time of the unification post, it would certainly have been less confusing from a consumer standpoint to merge the two when you were ready. I think it was just the timing that put me off, but that's just my opinion.

Sure, at the time we had started dreaming of SCALE, but hadn't officially made that call yet. Having the FreeNAS + TrueNAS unification go successfully was a critical step to ensuring we could indeed take on another project like SCALE.


NFS and iSCSI performance have really been the bread-and-butter of FreeNAS since the beginning. The fact that it could do both block and file storage at the same time is what differentiated it from name-your-nas-provider-here for so long. To release SCALE with its CORE (pun intended ;P) functionality in a well-known degraded state and to define it as "stable" is a bit disingenuous. I've not seen an official response to the Phoronix testing done by Tom Lawrence, but if there is one I'd be interested to see it.


Disagree with "Degraded" state. Can performance be better? Sure, but comparing an older product with 10+ years of performance tuning, tweaking and out-of-box optimizations to a product < 1 year old is a bit disingenuous. We fully expect those bottlenecks to get knocked out in the coming releases, and in the end for SCALE to even surpass CORE in performance, probably across the board.


I have run nightlies on machines, been here and in the jira and redmine complaining about problems. I get it. But let's call a spade a spade. Angelfish is a technical preview. That's ALL I'm saying.

Gotta disagree here. If you are hanging your hat on the cluster feature only, then sure, it's a technical preview. (Which we even indicate in the UI). However, clustering isn't for everybody, I'd say 98-99% of SCALE users will never run a cluster. (Clusters are a huge commitment more suited for enterprise types). In that more normal NAS use-case, SCALE is rock solid. My home/work TrueNAS boxes all run SCALE today, it's a better experience than CORE for SMB and running Apps (Containers) hands down. FWIW, my SMB performance was identical on both, but again, really is subjective there based on your needs.


I appreciate your response, truly.


I don't envy his task either, plus he has idiots like me complaining :P

No worries, all part of the fun of software development with an active, engaged open-source community, for which we are very thankful :)
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
The OpenZFS team, (probably going back a little to Sun Microsystems), have an automated set of tools for ZFS. So if you think you have fixed a bug, and can categorize it, then you run the new code through the automated tests to see if you introduced new bugs. New features are basically not accepted unless they have both test suites for it / them, and the documentation to show the new feature's usage.

Perhaps more automated testing for TrueNAS should be considered?

This would hopefully catch things like the disk replacement problem. Plus, if an automated test fails because of a change in parameters, this is also a clue that the documentation would have to match the updated behavior. And yes, I know there is a world of difference between automated command line level testing and Web GUI testing.


One issue about the Release That Shall Not Be Named, is that it used a GUI toolkit that was being deprecated upstream. This was somewhat a mistake by iXSystems to start a new release and not be sure of its upstream code longevity. So if the Un-namable release had continued, it would have had to be re-written anyway.

P.S. I like that we have an Un-namable release. A nice inside joke...

Lol, have I mentioned we are hiring Automation engineers at this moment?



But to answer your question, we do have some automation we run against the builds. Part of our next big push on that front will be to bring more of that to GitHub in the form of Actions, so it's easier for everybody to see how testing is done, look for issues, etc. So I take that feedback for sure, we know we can do a lot better on that front, and are already shifting to make some major improvements there.
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
You've been saying that since you 86'd FN10. And to be fair, the situation with 13.0 wasn't what it was with FN10--in the latter case, there was a deliberate decision to ship a product without working disk replacement (which is about as central to storage functionality as it gets) in the GUI; in 13.0 I presume it was inadvertent. But some of us have been around long enough to remember that fiasco, and I don't think I'm the only one who doesn't feel that the community ever got an accounting of what series of decisions culminated in "ship that dumpster fire," nor (more importantly) "what we're going to do to ensure something like that never happens again." I understand reluctance to throw people under the bus, which would pretty much be a necessary part of the first, and probably would be necessary in the second--but without that, all we have is "trust us, we'll do better." And every time something like this happens (and this is hardly the first time since FN10), there seems to be less reason to trust that you'll do better.

I feel like I've been griping about iX a lot lately, and I don't really want to be doing that, but this is a problem. It's a big problem, you (plural) don't seem to recognize it as a problem, and it doesn't seem to be getting better. Instead, the response out of iX looks a lot like:

Admittedly, there are those occasional high-profile issues that show up and bite us in the rear. As mentioned in previous reply, we're currently investing a lot more into testing / automation to try and up our game further with continuous improvement. Kaizen!
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@Kris Moore - Great, I hope it goes well for you and iXsystems.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Disagree with "Degraded" state. Can performance be better? Sure, but comparing an older product with 10+ years of performance tuning, tweaking and out of box optimizations to a product < 1 year old is a bit disingenuous. We fully expect those bottlenecks to get knocked out in the coming releases and in the end for SCALE to even superspeed CORE in performance, probably across the board.
When the two products are compared in that light, I understand your perspective. I also have no doubt that you'll squeeze more performance out of Linux in the long run. But let's look at it from a different angle.

When I was a kid, I used to love Super Soakers. Every year my grandfather would buy me whatever new version came out that was bigger and better than ever. At the end of the day, it was just a water gun, but you would pump it up and it would shoot for feet and feet. Then they came out with the backpack ones and you could literally run around with like 2 gallons of water on your back and never run out of water. TrueNAS Core is that.
[Image: 1659576699556.png]


Somewhere along the way, Hasbro changed how they made the toys and they got smaller. Now they look like this:
[Image: 1659576789627.png]


For me, TrueNAS SCALE is that new design of super soaker, except you guys put a laser sight, a sling, a stock and some other fancy doodads on it. You make all your friends jealous with your gizmos and flair. But at the end of the day, in its current form, it's still not as good as the original AT ITS ORIGINAL TASK. It may be cooler and fancier, heck do you remember the water ball thing?

EDIT: They still make it.

I guess SCALE is that. :P But I bet my old Super Soaker Backpack still gets you wetter than that thing does.

Gotta disagree here. If you are hanging your hat on the cluster feature only, then sure, it's a technical preview. (Which we even indicate in the UI). However, clustering isn't for everybody, I'd say 98-99% of SCALE users will never run a cluster. (Clusters are a huge commitment more suited for enterprise types). In that more normal NAS use-case, SCALE is rock solid. My home/work TrueNAS boxes all run SCALE today, it's a better experience than CORE for SMB and running Apps (Containers) hands down. FWIW, my SMB performance was identical on both, but again, really is subjective there based on your needs.
I'm not hanging my hat on that one feature. I have to disagree with you as well. The very name of the product is SCALE and the very first tenet of that name is SCALE OUT. Saying the product is stable as a single node, and defending your ground on that point is disingenuous. Again, it's not a criticism of the work you have done thus far, but merely how it is being presented. You cannot market a product as a SCALE OUT design and then release it as STABLE when it only works with one node...

As mentioned in previous reply, we're currently investing a lot more into testing / automation to try and up our game further with continuous improvement. Kaizen!

I appreciate the Toyota mantra :P
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I'm not hanging my hat on that one feature. I have to disagree with you as well. The very name of the product is SCALE and the very first tenet of that name is SCALE OUT. Saying the product is stable as a single node, and defending your ground on that point is disingenuous. Again, it's not a criticism of the work you have done thus far, but merely how it is being presented. You cannot market a product as a SCALE OUT design and then release it as STABLE when it only works with one node...

Hi Nick,

I'm the product management guy that OKed this. You can fire the arrows my way.

TrueNAS SCALE is an Open Source project not a one-off product. We expect it to evolve over many years. It doesn't have all the features we want in 2022, but it has enough to get started. The success of the project will be assessed each year by numbers of satisfied users and number of systems deployed. We started from zero and are now over 20,000 systems and growing reliably. Success will be that it is still growing in a couple of years' time and that it's the best Open Source project that is doing this. We hope it keeps going for 10+ years.

We decided to release this year because it had started to SCALE. At launch in February it did support scale-out in the form of S3 (minio) and Gluster native. We knew that clustered SMB would be available (and wanted) but it needed more testing and polish... we didn't claim clustered SMB at release.

We also identified that there are a large number of users that wanted a single node deployment with both Kubernetes/Docker and a more reliable virtualization environment (KVM). By supporting this single-node model well, we have a much larger user and testing base which delivers quality more quickly. While we all want more automated testing, the reality is that this community gets free software and provides a final and very much appreciated QA role.

Would we like to roll-out software and features faster... yes. However, TrueNAS is a free software project with a finite engineering team. We cannot grow that team without commercial success.

We also decided that we would release TrueNAS 13.0 this year to enable an easy evolution for the existing TrueNAS 12.0 base. This release schedule was forced by some security vulnerabilities and the need to have updated jails. So there are two major projects this year - not ideal, but you don't get to choose these situations. The Engineering team has done a fantastic job of juggling these two projects and maintaining these.

So, I'd request that you review TrueNAS SCALE again at the end of 2022. How much progress did we make and how satisfied is the community? We know there will be some use-case that we can't support, but hopefully there are enough for the project to be growing and thriving.

Feel free to message me privately or publicly.

Cheers

Morgan
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Hi Morgan,

Sorry for my insomnia-riddled response. As always, I appreciate your candor and all of the help you provide here on these forums.

Onto the discussion.
TrueNAS SCALE is an Open Source project not a one-off product. We expect it to evolve over many years. It doesn't have all the features we want in 2022, but it has enough to get started. The success of the project will be assessed each year by numbers of satisfied users and number of systems deployed. We started from zero and are now over 20,000 systems and growing reliably. Success will be that it is still growing in a couple of years' time and that it's the best Open Source project that is doing this. We hope it keeps going for 10+ years.
I understand that there has to be a version 1.0. That goes for any software product, whether it be open source or not. I get that benchmarks are needed so that you know where to drive development. You need to size your teams appropriately based on those numbers and any corresponding projected income from business users that may be attributed to the ongoing success of the free version.

We decided to release this year because it had started to SCALE. At launch in February it did support scale-out in the form of S3 (minio) and Gluster native. We knew that clustered SMB would be available (and wanted) but it needed more testing and polish... we didn't claim clustered SMB at release.

We also identified that there are a large number of users that wanted a single node deployment with both Kubernetes/Docker and a more reliable virtualization environment (KVM). By supporting this single-node model well, we have a much larger user and testing base which delivers quality more quickly. While we all want more automated testing, the reality is that this community gets free software and provides a final and very much appreciated QA role.
You identified that the target audience of the release should be nerds like us here on the forums or on Reddit. Folks who are intelligent enough to help test and report bugs and folks who just want to run a single node with some Docker containers and have some storage that's safe and reliable.

What percentage of that target audience, which you define as over 20,000 systems, were expected to use MinIO and what percentage is actively using it? Without clustered Kubernetes support, what is the use case? Same question for Gluster native, without having SMB clustering, what's the point? How many users actually had a use case that couldn't have just been resolved with ZFS replication?

With the release of TrueCommand yesterday, we got SMB clustering, several months after launch. But even that, in its current form, has serious limitations. If I have a dataset I want clustered, but I have another dataset I don't have enough space to cluster on my other systems, I can't leverage the feature. Having to choose between clustering and breaking all of my other existing shares is a really tough pill to swallow. What's worse is that in the video it's a warning that flies in from nowhere and is quickly glossed over.

Same goes with iSCSI, NFS, etc. Why bother having clustered storage if you cannot multipath your I/O between more than one system? You mention KVM, even if external systems can't access the clustered pools, surely we should have been able to create highly available VMs. This functionality has been in Proxmox for years, shouldn't that have been in the release? In the absence of all of these things, what good is a scale-out system if none of the underlying technologies in the platform can utilize it?

It’s like you guys built a really cool car. But you forgot to put the passenger seat in, the backseats are missing, and you don’t have any carpet. You have a great sound system, nice wheels and tires, and even a fairly strong engine. But you can’t enjoy it with anyone else, and you can’t bring your kids to school. You don’t just need to tune the engine and adjust the feel of the suspension. You need to finish the car.

As a follow-up coming from a different direction, I'm confused about the goal here. By supporting the single-node model well, you have indeed created a larger user testing base. But what percentage of the testing base you've defined will ever SCALE OUT? Your userbase of dudes like me who have racks in their homes is a fraction of what the larger overall userbase is. The other users will come, because there is a serious desire from a lot of folks to be using Docker and Kubernetes as a professional learning platform or a hobby for fun things in their house.

But the actual users you need to impress are not those users, it's the business customers who will buy actual hardware and support contracts from you. It's those users who you are hurting by releasing a half-baked product, and it's those users who pay for the salaries of all of the folks at iXSystems. I am also one of those users, I've priced out buying IX Hardware for some video surveillance projects. The problem is that big name competitors are putting more and more downward pressure on the market. When I go to my director and say I can buy a Lenovo DE system for the same money or less than an IX system, it's a hard sell. Now, with SCALE there are more differentiating features on the horizon, that sell is going to get easier. But I can't in good conscience tell him that I want to make a serious capital investment on a product that is fundamentally incomplete at this point....

Would we like to roll-out software and features faster... yes. However, TrueNAS is a free software project with a finite engineering team. We cannot grow that team without commercial success.
I'm not asking you to, really. I'm just asking you not to call an incomplete product a "release".

We also decided that we would release TrueNAS 13.0 this year to enable an easy evolution for the existing TrueNAS 12.0 base. This release schedule was forced by some security vulnerabilities and the need to have updated jails. So there are two major projects this year - not ideal, but you don't get to choose these situations. The Engineering team has done a fantastic job of juggling these two projects and maintaining these.
That is totally understood, but not really my point. I understand the trouble with competing priorities. I think all of us in IT do, and I am also willing to wait for the pie to finish baking in the oven.

So, I'd request that you review TrueNAS SCALE again at the end of 2022. How much progress did we make and how satisfied is the community? We know there will be some use-case that we can't support, but hopefully there are enough for the project to be growing and thriving.
I've already staged my home environment for testing this winter or next spring, whenever you guys get around to releasing Bluefin. I've migrated all of my VMs save 1 off of my production VMware box and into SCALE. I have a third box sitting in the rack waiting for a couple more pieces of storage. I'm actually very excited to see where this bus brings me.

My final words will be that I still think it’s a marketing problem. If you guys had said from the outset that you were porting TRUENAS CORE from BSD to a Debian base, with the goal being solid KVM and Docker support, we would all be singing its praises. You could have silently worked on the SCALE OUT pieces and announced them after the initial release as being part of version 2.0. But you called it SCALE, and talked about how awesome SCALE OUT is going to be. Then you released a product that doesn’t have any useful scale-out features… The timing and content of the product announcement and the fact that Angelfish 22.02.2 is advertised as a complete end product that is ready for business use are just not okay…
Thanks again!
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
If you guys had said from the outset that you were porting TRUENAS CORE from BSD to a Debian base, with the goal being solid KVM and Docker support, we would all be singing its praises.
Not all of us. I like my jails and I'm still researching what I'm going to do with them when TrueNAS does finally drop FreeBSD as its base. I detest Docker.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Docker is like Dubai. Fancy veneer, but the sewage has to be trucked out because nobody thought of that problem before jumping in head-first.
Scale uses Kubernetes, as I understand it, which is probably better, but does a terrible job at getting people started without a deep dive that’s hard to justify with “why not” when there are millions of things that need to be done ASAP.
 

kiler129

Dabbler
Joined
Apr 16, 2016
Messages
22
Scale uses Kubernetes, as I understand it, which is probably better, but does a terrible job at getting people started without a deep dive that’s hard to justify with “why not” when there are millions of things that need to be done ASAP.

Imho SCALE shouldn't advertise Docker support per se. Apps are neither really full-blown Kubernetes nor easy-to-use Docker. Coming from an extensive Docker background, it took me almost a whole day to set up a simple haproxy+snipe-it stack... and it still cannot run on the host's port 443 while being isolated properly.

KVM is great at its roots, but the inability to pass through two USB dongles (fixed in Bluefin) and the blocking of a GPU for no reason are just teething pains.

SCALE is great but it is half baked at everything but storage. It needs time to mature.
 