TrueNAS Core on ESXi: copying data lags File Explorer out

GW2

Dabbler
Joined
Jan 27, 2023
Messages
15
Hi, my hardware setup is as follows:

Dell T420 Server
2x E5-2450 v2 (8C 16T 2.5GHz)
192GB DDR3 1600 MHz (12 x 16GB)
Dell PERC H710P (used for VM Storage, truenas OS drive is stored here)
Virtual Disk 1: RAID5 8 x 1TB SATA SSD
Dell 6Gbps SAS HBA (External H200) flashed to IT mode (Passed through PCIE to truenas VM)
Dell MD 3000 15 bay 3.5" SAS enclosure
12 x 8TB Seagate IronWolf drives
OS: ESXi 6.7

Truenas VM Info:
CPU Cores: 8
Memory: 64GB
VHD1: 16GB (for OS, stored on SSD RAID 5)
PCI Passthrough: Dell 6Gbps SAS HBA (External H200) flashed to IT mode
Truenas Core 13 - TrueNAS-13.0-U3.1


TrueNAS Core comes up after installation and everything seems to work fine: I set up a static IP, create a pool (all my drives show up), create a dataset and an SMB share, and after fiddling with permissions a bit I get access to the share. From this point on I have nothing but problems.

I cannot copy data to the NAS properly from my Windows 10 system over SMB. It hangs for a long time and then says "An unexpected error is keeping you from copying the file. If you continue to receive this error, you can use the error code to search for help with this problem. Error 0x8007003B: An unexpected network error occurred". But then, to make it even weirder, if I refresh my TrueNAS share the file I copied is there!

I tried the hw.pci.enable_msi/msix tunables, but I don't really understand what I am doing with them. What does the number correspond to? Is it just a boolean off/on (0/1)? Setting
hw.pci.enable_msi="0"
hw.pci.enable_msix="0"
made the VM not boot anymore and I had to redo it, but I still get the same behavior.
Any help at all would be greatly appreciated as I am kind of lost with this now.
 

Attachments

  • unexpected network error.JPG

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
You have a *lot* of CPU cores assigned. Is there a reason for this?

It could be that your ESXi host is having trouble allocating timeslots to the VM due to the number of cores requested. If that happens, ESXi will not allow the VM to run, temporarily suspending it until it finds a timeslice with the configured resources available.
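One way to check whether this is happening is the CPU ready figure in the ESXi performance charts. As a rough sketch (the function name is made up for illustration, and the 20-second default assumes the real-time chart; other chart ranges use longer sample intervals), a ready "summation" value in milliseconds can be turned into a percentage like this:

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert an ESXi CPU ready summation value (ms) into a percentage.

    ready_ms   -- the "Ready" summation value for one sample, in milliseconds
    interval_s -- length of the chart's sample interval in seconds
                  (assumed default: 20 s, the real-time chart)
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Share of the sample interval that the VM spent waiting to be scheduled
    return ready_ms / (interval_s * 1000.0) * 100.0


if __name__ == "__main__":
    # 1000 ms of ready time in a 20 s sample: the VM waited 5% of the interval
    print(f"{cpu_ready_percent(1000):.1f}%")
```

As far as I know the VM-level value is summed over all of the VM's vCPUs, so dividing by the vCPU count gives a rough per-vCPU figure. A number that climbs when vCPUs are added would point at the scheduling problem described above.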
 

GW2

Dabbler
Joined
Jan 27, 2023
Messages
15
I thought more is better than less? I will try it with 4 cores and see what happens.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Dell MD 3000 15 bay 3.5" SAS enclosure
These Dell devices aren't a normal disk enclosure - they have their own RAID logic in the controllers. Did you create/export any disk groups on the unit? It supports a quasi-JBOD mode, but I believe it might still mask the SMART info.

You can swap them with the MD1000 series controllers as well.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
I thought more is better than less?

More is DEFINITELY not better than less. Are you a software developer, by any chance? :smile: Only sorta kidding. Most virtualization admins will tell you that one of their largest problems is people overspecifying resources which actually leads to contention for resources, and quite commonly software devs are the worst offenders. It's best to start with a smaller number and then increase as observed performance and shortfalls dictate. For ZFS, it's easy to see a need for two vCPU's for even the smallest system; there's a lot of kernel stuff going on for ZFS, plus also the middleware and protocol daemons like Samba running too, with the user and kernel stuff running simultaneously. If you can seriously see two or three Samba or AFP users actively interacting with the server on an ongoing basis, then it isn't too hard to picture four vCPU's being very useful. Doing jails? That could be a few more. Really think about how many CPU's might actually be truly active at one time on a more or less consistent basis, and then try #vCPU = maybe just a bit less than that to see how it works.
 

GW2

Dabbler
Joined
Jan 27, 2023
Messages
15
Yeah, it behaves the same with 2 or 4 cores as well. I may have to try it on bare metal to see if the problem is ESXi or TrueNAS, but I really do think it is TrueNAS, as the previous Windows VM I had set up managing all these drives had no issues and good throughput on the same ESXi host.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
More is DEFINITELY not better than less. Are you a software developer, by any chance? :smile: Only sorta kidding. Most virtualization admins will tell you that one of their largest problems is people overspecifying resources which actually leads to contention for resources, and quite commonly software devs are the worst offenders.
I'm a software developer and this statement kinda' offends me. We actually very intimately know how hard it is to code things in a multi-threaded way while at the same time ensuring thread safety and avoiding deadlocks. 99% of the time, I don't bother with multi-threading unless the performance profile benchmark clearly dictates the need for it cause it's just a big source of very subtle intermittent bugs.

Quite frankly, it sounds to me that you must work with a lot of mediocre software developers. :cool::tongue:
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Quite frankly, it sounds to me that you must work with a lot of mediocre software developers.

I would agree that many modern software developers are mediocre. Possibly not what you meant. :smile: Do read on though, I'm sorta just needling you a bit. ${username}!

Actually it's feedback gathered from years of stuff like VMware user group meetings, vendors who have tools designed to identify this stuff in massive corporate hypervisor deployments, etc. Plus it's a well known problem with virtualization, discussed many times over the years.

But before you get too bent out of shape, let me say that I understand that this is not necessarily implying malice on the part of software developers, just unfamiliarity with virtualization. Most people have the experience that a desktop workstation that is "too big" is not a problem, and they also find that when they have to involve their IT or virtualization team to reprovision their VM for "more" resources, this is a slow and annoying process. As virtualization administrators, many of us are aware of user frustrations that sometimes wind up with a department or team "going rogue" and blowing some budget on external cloud resources at AWS just to be able to do what they want (often in the most costly manner possible); I like to work in a fact-based world, so if we can agree that I've demonstrated that this is a common problem, I'm also happy to agree that there are (and I've even met) many genius level developers out there who can easily uptake these concepts and integrate them into their worldview. I think even for the average developer, it just catches them unawares, and so it's just an issue of providing some simple guidelines. That's why I usually attack it in the way I explained in #5 thinking about how many "things" are actively going on simultaneously.

We actually very intimately know how hard it is to code things in a multi-threaded way while at the same time ensuring thread safety and avoiding deadlocks. 99% of the time, I don't bother with multi-threading unless the performance profile benchmark clearly dictates the need for it.

This argues in favor of single (or perhaps at most two) vCPU VM's. Suits me just fine. Many people are aghast and bewildered to discover many of my VM's are single vCPU, 256MB RAM, and i386 architecture. Very efficient. Lots of stuff works just fine that way.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
I would agree that many modern software developers are mediocre. Possibly not what you meant. :smile: Do read on though, I'm sorta just needling you a bit. ${username}!
I actually semi-meant that. I've interviewed quite a lot of these mediocre developers myself. The majority of the bad ones, in my biased opinion, tend to be the more recent ones who are used to being babied by newer programming languages and generally tend to be far less knowledgeable about how software interacts with hardware at a lower level. Don't worry, I'm quite comfortable with being needled :wink:

That being said, let's examine this a bit further.
This one is more talking about IT managers, not developers.
Many times I see new virtualization admins add too many vCPUs to virtual machines after they’ve converted their physical machines.
I suppose we could argue about what he meant by "new virtualization admins", but I definitely don't manage VMs as part of my daily workflow.
"Customer with small vSphere environment of just two hosts" Again, unclear on who this customer might possibly be.
Few administrators
Sounds like another system administrator, judging by the fact that this is posted on Server Fault. We software developers tend to hang out more on Stack Overflow.
Doesn't mention any bad actors, but yet another sysadmin blog!
Also doesn't mention any bad actors, but sysops... I'd venture a guess, is synonymous with sysadmins. Correct me if I'm wrong, but I'm starting to see a pattern here....:cool:
I feel like this might be the first true instance of a clueless software developer, as we often do use Elasticsearch as an analytics tool in software we deploy to get a good understanding of how end users behave and use our software.
But before you get too bent out of shape, let me say that I understand that this is not necessarily implying malice on the part of software developers, just unfamiliarity with virtualization.
But your links really only incriminate one software developer, and the vast majority point to naive/beginner system admins!
Most people have the experience that a desktop workstation that is "too big" is not a problem, and they also find that when they have to involve their IT or virtualization team to reprovision their VM for "more" resources, this is a slow and annoying process. As virtualization administrators, many of us are aware of user frustrations that sometimes wind up with a department or team "going rogue" and blowing some budget on external cloud resources at AWS just to be able to do what they want (often in the most costly manner possible); I like to work in a fact-based world, so if we can agree that I've demonstrated that this is a common problem, I'm also happy to agree that there are (and I've even met) many genius level developers out there who can easily uptake these concepts and integrate them into their worldview. I think even for the average developer, it just catches them unawares, and so it's just an issue of providing some simple guidelines. That's why I usually attack it in the way I explained in #5 thinking about how many "things" are actively going on simultaneously.
No disagreements here.
This argues in favor of single (or perhaps at most two) vCPU VM's. Suits me just fine. Many people are aghast and bewildered to discover many of my VM's are single vCPU, 256MB RAM, and i386 architecture. Very efficient. Lots of stuff works just fine that way.
Yeap. In fact, I think for a software developer to intentionally start out multi-threading "all the things" just sounds like a masochistic tendency. As I have already explained earlier, multi-threading turns simple software into complex software, which oftentimes results in very subtle intermittent bugs that are very hard to track down. Well, I say intermittent, but it really is a deterministic situation where we have no idea what circumstances are required to reliably reproduce it. So a lot of naive software developers will actually say "intermittent".
 

GW2

Dabbler
Joined
Jan 27, 2023
Messages
15
Well, this is a lot of interesting reading. I usually keep my vCPUs limited unless I notice performance problems; I just hear in so many places that ZFS is resource intensive, so I went overboard. I'm still having the same problem, though. I think I am going to try installing TrueNAS Core on different hardware with the same HBA and Dell MD3000 disk shelf and see if it's better or the same. I'm pretty lost on how to narrow this one down. I have another HBA coming too, so maybe it will work better. Thanks for your input, guys. I will let you know my results once I'm done trying a bare metal install.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
This one is more talking about IT managers, not developers.

I'm not discussing developers. I'm pointing out that overallocation of vCPU has been an ongoing problem for years, causing stumbles in many environments. I don't really have a way to post transcripts of in-person conversations that I've had in the past that I didn't record.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
I'm not discussing developers. I'm pointing out that overallocation of vCPU has been an ongoing problem for years, causing stumbles in many environments. I don't really have a way to post transcripts of in-person conversations that I've had in the past that I didn't record.
Well, I guess it is kind of like the intuitive thing to a layperson that doesn't really understand how applications work because 90% of the time with almost anything else in this world, more and bigger is better.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Well, I guess it is kind of like the intuitive thing to a layperson that doesn't really understand how applications work because 90% of the time with almost anything else in this world, more and bigger is better.

That's correct. Their notion of their local private cloud as some sort of unlimited resource is false; if it weren't, then of course it would be better to allocate lots of vCPU all the time. :smile:

Buried within that is the secret of how Amazon AWS makes so much money.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Well, this is a lot of interesting reading. I usually keep my vCPUs limited unless I notice performance problems; I just hear in so many places that ZFS is resource intensive, so I went overboard. I'm still having the same problem, though. I think I am going to try installing TrueNAS Core on different hardware with the same HBA and Dell MD3000 disk shelf and see if it's better or the same. I'm pretty lost on how to narrow this one down. I have another HBA coming too, so maybe it will work better. Thanks for your input, guys. I will let you know my results once I'm done trying a bare metal install.
OpenZFS isn't so much "resource intensive" as it is "will make use of the resources you give it", so it is often blamed for "wasting RAM" when in reality it's leveraging that RAM for the Adaptive Replacement Cache (ARC). The only truly wasted RAM is free RAM.

As noted above, the MD3000 has its own internal controllers, so that may still be interfering with TrueNAS being able to address them. Did you make any disk groups or virtual devices on the MD3000 unit itself?
 

GW2

Dabbler
Joined
Jan 27, 2023
Messages
15
These Dell devices aren't a normal disk enclosure - they have their own RAID logic in the controllers. Did you create/export any disk groups on the unit? It supports a quasi-JBOD mode, but I believe it might still mask the SMART info.

You can swap them with the MD1000 series controllers as well.
I think the MD1000 controller swap is what I need to do; it did not work on other hardware either.
 

Scharbag

Guru
Joined
Feb 1, 2012
Messages
620
Interesting reading.

I struggle to convince the OT guys that VMs should be provisioned with "just enough" CPU and RAM. They often argue that they need more, and more, and more... But yeah, it is easy to over-resource in a VM environment.

Good to hear you may be on the right track with the controller. Definitely a must to use a simple HBA, not a RAID card. Good luck with things!!
 

John Doe

Guru
Joined
Aug 16, 2011
Messages
635
I would agree that many modern software developers are mediocre. Possibly not what you meant. :smile: Do read on though, I'm sorta just needling you a bit. ${username}!

Actually it's feedback gathered from years of stuff like VMware user group meetings, vendors who have tools designed to identify this stuff in massive corporate hypervisor deployments, etc. Plus it's a well known problem with virtualization, discussed many times over the years.

But before you get too bent out of shape, let me say that I understand that this is not necessarily implying malice on the part of software developers, just unfamiliarity with virtualization. Most people have the experience that a desktop workstation that is "too big" is not a problem, and they also find that when they have to involve their IT or virtualization team to reprovision their VM for "more" resources, this is a slow and annoying process. As virtualization administrators, many of us are aware of user frustrations that sometimes wind up with a department or team "going rogue" and blowing some budget on external cloud resources at AWS just to be able to do what they want (often in the most costly manner possible); I like to work in a fact-based world, so if we can agree that I've demonstrated that this is a common problem, I'm also happy to agree that there are (and I've even met) many genius level developers out there who can easily uptake these concepts and integrate them into their worldview. I think even for the average developer, it just catches them unawares, and so it's just an issue of providing some simple guidelines. That's why I usually attack it in the way I explained in #5 thinking about how many "things" are actively going on simultaneously.

This argues in favor of single (or perhaps at most two) vCPU VM's. Suits me just fine. Many people are aghast and bewildered to discover many of my VM's are single vCPU, 256MB RAM, and i386 architecture. Very efficient. Lots of stuff works just fine that way.
There are some really helpful resources linked!
Thanks for that.


Sharing my configuration, which works well for me. Feel free to roast it, adapt it, or contribute yours (if it works well for you).

VM pfsense 10gbit
vCPU: 2, 1socket
Ram: 8gb all allocated
HDD: 25 GB
WAN adapter is passed thru (SR-IOV)
ESXi cpu readiness average: 0.05%

installed packages:
acme 0.7.3
Cron 0.3.8_1
haproxy 0.61_7
iperf 3.0.2_5
mailreport 3.6.3_3
Open-VM-Tools 10.1.0_5,1
pfBlockerNG-devel 3.1.0_11
Service_Watchdog 1.8.7_1
Status_Traffic_Totals 2.3.2_2
WireGuard 0.1.6_2


VM Truenas
vCPU: 2, 1socket (performance indicators for virtualized CPU activated)
Ram: 40gb all allocated
HDD: 16 GB
ESXi cpu readiness average: 0.06%
HBA for HDDs is passed thru

plugins: none
Jails: nextcloud
services: FTP, NFS, SMART, SMB


VM pfsense 1gbit

vCPU: 1, 1socket (hardware virtualization: yes, performance indicators for virtual CPU: yes)
Ram: 1gb
HDD: 8 GB
ESXi cpu readiness average: 0.05%

installed packages:
Open-VM-Tools 10.1.0_5,1

VM Debian 10 (plex only)
vCPU: 1, 1socket (hardware virtualization: yes, performance indicators for virtual CPU: yes)
Ram: 1gb
HDD: 200 GB
ESXi cpu readiness average: 0.12%
 

GW2

Dabbler
Joined
Jan 27, 2023
Messages
15
Well, here's an update: guess what, it's an MD1000, not a 3000, so I now have spare controllers. But the enclosure was not the issue. I'm certain the issue is the HBA, so I got another HBA in yesterday and am going to crossflash it today and try again. It's an internal HBA, but I already used a RAID card in that config with an adapter cable/bracket, so I can at least try a card that is better known to work. Supposedly my Dell SAS 6Gbps HBA is the same as an H310, just external, but I could not get it to work in Windows or in TrueNAS Core. It would see disks and you could create folders or small files, but trying to copy over files only as big as an MP3 would time out. I will let you know how it goes with this H310. Thanks for all your input so far.

BTW, with regard to the software developer/resource overprovisioning debate going on here, I was following the guide on this forum linked below, which says "As tempting as it is to under-resource FreeNAS, do try to aggressively allocate resources to FreeNAS, both memory and CPU."

 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
BTW, with regard to the software developer/resource overprovisioning debate going on here, I was following the guide on this forum linked below, which says "As tempting as it is to under-resource FreeNAS, do try to aggressively allocate resources to FreeNAS, both memory and CPU."

Not gonna speak for @jgreco here. But for myself, I do try to over-allocate at least RAM. As for CPU, I tend to allocate just 2 for basically everything at first until I find that I need more and increase them incrementally.
 

GW2

Dabbler
Joined
Jan 27, 2023
Messages
15
No luck with the other H310 either. Drives show up, I can make a pool, make folders or small files on the drive, and open and save text files, but when I try to copy over a file bigger than about a megabyte everything just hangs with the unexpected network error, and TrueNAS Core reports no errors. I would like to think the issue is the crossflashed adapter, as I cannot get these cards to work well in Windows or TrueNAS Core on any machine I've tried. Perhaps I am doing it wrong, but the guides seem pretty straightforward, so I have no idea. What is an affordable SAS HBA that will work with TrueNAS Core and uses SFF-8088 or SFF-8087 connectors?
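To pin down where the size threshold actually sits, something like the following could be pointed at the share. This is only a sketch: `probe_writes` and the size list are made up for illustration, and the example writes to a temporary directory so it runs anywhere; the target would need to be changed to the mapped share for a real test. A write that hits the problem would be expected to stall rather than return.

```python
import os
import tempfile
import time


def probe_writes(target_dir, sizes_bytes, chunk=64 * 1024):
    """Write one test file per size into target_dir and time each write.

    Returns a list of (size, seconds) tuples. Test files are deleted again.
    """
    results = []
    block = os.urandom(chunk)  # random data so compression does not skew results
    for size in sizes_bytes:
        path = os.path.join(target_dir, f"probe_{size}.bin")
        start = time.monotonic()
        with open(path, "wb") as fh:
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                fh.write(block[:n])
                remaining -= n
            fh.flush()
            os.fsync(fh.fileno())  # push the data out instead of leaving it in a local buffer
        results.append((size, time.monotonic() - start))
        os.remove(path)
    return results


if __name__ == "__main__":
    # Assumption: swap the temporary directory for the path of the mapped share.
    with tempfile.TemporaryDirectory() as target:
        for size, secs in probe_writes(target, [4096, 65536, 1 << 20, 8 << 20]):
            print(f"{size:>9} bytes: {secs:.3f} s")
```

Running the same sizes against the share and against a local disk would at least show whether the stall begins at a consistent size.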
 