Trying to set up my first Scale cluster

JonathanTX

Dabbler
Joined
Nov 4, 2021
Messages
13
Hello all, I am trying to set up a test to see how TrueNAS SCALE works.

My hardware is not ideal: two R710s running RAID, but these are not the systems I will use if I ever get past the conceptual phase.

I have worked with EMC SANs at work for the last 15 years, so I am up to speed on how iSCSI and multipath connectivity work, but I am trying to figure out how this is configured as it pertains to TrueNAS SCALE.

Both of my R710s have a bonded interface for NICs 1-2, plus two interfaces each with IPs on my iSCSI networks (A and B are separate VLANs). Those seem to be working as expected; the two R710s can ping all the other devices on their respective networks. I think individually the two systems would be at least usable... but when I try to create a cluster object in TrueCommand, I get the error:

NAS API error

"'IPv4Address' object is not iterable"


I'm not sure exactly what this is trying to tell me. Can anyone point me in the right direction?

Also, I have not yet found a guide for end-to-end setup; if there is one, I'd like to read it.

thanks!
Jonathan
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
Clustering in SCALE for Angelfish is:

1. GlusterFS - needs 3 nodes
2. MinIO for S3
3. Clustered SMB - APIs work, TrueCommand needs an update to 2.2 in Q1 next year

iSCSI is planned for Bluefin in 2022.
 

JonathanTX

Dabbler
Joined
Nov 4, 2021
Messages
13
I was really hoping this could be a 2-node solution: NFS or SMB shares, iSCSI eventually.

If it has to be 3, then I guess it has to be. Is there some definitive documentation out there I can read on SCALE?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
TrueCommand controls the clustering: https://www.truenas.com/docs/truecommand/clustering/
SMB will be documented in Q1.

2-node clusters are inherently unreliable... and would have to be a mirror. This can be made more reliable with a third arbiter node that is much smaller and acts as a tie-breaker.
We recommend a 2+1P (dispersed) model over 3 nodes as the minimum reliable cluster.
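
As a rough illustration (ballpark figures of mine, not an official sizing guide), here is a small Python sketch comparing usable capacity and failure tolerance for the layouts discussed above, assuming three equal-sized nodes:

Code:
# Rough comparison of Gluster volume layouts; figures are illustrative
# approximations for equal-sized bricks, not guarantees.

layouts = {
    # name: (fraction of raw capacity usable, node failures tolerated, note)
    "replica 2 (2 nodes, mirror)":   (1 / 2, 1, "no tie-breaker, split-brain risk"),
    "replica 3 (3-way mirror)":      (1 / 3, 1, "quorum-safe"),
    "2+1 dispersed (erasure coded)": (2 / 3, 1, "quorum-safe, more usable space"),
}

for name, (usable, failures, note) in layouts.items():
    print(f"{name}: ~{usable:.0%} of raw capacity usable, "
          f"tolerates {failures} node failure ({note})")

Either way, the point is three voting members: it is the tie-breaker that avoids split-brain.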
 

JonathanTX

Dabbler
Joined
Nov 4, 2021
Messages
13
Could the arbiter be a witness only, without file storage, and possibly be a VM?

When I said I'd prefer 2 nodes... I meant 2 physical nodes. I definitely would have no problem with a witness if it's possible the witness could be a VM.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
JonathanTX said:
Could the arbiter be a witness only, without file storage, and possibly be a VM?

When I said I'd prefer 2 nodes... I meant 2 physical nodes. I definitely would have no problem with a witness if it's possible the witness could be a VM.

Possible, but not yet productized and verified. It is part of the plan... it can certainly be a smaller node.
 

JonathanTX

Dabbler
Joined
Nov 4, 2021
Messages
13
Well, I reset my test with 3 identical nodes, but when I tried to create a replicated cluster from the 3, I still got the same error.

NAS API error

"'IPv4Address' object is not iterable"

Are there any relevant log files I should take a look at on the TrueCommand server that might shed some light?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
Identical nodes each need to have different IP addresses on the same subnet.
Can you document the specific IP address scheme?
 

JonathanTX

Dabbler
Joined
Nov 4, 2021
Messages
13
Sure, both of my test scenarios as mentioned above are as follows:

(Also note that all .125.56 interfaces are in the same VLAN, and the 1632 and 1633 networks are likewise each in their own matching VLAN; all connectivity verified.)

TrueCommand is 10.125.56.55

Scenario 1, 2 physical nodes (Dell R710):
node 1
bond0 10.125.56.56/24
eno3 172.16.32.13/24 - iscsi-A
eno4 172.16.33.13/24 - iscsi-B

node 2
bond0 10.125.56.57/24
eno3 172.16.32.14/24 - iscsi-A
eno4 172.16.33.14/24 - iscsi-B

Scenario 2, 3 virtual machines (no iSCSI; I was just using this scenario to see how an NFS share would work)
node1 10.125.56.53/24
node2 10.125.56.43/24
node3 10.125.56.58/24
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
So you've bonded 2 ethernet ports (en1 and en2?) with lagg?

You are connecting TrueCommand via the bond0 port, and it is that IP address that is not working... is it one or both nodes?

I'd suggest testing each node independently... then start clustering.
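
For the "test each node independently" step, here is a minimal reachability sketch (assuming the TrueCommand host sits on the management VLAN and ICMP is allowed; the addresses are the ones listed for the two scenarios above, so adjust to your own):

Code:
#!/usr/bin/env python3
# Ping each node's management IP once from the TrueCommand host.
import subprocess

NODES = {
    "scenario1-node1 bond0": "10.125.56.56",
    "scenario1-node2 bond0": "10.125.56.57",
    "scenario2-node1": "10.125.56.53",
    "scenario2-node2": "10.125.56.43",
    "scenario2-node3": "10.125.56.58",
}

for name, addr in NODES.items():
    # -c 1: one echo request, -W 2: two-second timeout (Linux ping flags)
    ok = subprocess.run(
        ["ping", "-c", "1", "-W", "2", addr],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ).returncode == 0
    print(f"{name} ({addr}): {'reachable' if ok else 'NO RESPONSE'}")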
 
Joined
Nov 7, 2021
Messages
3
I'm running into the same issue.
I've tried to create each of the 4 Gluster types over 2 systems (the third system isn't up yet; I'm just testing things out).
I also tried doing a simple replicated/distributed volume on each system individually with 2 drives; with this I get a different error: "[EFAULT] No peers detected".
I first tested with link-aggregated failover NICs, then tested again with a single NIC, with the same result.
I restarted the Gluster service on each system.
I have tested with both the Docker latest and nightly builds with no difference in behavior.
I'm running TrueNAS-SCALE-22.02-RC.1-1 on both servers, with TrueCommand on my laptop on the same subnet.
 

JonathanTX

Dabbler
Joined
Nov 4, 2021
Messages
13
morganL said:
So you've bonded 2 ethernet ports (en1 and en2?) with lagg?

You are connecting TrueCommand via the bond0 port, and it is that IP address that is not working... is it one or both nodes?

I'd suggest testing each node independently... then start clustering.

Yes, the bond appears to be working properly; right now there's no reason to suspect that's an issue. Also, the scenario 2 systems are VMs with just a single NIC and no bonding, and they get the exact same error.

The error is so generic that I cannot tell what it is trying to tell me, or which system it wants me to check.
 

JonathanTX

Dabbler
Joined
Nov 4, 2021
Messages
13
No, I was trying to do replicated, but I just now tried a distributed volume and got the same error.


(Screenshot of the error attached.)
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
Thanks... I've asked the engineering team to look at this and see if it's a bug or just a lack of documentation on the process.
 
Joined
Nov 7, 2021
Messages
3
Of note, I'm seeing these errors on the TrueNAS SCALE side too; it seems like TrueCommand is just relaying them back.
'IPv4Address' object is not iterable
 
Joined
Nov 7, 2021
Messages
3
Digging a bit further, I see this in the middleware logs; it seems there was a fix merged 3 days ago for this issue: https://github.com/truenas/middleware/pull/7823
Code:
[2021/11/08 04:50:33] (ERROR) middlewared.job.run():394 - Job <bound method returns.<locals>.returns_internal.<locals>.nf of <middlewared.plugins.gluster_linux.peer.GlusterPeerService object at 0x7fc028da66d0>> failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 382, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 418, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1131, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1263, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/gluster_linux/peer.py", line 73, in do_create
    await self.middleware.call('cluster.utils.resolve_hostnames', [data['hostname']])
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1310, in call
    return await self._call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1267, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/cluster_linux/utils.py", line 60, in resolve_hostnames
    ips.extend(result)
TypeError: 'IPv4Address' object is not iterable
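
For reference, the failure is easy to reproduce outside the middleware: ipaddress.IPv4Address is not iterable, so ips.extend(result) fails whenever the resolver hands back a single address object instead of a list. A minimal sketch of the problem and one defensive workaround (illustrative only; see the linked PR for the actual fix):

Code:
import ipaddress

ips = []
result = ipaddress.IPv4Address("10.125.56.56")  # resolver returned a bare address

try:
    ips.extend(result)           # what the traceback above shows
except TypeError as err:
    print(f"reproduced: {err}")  # 'IPv4Address' object is not iterable

# Defensive handling for either a single address or an iterable of addresses
if isinstance(result, (ipaddress.IPv4Address, ipaddress.IPv6Address)):
    ips.append(result)
else:
    ips.extend(result)

print(ips)  # [IPv4Address('10.125.56.56')]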
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
I am having this issue as well. Is there an eta on the release of RC 1.2?

Later this week... it's going through its final testing. The nightlies fix the problem.

** Edit: It needed multiple rounds of testing and has been pushed back to next week (the week of 22nd Nov).
 