FreeNAS 11.1 U6 rebooting randomly when transferring via iSCSI and VMware ESXi

Status
Not open for further replies.

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
Assuming that it was MRU before and RR now, you should start seeing traffic on both NICs, unless there is something wrong with your iSCSI portal/target settings.

Are there any portal/target settings that I should review?

Thanks
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
I'm getting these entries in the messages log. What do they mean? Should I restart FreeNAS and/or the ESXi host after changing the multipath policy to round robin?
Captura de pantalla 2018-09-12 a la(s) 4.41.05 p. m..png
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I'm getting these entries in the messages log. What do they mean? Should I restart FreeNAS and/or the ESXi host after changing the multipath policy to round robin?
View attachment 25656
That means you probably have bad RAM, likely on memory bank 8 (whatever your hardware maps that to)

You should check the iLO/out-of-band management logs on your server and hopefully it will identify the faulty DIMM for you by processor and slot
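If the out-of-band logs are not handy, a rough way to see which banks keep getting reported is to tally the MCA lines in the messages log. This is only a sketch: it assumes the kernel lines contain text shaped like `MCA: Bank 8, Status 0x...` and that the log lives at `/var/log/messages`. The helper names `count_mca_banks` and `count_mca_banks_in_file` are made up for illustration and are not part of FreeNAS.

```python
import re
from collections import Counter

# Assumption: each report contains "MCA: Bank <n>, Status 0x<hex>".
BANK_RE = re.compile(r"MCA: Bank (\d+), Status (0x[0-9a-fA-F]+)")


def count_mca_banks(lines):
    """Return a Counter mapping MCA bank number to number of reports."""
    counts = Counter()
    for line in lines:
        match = BANK_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def count_mca_banks_in_file(path="/var/log/messages"):
    """Tally MCA bank reports in a log file (the path is an assumption)."""
    with open(path, errors="replace") as log:
        return count_mca_banks(log)
```

A bank that shows up again and again is the one to chase, but the bank number still has to be mapped to a physical DIMM slot through the server's own logs or documentation.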
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
That means you probably have bad RAM, likely on memory bank 8 (whatever your hardware maps that to)

You should check the iLO/out-of-band management logs on your server and hopefully it will identify the faulty DIMM for you by processor and slot

Ok, I'll connect through iLO to check that out.

The backup ran successfully last night and FreeNAS did not reboot itself. Let's see what happens tonight.

Thank you very much.
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
That means you probably have bad RAM, likely on memory bank 8 (whatever your hardware maps that to)

You should check the iLO/out-of-band management logs on your server and hopefully it will identify the faulty DIMM for you by processor and slot

Yeah, I should check the iLO because I got the memory error again:
Captura de pantalla 2018-09-13 a la(s) 4.50.37 a. m..png
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
I'm also seeing that the memory isn't being released; it stays consumed after the backups. Shouldn't it be released after the backup job (via iSCSI) finishes?
Captura de pantalla 2018-09-13 a la(s) 5.03.36 a. m..png
 
Joined
Dec 29, 2014
Messages
1,135
You definitely have memory problems. I have seen the MCA errors in some of my servers. Some motherboards are more sensitive than others. I got some of those errors when the installed DIMMs came from mixed manufacturers, even though they all had the same spec. In any case, you can probably expect wonky things to keep happening until you make those MCA errors go away.
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
You definitely have memory problems. I have seen the MCA errors in some of my servers. Some motherboards are more sensitive than others. I got some of those errors when the installed DIMMs came from mixed manufacturers, even though they all had the same spec. In any case, you can probably expect wonky things to keep happening until you make those MCA errors go away.

I'm sorry, but what is MCA?

Thanks
 
Joined
Dec 29, 2014
Messages
1,135
I'm sorry, but what is MCA?

In the error screenshot, it has "freenas MCA" as part of the text of the error message. I was abbreviating because I am a lousy typist. :)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Yeah, I should check the iLO because I got the error on memory again:

Definitely a bad stick. Check the iLO logs and it should hopefully call out the stick.

I'm also seeing that the memory isn't being released; it stays consumed after the backups. Shouldn't it be released after the backup job (via iSCSI) finishes?

No, the ARC will stay populated until it needs to replace it with more appropriate cached data. Free RAM is wasted RAM.
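To see how much of that "used" memory is really the ARC, something like the following could be run on the FreeNAS box. It is a minimal sketch, assuming the FreeBSD sysctls `kstat.zfs.misc.arcstats.size` and `hw.physmem` are available; the helper names `sysctl_int`, `arc_share`, and `describe_arc` are made up for illustration.

```python
import subprocess


def sysctl_int(name):
    """Return the integer value of a sysctl by running `sysctl -n <name>`."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def arc_share(arc_bytes, physmem_bytes):
    """Fraction of physical RAM currently held by the ARC."""
    return arc_bytes / physmem_bytes


def describe_arc():
    """Summarise the current ARC size against physical RAM."""
    arc = sysctl_int("kstat.zfs.misc.arcstats.size")
    ram = sysctl_int("hw.physmem")
    gib = 2 ** 30
    return (
        f"ARC holds {arc / gib:.1f} GiB of {ram / gib:.1f} GiB "
        f"({arc_share(arc, ram):.0%})"
    )
```

Calling `print(describe_arc())` after a backup should show a large ARC share, which is expected behaviour rather than a leak.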
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
In the error screenshot, it has "freenas MCA" as part of the text of the error message. I was abbreviating because I am a lousy typist. :)

lol, thanks!
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
Definitely a bad stick. Check the iLO logs and it should hopefully call out the stick.



No, the ARC will stay populated until it needs to replace it with more appropriate cached data. Free RAM is wasted RAM.

Hello, I just checked the iLO but saw nothing at all. I'm not sure I checked all the right places for memory errors. I also checked the logs and found nothing. What am I doing wrong? Do you know where I should look?

Do you have any suggestions for an Intel-based network card?

Thanks
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
Try ipmitool sel elist and comb through those lines to see if it identifies memory faults there.
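Combing through a long event log by eye gets tedious, so here is a rough filter for it. This is a sketch under assumptions: it treats any line mentioning "memory", "ecc", or "dimm" as interesting, which is a loose keyword match and not an official ipmitool feature, and the helper names `memory_events` and `read_sel` are made up for illustration.

```python
import subprocess

# Assumption: memory faults mention one of these words somewhere in the line.
MEMORY_HINTS = ("memory", "ecc", "dimm")


def memory_events(sel_lines):
    """Keep the SEL lines whose text mentions a memory-related keyword."""
    return [
        line.rstrip()
        for line in sel_lines
        if any(hint in line.lower() for hint in MEMORY_HINTS)
    ]


def read_sel():
    """Run `ipmitool sel elist` and return its output as a list of lines."""
    result = subprocess.run(
        ["ipmitool", "sel", "elist"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Running `for line in memory_events(read_sel()): print(line)` prints only the matching entries. An empty result does not prove the RAM is healthy; it may just mean the BMC did not log the fault.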

For Intel NICs, anything dual-ported on the Pro/1000 series should work, but the older PT cards are less power-efficient. Look for the ET or newer series.

Helpful links:
https://forums.servethehome.com/ind...a-friends-don't-let-friends-buy-realtek.2663/

Comparison:
https://ark.intel.com/products/family/46827/Gigabit-Ethernet-Adapters#@Gigabit-Ethernet-Adapters

I ran the command and I've got this:
Captura de pantalla 2018-09-13 a la(s) 4.09.42 p. m..png
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
While it isn't any help in narrowing down the memory issue, the fact that the hardware watchdog is forcing hard resets on the system definitely points to "something is really wrong with the hardware."

Is there a "system event log" or something similarly named in the BIOS? That might record the memory faults you need to trace down the bad DIMM.
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
While it isn't any help in narrowing down the memory issue, the fact that the hardware watchdog is forcing hard resets on the system definitely points to "something is really wrong with the hardware."

Is there a "system event log" or something similarly named in the BIOS? That might record the memory faults you need to trace down the bad DIMM.

Hello, I need to check whether that log is in the BIOS.

Is this normal? Just 340 MB of free RAM:
upload_2018-9-15_5-51-19.png
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
Yes. ZFS will use (almost) all available RAM for the ARC, and will free it up if another process requests it.

OK, I have another question: what is the "scrub" for? I receive emails telling me that a scrub is running.

Thanks
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
OK, I have another question: what is the "scrub" for? I receive emails telling me that a scrub is running.

Thanks
Scrubbing is basically an error check on your pool that proactively looks for any data corruption. You definitely want scrubs running routinely, as well as SMART tests.
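To check what the last scrub found without waiting for the email, the `scan:` line of `zpool status` can be pulled out with a few lines of script. This is a minimal sketch, assuming the usual `zpool status` layout with a line beginning `scan:`; it returns only the first line of that section, and the helper names `scan_summary` and `pool_scan_summary` are made up for illustration.

```python
import subprocess


def scan_summary(status_text):
    """Return the text after 'scan:' in `zpool status` output, or None."""
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("scan:"):
            return stripped[len("scan:"):].strip()
    return None


def pool_scan_summary(pool):
    """Run `zpool status <pool>` and return its scan summary line."""
    result = subprocess.run(
        ["zpool", "status", pool], capture_output=True, text=True, check=True
    )
    return scan_summary(result.stdout)
```

For example, `print(pool_scan_summary("tank"))` would report on a pool named `tank`, which is a placeholder for your own pool name.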
 