Recently upgraded to 9.2.1.8 and now restarting a bunch


electricd7

Explorer
Joined
Jul 16, 2012
Messages
81
I'll give it a shot. The only problem is that the original pool only shows "replace" for da3 since I already offlined it. I don't know how to replace it with itself, as it errors if I try that.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I'll give it a shot. The only problem is that the original pool only shows "replace" for da3 since I already offlined it. I don't know how to replace it with itself, as it errors if I try that.

You'd have to wipe the drive first.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
No, you don't, at least I don't think you have to.
I think you can do zpool clear or something.

Let me give it a try first.
 

electricd7

Explorer
Joined
Jul 16, 2012
Messages
81
When I just try to replace it with itself, it says that da3 is already a member of a pool. I can't get to the specific error at the moment, as I rebooted the system and added the new disk, so it's not there currently.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
I understand, but I am working on it.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
You should be able to bring your old da3 drive back into the array by doing the following (a consolidated command sketch follows this list):

Remember to do this when your system is not stuck rebooting, preferably with the drives on the motherboard SATA ports.

1: System is OFF.
2: Plug in all the drives that belong to the original array, including the old da3 drive.
3: Start the system.
4: Go into an SSH shell if you can, or use the FreeNAS shell or the direct console shell.
5: Type: zpool status
6: All the drives should show as online. If not, make a screen capture if under SSH (copy and paste works).
7: Type: zpool clear name_of_your_pool
8: Type: zpool scrub name_of_your_pool
9: Scrubbing should start and will, or should, resilver only the discrepant blocks.
10: It will return a number of checksum errors. That is OK, just an indication of the discrepancies found.
11: Let it run until completion.

That should be enough to bring your pool back into good standing.
If not, just update us on the errors or messages you are getting.
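
A minimal sketch of that command sequence, assuming the pool is named pool1 (substitute the pool name from your own zpool status output):

Code:
# Check that all member drives are visible and note their state
zpool status

# Clear any recorded error states on the pool (pool1 is an assumed name)
zpool clear pool1

# Start a scrub; it resilvers only blocks found to be inconsistent
zpool scrub pool1

# Re-run to watch progress until the scrub reports completion
zpool status pool1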
 

electricd7

Explorer
Joined
Jul 16, 2012
Messages
81
Just tried "zpool status" and one of my drives shows as offline...see copy:

Code:
[root@freenas] ~# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

    NAME                                            STATE     READ WRITE CKSUM
    freenas-boot                                    ONLINE       0     0     0
      gptid/4f4240e4-902b-11e4-87c0-002590760c9d   ONLINE       0     0     0

errors: No known data errors

  pool: pool1
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Online the device using 'zpool online' or replace the device with
    'zpool replace'.
  scan: scrub repaired 0 in 2h15m with 0 errors on Wed Dec 31 15:05:58 2014
config:

    NAME                                            STATE     READ WRITE CKSUM
    pool1                                           DEGRADED     0     0     0
      raidz2-0                                      DEGRADED     0     0     0
        gptid/764546b3-d85d-11e1-abea-002590760c9d  ONLINE       0     0     0
        gptid/76a793b6-d85d-11e1-abea-002590760c9d  ONLINE       0     0     0
        9217263701387507910                         OFFLINE      0     0     0  was /dev/gptid/01754a95-9202-11e4-9761-002590760c9d
        gptid/77850560-d85d-11e1-abea-002590760c9d  ONLINE       0     0     0
        gptid/59d0ca2a-dc19-11e1-baa7-002590760c9d  ONLINE       0     0     0
        gptid/5ea47a7b-dc19-11e1-baa7-002590760c9d  ONLINE       0     0     0
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Can you give a bit of background here? For instance, how are the drives connected to the system? Is it directly through the motherboard SATA ports? And what about the test with the single drive?

So far you seem to have been able to run a scrub without any issue, and no errors have been reported that would indicate a data discrepancy.

Now you need to run the following command to bring the drive online (pass either the numeric GUID shown in zpool status or the gptid, not both at once):
Code:
zpool online pool1 9217263701387507910


Description:

zpool online : request to bring a drive online.
pool1 : the name of your pool.
9217263701387507910 : the GUID ZFS generated to identify the offline drive; it is what your zpool status output shows for it.
gptid/01754a95-9202-11e4-9761-002590760c9d : the gptid FreeNAS/ZFS uses to identify that element of the array; you can pass this instead of the GUID.

Wait a moment and check the status of the pool:

zpool status

The old drive should now be online.
If it is, run a scrub again. It will start resilvering, but it won't necessarily overwrite everything, just the faulty blocks.
 

electricd7

Explorer
Joined
Jul 16, 2012
Messages
81
OK, so after having the disks connected to the local motherboard for 12+ hours, I replaced the M1015 HBA with a SuperMicro AOC-S2308L-L8e HBA. I have the new disk installed in place of the failing da3 and am resilvering. I will keep you posted, but I think maybe the issue was with the M1015 after all.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
OK, so after having the disks connected to the local motherboard for 12+ hours, I replaced the M1015 HBA with a SuperMicro AOC-S2308L-L8e HBA. I have the new disk installed in place of the failing da3 and am resilvering. I will keep you posted, but I think maybe the issue was with the M1015 after all.
Technically, you could have kept the original da3 drive; though if it is under warranty, you could RMA it instead.
 

electricd7

Explorer
Joined
Jul 16, 2012
Messages
81
So just to close the loop: the problem with my FreeNAS system turned out to be the disks. I thought it was the controller, but after moving to the motherboard, I still couldn't get more than 12 hours or so of uptime. I then unplugged one of my disks that was still getting SMART errors and was able to get a couple of runs of 12+ hours, but not consistently. Finally, I backed up all the data, replaced all 6 disks, created a new pool, and copied the data back. I am at 48 hours of uptime at this point, which I haven't seen since late November. Thanks all for your help!
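
For reference, you can check a drive's SMART state from the shell roughly like this (a sketch; da3 is just the example device from this thread, substitute your own):

Code:
# Print all SMART information, including attributes and the error log
smartctl -a /dev/da3

# Optionally run a short self-test and check the results a few minutes later
smartctl -t short /dev/da3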
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
So just to close the loop: the problem with my FreeNAS system turned out to be the disks. I thought it was the controller, but after moving to the motherboard, I still couldn't get more than 12 hours or so of uptime. I then unplugged one of my disks that was still getting SMART errors and was able to get a couple of runs of 12+ hours, but not consistently. Finally, I backed up all the data, replaced all 6 disks, created a new pool, and copied the data back. I am at 48 hours of uptime at this point, which I haven't seen since late November. Thanks all for your help!
That is a little bit unusual in my experience. Hard drive problems should not be causing crashes that leave no traces... Please remember that the operating system is not run from the hard drives, although it writes to them, and in theory it needs to read some state files from .system. I think it only writes to .system and keeps those files in memory. FreeNAS might panic while being unable to handle problems with either reading from or writing to the hard drives, but it should report about it on the console...

The only thing that comes to mind is an undersized or faulty power supply coupled with faulty hard drives, i.e. the hard drives drawing substantially more current than they should.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Cyberjock has previously implied that FreeNAS isn't very resilient against a missing .system dataset, since it's assumed to be on a redundant ZFS pool, the rationale being that by the point that happens, you are at or beyond data loss.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Cyberjock has previously implied that FreeNAS isn't very resilient against a missing .system dataset, since it's assumed to be on a redundant ZFS pool, the rationale being that by the point that happens, you are at or beyond data loss.
That is very true; that is why I keep .system in a zpool whose disks and controller are different from my data pool's. I had just entirely forgotten about that :)

A troubleshooting lesson learned could be to insert a USB memory device (another one, beyond the OS one) and migrate .system onto it. That takes .system out of the equation as soon as possible, and thus possibly enables preserving the log files if the problems are related to either a SATA/SAS controller or the disks.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
That is very true; that is why I keep .system in a zpool whose disks and controller are different from my data pool's. I had just entirely forgotten about that :)

A troubleshooting lesson learned could be to insert a USB memory device (another one, beyond the OS one) and migrate .system onto it. That takes .system out of the equation as soon as possible, and thus possibly enables preserving the log files if the problems are related to either a SATA/SAS controller or the disks.

How do you manage migrating the .system dataset? Is it done via the web GUI under "System Dataset", by pointing "System dataset pool:" to whatever pool is available?
What is going to happen if the pool containing the migrated .system fails? Is there a risk of affecting the data pool?
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
I do not use 9.3 yet, but since 9.2.1.6, using the GUI to move .system to another zpool (by pointing it at the new location) has just worked.

GUI → System → Settings → System Dataset → System dataset pool → Save :)

There should be no risk to data (unless someone manually placed data inside .system, or in other unusual scenarios); however, a reboot is required to complete the process.
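
If you want to confirm from the shell where the system dataset currently lives before and after the move, something like this should show it (a sketch; pool1 is an assumed pool name):

Code:
# The system dataset is a child dataset named .system on whichever pool hosts it
zfs list -r pool1 | grep '\.system'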
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
I do not use 9.3 yet, but since 9.2.1.6, using the GUI to move .system to another zpool (by pointing it at the new location) has just worked.

GUI → System → Settings → System Dataset → System dataset pool → Save :)

There should be no risk to data (unless someone manually placed data inside .system, or in other unusual scenarios); however, a reboot is required to complete the process.
Let me reformulate the question this way:
What makes the pool containing the .system dataset more reliable than the data pool? If the pool containing .system crashes, does that mean FreeNAS will panic and reboot, or will it use a copy in RAM, if any, or some other means? I would think a RAIDZ2 should be safer than a single drive hosting .system.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Let me reformulate the question this way:
What makes the pool containing the .system dataset more reliable than the data pool? If the pool containing .system crashes, does that mean FreeNAS will panic and reboot, or will it use a copy in RAM, if any, or some other means? I would think a RAIDZ2 should be safer than a single drive hosting .system.
My original comment was about having .system on a USB device only temporarily, while troubleshooting any disk and/or controller issues.

In my servers, I decided to have two additional small, slow, low-power, mirrored hard drives for .system. Additionally, I place my jails there.

In both of the above scenarios, the main idea is that if anything happens to the controller or to the hard drives with the real data™ :), FreeNAS would have a chance to write about it into the logs stored in .system. I also hope that a crash and burn of .system would keep my data intact.

FreeNAS keeps writing to .system, so .system being gone is not an option. However, in terms of configuration, FreeNAS used to write only some Samba information there; I once had a server semi-operational with .system gone. With 9.3 it might be entirely different!
 