Clock will not keep accurate date

Status
Not open for further replies.

Jonathan

Dabbler
Joined
Sep 2, 2013
Messages
11
Hello FreeNAS forum members,

I recently installed FreeNAS 9.1.0-Release x64 and I'm having issues with the clock not keeping accurate time. My hardware is a Dell F1DH 1 U server. After lots of failed debugging I thought maybe the hardware was no good and exchanged the server for a identical one, but I'm having the same problem.

I turned on ssh root login so all the commands I use are going into that shell and not the shell opened by the gui. I use the 'date' command to do my clock checking. I have seen the time go backwards by an hour or two in a matter of minutes. I don't think it has ever gone into the future. Sometimes it will run for several hours and the time is spot on. I had ntpd configured with the default time servers and thought I would try others. I now have my local ISP's timeserver defined but that did not change anything.

I know ntpd is trying to keep the clock syncronized based on log entries:
[root@freenas] /mnt# grep "time reset" /var/log/messages
Aug 31 16:12:34 freenas ntpd[1963]: time reset +24982.994787 s
Aug 31 20:13:03 freenas ntpd[1963]: time reset +599.993700 s
Sep 1 16:09:18 freenas ntpd[11770]: time reset +16.376696 s
Sep 1 16:16:34 freenas ntpd[12365]: time reset +270.628151 s
Sep 1 17:49:45 freenas ntpd[13311]: time reset +4480.967340 s
Sep 2 04:07:12 freenas ntpd[14191]: time reset +299.960150 s
Those values look very high to me and they are all over the place. Sometimes it is so bad that ntpd can't fix it and shuts itself down:
Sep 1 01:05:32 freenas ntpd[1963]: time correction of 26098 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
To recover , I use 'ntpdate' to get the clock back to normal and then restart ntpd.

I did some research and discovered that my server has several clocks to choose from, each rated differently for accuracy:
[root@freenas] /mnt# dmesg | grep Timecounter
Timecounter "i8254" frequency 1193182 Hz quality 0
Timecounter "HPET" frequency 14318180 Hz quality 950
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
Timecounters tick every 1.000 msec
Timecounter "TSC-low" frequency 1500082330 Hz quality 1000
By default TCC-low is chosen because of its high quality. However I read that this clock is based on the CPU which is not always dependable when power save is enabled on the CPU. I added a new sysctl in the web gui to use HPET instead. I saw the same clock problems. I edited the sysctl in the web gui to use i8254 instead. I saw the same clock problems. I have not tried ACPI-fast yet but I'm guessing that I'll see the same problems.

Right now my workaround (which I don't like at all) is to have cron run 'ntpdate' every five minutes.

Does anyone have any suggestions on how I can debug this? This doesn't look like normal skew where you would drift just a few seconds over time.
 

Jonathan

Dabbler
Joined
Sep 2, 2013
Messages
11
Hello FreeNAS forum members,

I recently installed FreeNAS 9.1.0-Release x64 and I'm having issues with the clock not keeping accurate time. My hardware is a Dell F1DH 1 U server. After lots of failed debugging I thought maybe the hardware was no good and exchanged the server for a identical one, but I'm having the same problem.

I turned on ssh root login so all the commands I use are going into that shell and not the shell opened by the gui. I use the 'date' command to do my clock checking. I have seen the time go backwards by an hour or two in a matter of minutes. I don't think it has ever gone into the future. Sometimes it will run for several hours and the time is spot on. I had ntpd configured with the default time servers and thought I would try others. I now have my local ISP's timeserver defined but that did not change anything.

I know ntpd is trying to keep the clock syncronized based on log entries:
[root@freenas] /mnt# grep "time reset" /var/log/messages
Aug 31 16:12:34 freenas ntpd[1963]: time reset +24982.994787 s
Aug 31 20:13:03 freenas ntpd[1963]: time reset +599.993700 s
Sep 1 16:09:18 freenas ntpd[11770]: time reset +16.376696 s
Sep 1 16:16:34 freenas ntpd[12365]: time reset +270.628151 s
Sep 1 17:49:45 freenas ntpd[13311]: time reset +4480.967340 s
Sep 2 04:07:12 freenas ntpd[14191]: time reset +299.960150 s
Those values look very high to me and they are all over the place. Sometimes it is so bad that ntpd can't fix it and shuts itself down:
Sep 1 01:05:32 freenas ntpd[1963]: time correction of 26098 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
To recover , I use 'ntpdate' to get the clock back to normal and then restart ntpd.

I did some research and discovered that my server has several clocks to choose from, each rated differently for accuracy:
[root@freenas] /mnt# dmesg | grep Timecounter
Timecounter "i8254" frequency 1193182 Hz quality 0
Timecounter "HPET" frequency 14318180 Hz quality 950
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
Timecounters tick every 1.000 msec
Timecounter "TSC-low" frequency 1500082330 Hz quality 1000
By default TCC-low is chosen because of its high quality. However I read that this clock is based on the CPU which is not always dependable when power save is enabled on the CPU. I added a new sysctl in the web gui to use HPET instead. I saw the same clock problems. I edited the sysctl in the web gui to use i8254 instead. I saw the same clock problems. I have not tried ACPI-fast yet but I'm guessing that I'll see the same problems.

Right now my workaround (which I don't like at all) is to have cron run 'ntpdate' every five minutes.

Does anyone have any suggestions on how I can debug this? This doesn't look like normal skew where you would drift just a few seconds over time.



Just wanted to add a data point to demonstrate how quickly the clock can go bad. Even with the 'ntpdate' cron job running every 5 minutes I have seen the clock be off as far as 25 minutes:
[root@freenas] ~# date
Mon Sep 2 17:39:58 PDT 2013
// After the 5 min mark, run date one more time and see how much ntpdate had to correct.
[root@freenas] ~# date
Mon Sep 2 18:05:00 PDT 2013 // 1805 - 1740 = 25 minutes skew in under five min.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Are you overclocking? Have you tried resetting BIOS to defaults and leaving clock settings for CPU/RAM/etc at "auto" or whatever the default is?
 

Jonathan

Dabbler
Joined
Sep 2, 2013
Messages
11
Are you overclocking? Have you tried resetting BIOS to defaults and leaving clock settings for CPU/RAM/etc at "auto" or whatever the default is?


No, I'm not overclocking. All of my CPU's run at %99-%100 idle so I feel I'm a bit overpowered on that front for a NAS. :)
Here are some specs on my CPUs and architecture:
[root@freenas] ~# sysctl -a | egrep -i 'hw.machine|hw.model|hw.ncpu'
hw.machine: amd64
hw.model: Intel(R) Xeon(R) CPU E5450 @ 3.00GHz
hw.ncpu: 8
hw.machine_arch: amd64

I thought I had tried the default BIOS settings with the first server before I exchanged it and did not see a difference. I'm running a very large file transfer right now so I can't check my BIOS settings. It may be a couple days before I can power down my FreeNAS box, but when I get a chance I'll look through the BIOS settings for any clock related settings. I'll also try the default BIOS settings like you suggest.
 

Jonathan

Dabbler
Joined
Sep 2, 2013
Messages
11
No, I'm not overclocking. All of my CPU's run at %99-%100 idle so I feel I'm a bit overpowered on that front for a NAS. :)
Here are some specs on my CPUs and architecture:
[root@freenas] ~# sysctl -a | egrep -i 'hw.machine|hw.model|hw.ncpu'
hw.machine: amd64
hw.model: Intel(R) Xeon(R) CPU E5450 @ 3.00GHz
hw.ncpu: 8
hw.machine_arch: amd64

I thought I had tried the default BIOS settings with the first server before I exchanged it and did not see a difference. I'm running a very large file transfer right now so I can't check my BIOS settings. It may be a couple days before I can power down my FreeNAS box, but when I get a chance I'll look through the BIOS settings for any clock related settings. I'll also try the default BIOS settings like you suggest.

Well, sadly my server rebooted itself tonight and I have no idea why. Not exactly getting warm fuzzies from this hardware. I went ahead and rebooted again since I had lost my large file copy anyway and went into the BIOS. I set the failsafe presets. I poked around the BIOS menus but didn't see anything obviously related to the clock. After booting up I disabled my 'ntpdate' cron job so that ntpd could handle things. I'll let it run overnight and check out the date in the morning.
 

Jonathan

Dabbler
Joined
Sep 2, 2013
Messages
11
Well, sadly my server rebooted itself tonight and I have no idea why. Not exactly getting warm fuzzies from this hardware. I went ahead and rebooted again since I had lost my large file copy anyway and went into the BIOS. I set the failsafe presets. I poked around the BIOS menus but didn't see anything obviously related to the clock. After booting up I disabled my 'ntpdate' cron job so that ntpd could handle things. I'll let it run overnight and check out the date in the morning.

The date is accurate this morning! Looking in /var/log/messages I see that ntpd only had to make one adjustment to the clock and that was right after boot.
Sep 2 23:40:00 freenas ntpd[1869]: ntpd 4.2.4p5-a (1)
Sep 2 23:40:06 freenas ntpd[1870]: time reset -1.200042 s
So far it looks like using the BIOS failsafe presets is working. Thank you for the suggestion cyberjock. I'll keep monitoring the date for the next several days to make sure all is well.

I'm not sure why I got that random reboot last night, if that happens again I'll open a new thread. I was moving a large directory from my old nas to this new one with a 'cp -a' command when it rebooted. I didn't want to lose my progress so I resumed with an 'rsync -a' and it looks like it just picked up where 'cp' left off. Hooray for rsync.

Thanks again,
Jonathan
 

Jonathan

Dabbler
Joined
Sep 2, 2013
Messages
11
The date is accurate this morning! Looking in /var/log/messages I see that ntpd only had to make one adjustment to the clock and that was right after boot.
Sep 2 23:40:00 freenas ntpd[1869]: ntpd 4.2.4p5-a (1)
Sep 2 23:40:06 freenas ntpd[1870]: time reset -1.200042 s
So far it looks like using the BIOS failsafe presets is working. Thank you for the suggestion cyberjock. I'll keep monitoring the date for the next several days to make sure all is well.

I'm not sure why I got that random reboot last night, if that happens again I'll open a new thread. I was moving a large directory from my old nas to this new one with a 'cp -a' command when it rebooted. I didn't want to lose my progress so I resumed with an 'rsync -a' and it looks like it just picked up where 'cp' left off. Hooray for rsync.

Thanks again,
Jonathan

Well it looks like the clock problem is still there; ntpd died. From /var/log/messages there is a single new ntpd entry:
Sep 4 18:27:02 freenas ntpd[1870]: time correction of 5401 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.

I'm not sure why things work just fine and then all of a sudden the clock is off by 5401 seconds. I can see where the 1 second may have happened due to drift, but I can't explain the other 90 minutes. The clock went backwards 90 minutes so I'm now 90 minutes behind.

I'll reset using 'ntpdate' and then restart ntpd. Anybody have any ideas where the 90 min may have gone?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I don't have any idea. :(

Very weird problem to say the least. Are you sure you aren't using those QS or ES chips? I know someone that has used one before and had weird clock problems.
 

Jonathan

Dabbler
Joined
Sep 2, 2013
Messages
11
I don't have any idea. :(

Very weird problem to say the least. Are you sure you aren't using those QS or ES chips? I know someone that has used one before and had weird clock problems.

Do you mean the CPU chips? I don't know how to determine if they are QS or ES. Would that be silkscreened on the chip or could I run some command to extract that information?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No, ES(Engineering Sample) or QS (Qualification Sample). Those are the CPUs the make to test out the design and see if it has any issues. Then they fix the issues and that becomes the retail versions you buy at the store. The only way I've seen QS or WS appear is with CPU-Z in Windows.

The ES and QS often have weird obscure bugs nobody knows about because it was never meant for public consumption. They aren't supposed to be sold, but they do end up on ebay for considerably less than the retail versions. Many people buy them despite the potential downfalls.
 

Jonathan

Dabbler
Joined
Sep 2, 2013
Messages
11
No, ES(Engineering Sample) or QS (Qualification Sample). Those are the CPUs the make to test out the design and see if it has any issues. Then they fix the issues and that becomes the retail versions you buy at the store. The only way I've seen QS or WS appear is with CPU-Z in Windows.

The ES and QS often have weird obscure bugs nobody knows about because it was never meant for public consumption. They aren't supposed to be sold, but they do end up on ebay for considerably less than the retail versions. Many people buy them despite the potential downfalls.

Last night as I was debugging my ntp issue I noticed in /var/log/messages some USB errors:
(da0:umass-sim0:0:0:0): Error 5, Retries exhausted
And later I saw:
vm_fault: pager read error freenas (nfsd)
Then nfsd died and the server rebooted itself unexpectedly a few minutes later. I decided that maybe my USB thumbdrive was flaky. I bought a new 8GB Sandisk drive today and I'm going to install FreeNAS 9.1.0 on it and try it out.
 

Jonathan

Dabbler
Joined
Sep 2, 2013
Messages
11
Last night as I was debugging my ntp issue I noticed in /var/log/messages some USB errors:
(da0:umass-sim0:0:0:0): Error 5, Retries exhausted
And later I saw:
vm_fault: pager read error freenas (nfsd)
Then nfsd died and the server rebooted itself unexpectedly a few minutes later. I decided that maybe my USB thumbdrive was flaky. I bought a new 8GB Sandisk drive today and I'm going to install FreeNAS 9.1.0 on it and try it out.

The new USB drive is not helping. Still seeing clock issues. I gave up on this hardware platform and returned the server.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, probably a good idea.
 
Status
Not open for further replies.
Top