08/15/2011
Time synchronization software is often too ready to believe whatever it is told. TimeKeeper is more skeptical—for good reason.
Most time synchronization software is, well, gullible: it is designed on the assumption that time sources and local clocks are reliable. TimeKeeper is more skeptical. It can often compensate for bad data, and it also goes to great lengths to document any problems it sees. In the upcoming 5.0 release, TimeKeeper will be able to use sophisticated cross-checking to detect failed clocks, network congestion, or even deliberate hacking of GPS time. Even the current version produces a log that serves as an audit trail.
The engineering team evaluating TimeKeeper Client for a prospective customer called us to say that TimeKeeper was reporting trouble locking on to a reference time, while other client software was not reporting any errors. After examining the log, we asked them to check whether something was wrong with the configuration or operation of the network clock. The first discovery was that the clock was on the other side of the ocean, “in a test lab”. After some discussion, we got someone to check on it in person. What he found was sauna-like heat in a room with no air conditioning, and a pile of machines stacked on top of one another with no airflow. Once the poor, abused network clock was moved into a rack in an air-conditioned room, TimeKeeper was happy. It turned out that the other client software, the software that had not reported errors, had simply been passing along bad time values without any warning.
Another of our customers needs to keep timing error in the submicrosecond range using the NTP protocol. The computer racks they use for a critical trading application rely on an advanced feature of TimeKeeper Server that synthesizes time from a “pulse per second” (PPS) signal, distributed from a GPS satellite radio “clock”, combined with NTP time distributed over the network. The idea is that NTP time identifies the second, while the pulse per second pins down exactly where that second begins, giving accuracy down to nanoseconds. The system has been rock steady in one installation, but when the customer added a second installation, TimeKeeper would sometimes refuse to use the pulse per second, falling back on synthetic time, and would complain about bad syncs. It turned out that the NTP time was wobbling away from true time by a large fraction of a second because of a failure in the network time clock. Replacing that clock fixed the error.
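The NTP-plus-PPS arrangement described above can be sketched in a few lines, and the sketch also shows why an NTP source that wobbles by a large fraction of a second forces a PPS edge to be rejected. This is a minimal illustration under stated assumptions, not TimeKeeper's actual algorithm: the names PpsEdge, label_pps_edge and local_clock_offset_ns, and the 100 ms tolerance, are hypothetical, and each PPS edge is assumed to fall exactly on a whole-second boundary of true time.

```python
from dataclasses import dataclass
from typing import Optional

NS_PER_S = 1_000_000_000

# Hypothetical tolerance (an assumption, not a TimeKeeper setting): NTP must
# land within 100 ms of a second boundary to be trusted to name that second.
# It has to stay well under half a second, or rounding could pick the wrong one.
MAX_NTP_ERROR_NS = 100_000_000


@dataclass
class PpsEdge:
    local_ns: int  # local clock reading when the PPS edge arrived, in ns
    ntp_ns: int    # NTP's estimate of true time at that same instant, in ns


def label_pps_edge(edge: PpsEdge, max_err_ns: int = MAX_NTP_ERROR_NS) -> Optional[int]:
    """Return true time (ns) at this PPS edge, or None if NTP can't be trusted.

    The PPS edge is assumed to mark a whole-second boundary, so true time at
    the edge is a whole number of seconds; NTP only has to say which one.
    """
    # Round NTP's estimate to the nearest whole second using integer math.
    nearest_s = (edge.ntp_ns + NS_PER_S // 2) // NS_PER_S
    boundary_ns = nearest_s * NS_PER_S
    if abs(edge.ntp_ns - boundary_ns) > max_err_ns:
        # NTP is too far from any second boundary: report a bad sync rather
        # than risk labeling this edge with the wrong second.
        return None
    return boundary_ns


def local_clock_offset_ns(edge: PpsEdge) -> Optional[int]:
    """Correction (ns) to add to the local clock, or None if the edge is rejected."""
    true_ns = label_pps_edge(edge)
    if true_ns is None:
        return None
    # Precision comes from the PPS edge; NTP only supplied the second.
    return true_ns - edge.local_ns


if __name__ == "__main__":
    T = 1_313_366_400 * NS_PER_S  # an arbitrary whole-second timestamp
    # NTP is 3 ms off and the local clock is 12 us fast: edge accepted.
    print(local_clock_offset_ns(PpsEdge(local_ns=T + 12_000, ntp_ns=T + 3_000_000)))
    # NTP is wobbling 420 ms away from true time: edge rejected.
    print(local_clock_offset_ns(
        PpsEdge(local_ns=T + NS_PER_S + 12_000, ntp_ns=T + NS_PER_S + 420_000_000)))
```

Run as a script, this prints -12000 and then None. Timestamps are kept as integer nanoseconds because a double holding seconds since the epoch cannot represent the nanosecond-level corrections the PPS signal is meant to deliver. The point of the check is that NTP's error never enters the correction itself; it only has to be good enough to pick the right second, and when it is not, refusing the PPS edge and complaining is safer than silently passing on a time that is off by a whole second.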
Silent failures and undetected problems with time feeds or local oscillators are a risk that is not yet widely appreciated in the automated trading world. Imagine if the sauna had been heating the clock of a production system and the client software had failed to detect the problem. Days or even weeks of trading could have been carried out on a seriously unstable time base without any indication. Traders might even have spent that time tweaking their algorithms to solve a problem that did not come from the algorithms at all. Or suppose that time was being supplied correctly to the systems and something else went wrong: how would a dispute over the cause of the problem be resolved if the time synchronization software did not provide a reliable audit trail?