Clock sync is still a mysterious topic to many, so here is an introduction to our business.
Product. FSMLabs sells TimeKeeper, a product that is mostly software, primarily to financial trading firms.
TimeKeeper Client software runs on an application server computer (Linux, Windows, Solaris) and locks the system clock to time from some authoritative source (like a GPS clock). On standard enterprise networks, TK Client can make the system clock accurate to well below one microsecond. TK Client can receive time over either or both of the common network protocols for time distribution (PTP IEEE 1588 and NTP).
TimeKeeper Server software “serves time” to clients over either or both protocols, can read time from Satellite Clocks, and can also act as an intermediary receiving time over the network and sending time over the network (PTP calls intermediaries boundary clocks, NTP calls them stratum servers).
TimeKeeper Compliance software collects, archives, and provides search capability over time accuracy information so that financial firms are able to respond to demands like: “show that the clocks on these machines involved in that trade were within required clock accuracy limits this day last year.”
TimeKeeper GM is a hardware appliance bundling TimeKeeper Server with a GPS/Galileo/GNSS receiver and a high quality oscillator that allows it to keep time steady even during interruptions of GPS service.
The suite of products provides not only high accuracy but also strong fault tolerance, instrumentation, monitoring, and alerting, plus tools for analyzing and visualizing time networks. TimeKeeper software mostly competes with open source programs that are not at the same level of accuracy, reliability, or usability, and with hardware products designed mostly for the needs of the telco and defense industries.
Timestamps in Financial Trading. There used to be mechanical devices that actually stamped a piece of paper with the current time. These devices allowed financial trading firms to learn patterns in the market, check efficiency of trading partners, exchanges and brokers, and protect against scams. For example, if customer orders to buy or sell large numbers of securities arrive just before someone in the brokerage does the same transaction and before the customer order is executed, you can suspect “frontrunning”. Without the timestamp, there is no way to detect such problems so every transaction was stamped with the time it arrived at the firm and the time it was sent to someone else for execution or other action.
The same thing happens with computers: transactions are “timestamped” so that the books can be examined by quants, risk officers, firm management and auditors and by market regulators. Electronic timestamping nowadays has to be very precise because of two things:
Transactions are fast - 100 millionths of a second (microseconds) or less between order and confirmation is not unusual at exchanges now and these are getting shorter.
The volume of transactions is so high and trading is so global that it is common for trading firms to have thousands of computers scattered around the world all trading at the same moment.
If two machines belonging to the same firm are in a NASDAQ colo, and one sends an order and the second receives the confirmation, and the clock on the second computer is 100 microseconds behind the clock on the first computer, the timestamps might show the confirmation arrived before the order was sent. This would be hard to explain to a customer or regulator. Now suppose there are thousands of computers trading in Singapore, Chicago, London, and NY for that one firm and consider what the books are like if the clocks are wandering. Out-of-sync clocks are particularly deadly for automated trading and analysis algorithms that look for correlations between events on the market - fuzzy timestamps obscure patterns or create imaginary patterns. And for regulators trying to make sense of the interactions of many thousands of market participants, fuzzy timestamps make the job impossible. This is why the new MiFID2 regulations in the EU require many market participants to synchronize all clocks involved in high speed trading (not necessarily HFT) to official UTC time within 100 microseconds. FINRA/SEC have recently tightened US rules to require that clocks be synchronized to within 50 milliseconds of US official time (NIST) as well. Further tightening seems inevitable. Firms with poor timestamp accuracy are not only asking for regulatory problems, but are also incapable of evaluating their own operations, detecting fraud, or even enforcing SLAs with network providers and trading partners. Flash Boys covers an example of how firms with crappy clocks can get blindsided.
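The two-machine colo scenario above can be worked through with a few lines of arithmetic. This is only an illustrative sketch: the helper name `stamp_ns` and the 60 microsecond true latency are assumptions made up for the example, while the 100 microsecond clock error comes from the text.

```python
NS_PER_US = 1_000  # nanoseconds per microsecond


def stamp_ns(true_time_ns, clock_error_ns):
    """Timestamp a machine records, given its clock error (hypothetical helper)."""
    return true_time_ns + clock_error_ns


# Machine A sends the order at true time 0 and its clock is exact.
order_sent = stamp_ns(0, 0)

# The confirmation reaches machine B 60 microseconds later in true time
# (an assumed figure), but B's clock is 100 microseconds behind A's.
confirm_received = stamp_ns(60 * NS_PER_US, -100 * NS_PER_US)

# Latency as it appears in the books: negative means the confirmation
# is stamped earlier than the order that caused it.
apparent_latency_ns = confirm_received - order_sent
print(apparent_latency_ns)  # -40000
```

Any true latency shorter than the clock error produces this kind of inverted record, which is why the required accuracy has to be tighter than the events being measured.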
Engineering. The method for synchronizing clocks on computers is pretty simple in the abstract but a challenge to get right. Some authoritative clock source is added to a computer network, and application computers (clients) ask it for the time or automatically receive updates of the time. When these computers get fresh readings of the authoritative clock, they adjust their clocks. Clocks on most computers drift by microseconds every minute or so, so the updates are sent maybe once a second or even more often. The updates from the authoritative clock take some time to cross the network to the application computers, and the clients take some time to process the packet and start updating - so the time from the authoritative clock is stale and the clients have to take that into account.
Because computer networks are fragile and change behavior due to congestion, switch behavior, or a number of other factors, the application computer has to be able to compensate for interruptions or slowdowns in the updates (and even for temperature changes on the application server). And authoritative clocks break or are sometimes not that authoritative. The EUREX system shut down for hours a couple of years ago when an authoritative clock forgot about leap seconds and started sending out a time 30 seconds off. Before FSMLabs entered the market, nobody had thought about fault tolerance for these things in any serious way (except for “holdover”, where GPS clocks have a high quality oscillator so they can stay active while GPS signals are not available).
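To put rough numbers on the drift described above, here is a minimal sketch. The function name and the figure of 3 microseconds per minute are illustrative assumptions, not measurements from any particular machine, and the model assumes drift accumulates at a constant rate.

```python
def drift_between_updates_us(drift_us_per_minute, update_interval_s):
    """Error (in microseconds) a free-running clock accumulates between updates,
    assuming a constant drift rate."""
    return drift_us_per_minute * update_interval_s / 60.0


# Assumed drift of 3 microseconds per minute.
once_per_second = drift_between_updates_us(3.0, 1.0)    # about 0.05 us
once_per_minute = drift_between_updates_us(3.0, 60.0)   # about 3 us

print(once_per_second, once_per_minute)
```

Under these assumptions, updating once a second keeps the accumulated drift well under a microsecond, while updating once a minute does not.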
There are two common network protocols for synchronizing time. With NTP, the client computer sends the authoritative clock a request for the time and then estimates how long the answer took to come back - most simply by dividing the elapsed round trip time in two, which gives an estimate of how much to add to the answer. PTP IEEE 1588 is slightly newer and (after many changes) is essentially the same, except that in some modes it adds a multicast of the time. Neither was designed for the properties of financial or other enterprise computing networks. NTP was originally for satellites and for decades was mostly used for synchronizing clocks on computer science department servers to within a few seconds of authoritative time, and PTP was designed for very slow mechanical devices (like welders) connected to a single shared Ethernet cable.
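The round-trip estimate described above can be sketched as follows. The first function is the "divide by two" version from the text. The second is the standard four-timestamp offset and delay calculation used by NTP, which subtracts out the server's processing time. Function names and the example numbers are assumptions for illustration, and both versions assume the network delay is the same in each direction.

```python
def half_round_trip_estimate(t_request_sent, server_time, t_reply_received):
    """Simplest estimate of current server time: add half the round trip
    to the time the server reported."""
    round_trip = t_reply_received - t_request_sent
    return server_time + round_trip / 2.0


def ntp_offset_and_delay(t1, t2, t3, t4):
    """Four-timestamp form.
    t1: client sends request (client clock)
    t2: server receives request (server clock)
    t3: server sends reply (server clock)
    t4: client receives reply (client clock)
    Returns (offset to add to the client clock, round-trip network delay)."""
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Assumed scenario, all values in microseconds: the client clock is 250 us
# behind the server, each network leg takes 100 us, the server spends 20 us.
t1 = 1_000_000   # client clock at send
t2 = 1_000_350   # server clock at receive (t1 + 250 offset + 100 travel)
t3 = 1_000_370   # server clock at reply (20 us processing)
t4 = 1_000_220   # client clock at receive (server 1_000_470 minus 250)

offset, delay = ntp_offset_and_delay(t1, t2, t3, t4)
print(offset, delay)  # 250.0 200

# The simple version counts the server's 20 us of processing as network
# travel, so it lands 10 us past the true server time of 1_000_470.
print(half_round_trip_estimate(t1, t3, t4))  # 1000480.0
```

If the two directions of the path have different delays, both estimates are off by half the difference, which is one reason network behavior matters so much for accuracy.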
The built-in model for fault tolerance in NTP is to average multiple sources together - which means that any failure or blip on any source causes an error. PTP has a thing called Best Master Clock (BMC), which requires client computers to pick the source that advertises the best accuracy - to see how good an idea that is, consider that the EUREX clock that failed told everyone it was doing very well. Generally, sources don’t even try to estimate network delays to the client, which makes the advertised accuracy useless. The next enterprise PTP standard attempts to walk back from BMC. FSMLabs TimeKeeper software will accept time from multiple sources, using either or both protocols, cross-check them, and fail over down a list provided by IT staff, who should know which sources are closer to the client and of better quality. Sophisticated filtering and smoothing, smart error detection, end-to-end information - all are parts of making TimeKeeper more reliable. As an example, TimeKeeper generates a heat map of GPS reception so it can detect jamming of GPS signals. And it looks for protocol delays that indicate misconfigured switches in the network.
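The difference between averaging sources and cross-checking them against a priority list can be illustrated with a toy example. This is not TimeKeeper's actual algorithm, which the text does not specify. The source names, the median-based consensus check, and the 50 microsecond tolerance are all assumptions chosen for the sketch.

```python
from statistics import median


def averaged_offset(offsets_us):
    """Plain average of every source's measured offset (microseconds)."""
    values = list(offsets_us.values())
    return sum(values) / len(values)


def pick_source(offsets_us, priority, tolerance_us):
    """Walk the priority list and return the first source whose offset agrees
    with the median of all sources to within tolerance_us, or None."""
    consensus = median(offsets_us.values())
    for name in priority:
        if abs(offsets_us[name] - consensus) <= tolerance_us:
            return name
    return None


# Hypothetical measurements: the preferred source is 30 seconds off,
# as in the leap second failure described earlier.
offsets_us = {"gm_a": 30_000_000.0, "gm_b": 2.0, "ntp_c": 5.0}
priority = ["gm_a", "gm_b", "ntp_c"]  # list supplied by IT staff

print(averaged_offset(offsets_us) > 1_000_000)   # True: average is seconds off
print(pick_source(offsets_us, priority, 50.0))   # gm_b
```

In this sketch one bad source drags the average off by about ten seconds, while the cross-check skips the bad source and falls through to the next entry on the list.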
On a good network with a good time source, TimeKeeper can keep a server computer's clock synchronized to the source clock within 500 nanoseconds or less. In more challenging conditions it can still do well - even virtual machines can be kept within a hundred microseconds of authoritative time if they are well configured.
Beyond Finance. Clock sync is needed in many places, including internet gaming, manufacturing, and radio and TV broadcast, but the most interesting area is in making big distributed software systems work fast and reliably. Google’s Spanner database relies on clock sync to reduce data synchronization, and there are many other places where synchronized clocks allow coordination without discussion.
Authoritative Time. Authoritative time generally comes from the sky - from satellites like GPS. Official world time is UTC time - which is an acronym chosen to be wrong in both French and English. UTC time is calculated by the International Bureau of Weights and Measures (BIPM) outside Paris by combining the times produced by the giant atomic clocks in a worldwide collection of national physics labs. Official US time comes from one of our two national timekeeping labs, NIST (National Institute of Standards and Technology). The other lab is USNO (US Naval Observatory). NIST time is distributed on the Internet (not so well). USNO loads its time, which is just a few billionths of a second different from NIST time, into the GPS satellites and, thanks to the theory of relativity, is able to broadcast that time worldwide. Even NIST uses GPS to calibrate its distributed clocks. Galileo is the European version of GPS, BeiDou is Chinese, GLONASS is Russian. The European financial regulators have explicitly stated that time from satellites meets their requirements for authoritative UTC time.
This document is copyright FSMLabs. The photo at the top is via Wikimedia.