Blog

Customer complaints

02/14/2012

“I saw a 1.2us peak offset today on the server running TimeKeeper.” We’re working on it. But this is on a heavily used network, and there appears to be a problem with the network card. It’s also a pessimistic estimate of the error, and it is a peak offset.

Virtual Machines and Time Synchronization

11/02/2011

TimeKeeper’s advanced algorithms can compensate for virtual machine difficulties in keeping accurate time.

The problem with a virtual machine as far as time synchronization goes is that the VM will suspend for seconds at a time while the underlying operating systems work on something else. As it becomes more common to migrate virtual machines within a cluster or even a larger data center in order to load balance, this problem gets worse. A VM waking up after, say, 5 seconds have passed will detect a huge gap between its current time and the time provided by the clock. After a day or two, it’s not unusual to see a VM that is seconds out of sync. But a VM running TimeKeeper will recover its footing rapidly and smoothly, because the smoothing algorithms TK uses to compensate for network packet delays and oscillator variation produce a quick convergence with a time source. You can hook a VM running TimeKeeper up to an NTP source and expect pretty good time tracking. This is not something we were initially targeting - for financial trading software the random delays of VMs are not acceptable - but it’s an interesting side effect. If VMs are being used for large scale database processing or map-reduce, the stability of a TK validated timestamp may improve data reliability and also reduce locking overhead.
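
To make the idea of a suspend gap and a smooth recovery concrete, here is a toy model in Python. It is only a sketch and not TimeKeeper's algorithm: the function names, the 0.5 second threshold, and the gain of 0.5 are assumptions chosen for illustration, and the reference time is held fixed to keep the arithmetic easy to follow.

```python
def detect_gap(prev_local, now_local, prev_ref, now_ref, threshold=0.5):
    """Return how many seconds the local clock fell behind the reference
    between two samples, or 0.0 if the shortfall is below the threshold.
    The threshold is a hypothetical value, not a TimeKeeper setting."""
    local_elapsed = now_local - prev_local
    ref_elapsed = now_ref - prev_ref
    missing = ref_elapsed - local_elapsed
    return missing if missing > threshold else 0.0


def slew_toward(local, reference, gain=0.5):
    """Move the local estimate a fraction of the way toward the reference
    instead of jumping (stepping) straight to it."""
    return local + gain * (reference - local)


# A guest that was suspended: the reference advanced 6 s, the guest saw only 1 s.
gap = detect_gap(prev_local=100.0, now_local=101.0, prev_ref=100.0, now_ref=106.0)

# Recover by repeated small corrections rather than one large step.
local, reference = 101.0, 106.0
history = []
for _ in range(5):
    local = slew_toward(local, reference)
    history.append(local)

print(gap, history)
```

Each pass halves the remaining error, so the local estimate approaches the reference without ever overshooting it or moving backwards.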

Meet FSMLabs, London STAC

10/30/2011

November 7 in London

Victor Yodaiken will be speaking at the STAC London performance summit and participating in a panel on Time Synchronization.

Meet FSMLabs in New York City September 19 2011

08/24/2011

FSMLabs will have representatives at The Linux in Finance Conference

Come talk with us about time, timestamping, TimeKeeper 5.0, and other upcoming FSMLabs innovations. Send us an email for an appointment.

Silent failures and time audits

08/15/2011

Time synchronization software is often too ready to believe whatever it is told. TimeKeeper is more skeptical—for good reason.

Most time synchronization software is, well, gullible, designed on the assumption that time sources and local clocks are reliable. TimeKeeper is more skeptical and can often compensate for bad data, but also goes to lengths to document any problems it sees. In the upcoming 5.0 release TimeKeeper will be able to use sophisticated cross-checking to detect failed clocks, network congestion, or even deliberate hacking of GPS time. Even in the current version it produces a log that serves as an audit trail.

The engineering team evaluating TimeKeeper Client for a prospective customer called us up to say that TimeKeeper was reporting trouble locking on to a reference time and that other client software was not reporting errors. After examining the log, we asked them to check whether something was wrong with the configuration or operation of the network clock. The first discovery was that the time clock was on the other side of the ocean “in a test lab”. After some discussion, we got someone to check on it in person. And what he found was sauna-like heat in a room with no air-conditioning and a pile of machines balanced on one another with no air-flow. Once the poor abused network clock was moved into a rack in an air-conditioned room, TimeKeeper was happy. It turned out that the other client software, the software that had not reported errors, had just been passing on bad time values without any warning.

Another one of our customers needs to keep timing error down in the submicrosecond range - using the NTP protocol. The computer racks they use for a critical trading application rely on an advanced feature of TimeKeeper Server to synthesize time from a “pulse per second” that is distributed from a GPS satellite radio “clock” combined with NTP time distributed over the network. The idea is that the NTP time should fix the second and the “pulse per second” can be used to get accuracy down to nanoseconds. The system has been rock steady in one installation, but when they added a second installation TimeKeeper would sometimes refuse to use the pulse-per-second, relying on a synthetic time, and would complain about bad syncs. It turned out that the NTP time was wobbling by a large fraction of a second from true time because of a failure in the time clock. Replacing the time clock fixed the error.
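
The division of labor between NTP and the pulse can be sketched in a few lines of Python. This is an illustration of the general idea only, not TimeKeeper Server's logic; the function name and the 0.25 second limit are assumptions. The pulse marks an exact second boundary, so the NTP reading taken at the pulse only has to say which second it is.

```python
def combine_ntp_and_pps(ntp_time_at_pulse, max_ntp_error=0.25):
    """ntp_time_at_pulse: NTP-derived time in seconds, read at the instant
    a pulse-per-second edge arrives.

    Returns (whole_second, trusted). The pulse supplies the sub-second
    precision; NTP only labels the second. If NTP is too far from a second
    boundary, the label is ambiguous and the pulse should not be used.
    max_ntp_error is a hypothetical limit chosen for this example."""
    nearest_second = round(ntp_time_at_pulse)
    ntp_error = abs(ntp_time_at_pulse - nearest_second)
    return nearest_second, ntp_error <= max_ntp_error


healthy = combine_ntp_and_pps(1000.0003)   # NTP within 300 microseconds
wobbling = combine_ntp_and_pps(1000.45)    # NTP off by a large fraction of a second
print(healthy, wobbling)
```

In the second case the reading sits almost halfway between two seconds, which is the situation in which refusing the pulse and complaining is the safer behavior.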

Silent failure and undetected problems with time feeds or local oscillators pose a problem that is not yet widely appreciated in the automatic trading world. Imagine if the sauna had been on for a production system and client software had failed to detect the problem. Days or even weeks of trading could have been carried out on a seriously unstable time base without any indication. It’s possible that traders could have been tweaking algorithms to try to solve a problem that did not come from their algorithms at all. Or suppose that time was actually being correctly supplied to systems and something went wrong – how would a dispute over the cause of the problem be resolved if time synchronization software did not provide a reliable audit trail?
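
One simple way to avoid believing a single bad source is to compare several sources and flag the one that disagrees. The Python sketch below uses a plain median vote; it is an assumption made for illustration and not the cross-checking TimeKeeper performs. The source names and the tolerance are hypothetical.

```python
from statistics import median


def cross_check(offsets, tolerance):
    """offsets: mapping of source name -> measured offset in seconds.
    Returns the sorted names of sources whose offset differs from the
    median of all sources by more than the tolerance."""
    middle = median(offsets.values())
    return sorted(
        name for name, offset in offsets.items()
        if abs(offset - middle) > tolerance
    )


measured = {"gps": 0.000002, "ntp_a": 0.000010, "ntp_b": 0.310000}
suspects = cross_check(measured, tolerance=0.001)
print(suspects)
```

Writing each flagged source and its offset to a log, rather than silently discarding it, is what turns a check like this into an audit trail.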

Measurement of TimeKeeper

06/16/2011

FSMLabs has a rigorous, highly automated, test and regression system that is absolutely necessary to deliver high performance.

image
Time Synchronization solutions are hard to test and often are provided with “self-test” components that report numbers which border on wishful thinking. Usually, the time synchronization solution test component reports the “offset” from the correct time - or at least an estimate of that offset. But how accurate is that estimate? If you think about it, if the synchronizer really “knew” the offset, it could correct its time and reduce the error to zero. Actually, our tests show that those self-reported numbers are often wildly off. What we do is build various test systems where the “synchronized time” can be compared in some way to an external reference - we want to compare actual time to computed time. For example, when TimeKeeper is trying to lock client system time to a reference time coming from some network time server that collects GPS time, we can run the “pulse per second” generated by that time server into a cable that connects to the client computer directly and then run special software that waits for a pulse and reads the “synchronized time” as the pulse shows up in real-time. TimeKeeper uses data that comes over the network connection, but our test software compares that to the hardware generated pulse-per-second. What you want to see is that the synchronized time is nearly exactly on a second boundary or perhaps a little past depending on how long the signal takes to propagate down the wire. These measurements allow us to evaluate both our algorithms for time synchronization and our test tool.

There are three quantities tracked on this graph: “raw”, “offset”, and “localtest”. “Raw” and “offset” are our estimates of error, where the second is smoothed out by the algorithm. “Localtest” is the actual error, measured using the pulse-per-second hardware. As you can see, these values converge rapidly. “Raw” is what TimeKeeper computes to be its instantaneous variance (error) from the reference time. The IEEE PTP protocol is designed for no-traffic networks that do not introduce much if any variation in transfer time, but in the real world we cannot rely on having such networks. So TimeKeeper has some sophisticated algorithms to synthesize a correct time from both sources - and we keep a running calculation of what we think the worst case error may be. “Offset” is the error we compute in a further smoothed time that TimeKeeper reports to users - because we want to avoid rapid “corrections” whenever possible. And “localtest” is a time error computed from a GPS “pulse per second” signal run directly into the client for cross-check purposes. That is, “localtest” crosschecks the time TimeKeeper computes against the pulses directly generated by the GPS clock hardware. It’s important to note that not only do we converge on correct time, but we do so very quickly and then lock onto it.
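
The arithmetic behind a pulse-per-second cross-check is simple enough to show. The Python sketch below is not our test tool; the function name and the sample readings are made up for illustration. It assumes the synchronized time is read, in nanoseconds, at the instant each pulse arrives, and reports how far each reading is from the nearest second boundary.

```python
NS_PER_SEC = 1_000_000_000


def pps_error_ns(synchronized_time_ns):
    """Signed distance, in nanoseconds, from the nearest whole second.
    Positive means the reading is slightly past the boundary (as expected
    with cable propagation delay); negative means slightly before it."""
    remainder = synchronized_time_ns % NS_PER_SEC
    if remainder > NS_PER_SEC // 2:
        return remainder - NS_PER_SEC
    return remainder


# Hypothetical readings taken as three consecutive pulses arrived.
readings = [5 * NS_PER_SEC + 800, 6 * NS_PER_SEC - 300, 7 * NS_PER_SEC + 150]
errors = [pps_error_ns(t) for t in readings]
worst = max(abs(e) for e in errors)
print(errors, worst)
```

Because the pulse comes from hardware rather than from the synchronizer itself, an error computed this way does not depend on the synchronizer's own opinion of how well it is doing.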

Infiniband and the FSMLabs Drill Down design approach

06/14/2011

TimeKeeper on Infiniband.

FSMLabs has just validated TimeKeeper performance on two high frequency trading systems that are based on Infiniband - the super low-latency networking technology. Performance was superb. To get Infiniband working in the test systems took about 15 minutes. At both sites, TimeKeeper was inserted into a working NTP system as both server and client and just worked. In one system the TimeKeeper Server was itself accepting time from a PTP master clock on an Ethernet network and acted as a bridge. In a second system, the TimeKeeper server ran on a device that contained a GPS time-clock PCI card.

The high performance and “no muss, no fuss” operation validates FSMLabs’ “drill down” approach to reconciling the conflicting requirements of standards and high performance. Our design approach relies on drilling down through layers of general purpose software to get raw hardware performance from highly optimized, purpose-built software for critical functions. Essentially we take on a big part of the effort of balancing standards against performance in our software design/implementation project - so it’s not a problem for over-worked IT staffs. IT departments want standardized hardware and software that is widely compatible with a large range of devices, applications, and programs. So a special purpose operating system, or even one that has been modified to support a special API or special functionality, rapidly becomes a huge expense as it is adapted to support rapidly changing compute servers, drivers, devices, and software. On the other hand, standard platforms have to be all things to all people and necessarily sacrifice some performance/reliability/security. You can’t get “general purpose” and “finely honed for purpose” in the same box. Well, you can, if you can bypass or override generic functions in just those places where you need specific performance. Doing that drill down while keeping the rest of the system safe is a pretty difficult technical play, but it pays big dividends. That’s why we get microsecond timing accuracy over Ethernet - and that’s why our TimeKeeper software can leverage the Linux general purpose networking support and the Infiniband drivers to get submicrosecond accuracy over the rock steady Infiniband interconnect.

image (photo from Chris Dag)

See TimeKeeper at SIFMA 2011

06/12/2011

TimeKeeper is being exhibited at SIFMA by several resellers. Drop by the Symmetricom or Spectracom booths and ask them about how TimeKeeper takes time “the last mile” to the application program.

Or drop us an email.

Is your application’s time stale?

03/02/2011

Matt Sherer, engineer on FSMLabs’ TimeKeeper product, writes:

Having highly accurate time available on the network is great.  Getting that time to the application before it’s stale is even better.

Even highly accurate time can get pretty stale by the time it gets to the applications that need it. Lost syncs, network latencies, and other factors can accumulate. Even having local hardware that corrects perfectly for these factors is not a complete solution.

The trouble is, applications don’t run on that card with the accurate time - they’re running under a host operating system. Even if that card provides perfect time, the OS may be delayed in processing it. Or it could mangle the value in an attempt to satisfy conflicting clock requirements.

Within the OS, an application asking for the time on different processors may get different results.  Just requesting time from the OS can add significant additional overhead in the application. The result here is that when your application returns with a time sample from the OS, that time may be stale or just wrong.

FSMLabs’ TimeKeeper can remove these ambiguities. TimeKeeper can deliver or consume time over a network (NTP or PTP), but let’s just look at how it can avoid timing ambiguities locally.

TimeKeeper converges on an accurate time and provides it throughout the entire system - not just to specialty applications or to the OS (which may again skew time). Any application or OS component on the system, when it asks for time, will be getting it from TimeKeeper, undiluted and undelayed.

Having a single source of trustworthy time provides a number of benefits:

* The OS is no longer providing different time values to different users depending on their clock.

* The OS has consistent timestamping - all internal state, from network sockets to filesystem updates, is on a common time base.

* Time drift between processors is gone - every processor has the same time.

* Without per-processor skew, there’s no chance that an application migrating between processors will see time move backwards.

So, TimeKeeper gives the system accurate time directly, whether it’s serving the OS or applications on the OS.  Applications can trust the time they get is accurate and not stale.
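
If you want to check whether your own application has ever seen time move backwards, a scan over its recorded timestamps is enough. The Python sketch below is a generic check, not a TimeKeeper tool; the function name and the sample values are invented for the example.

```python
def find_backward_steps(timestamps):
    """Return (index, step) pairs for every timestamp that is earlier than
    the one recorded just before it. step is negative by construction."""
    steps = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta < 0:
            steps.append((i, delta))
    return steps


# Hypothetical nanosecond timestamps logged by one thread as it migrated
# between processors whose clocks disagreed slightly.
samples_ns = [1000, 1040, 1035, 1080, 1120]
backward = find_backward_steps(samples_ns)
print(backward)
```

An empty result over a long run is what a single, consistent time base should give you.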

This is all well and good - but TimeKeeper actually speeds up the process of delivering time too.

It used to be that getting time from the OS involved a system call. That’s expensive - state has to be saved, the application may be switched out, housekeeping overhead may take time, more OS code has to be run, and the application just isn’t getting real work done. Modern Linux versions have reduced that overhead, and some, like Red Hat Enterprise Linux 6, avoid the system call entirely. Kernel support for a feature called VDSO allows specially designed services to provide data - like time - without system call overhead.

TimeKeeper leverages this support to further improve performance.  Not only is TimeKeeper getting a more accurate and unified time to the entire system, it can actually speed the process of getting time to the caller. Before we show TimeKeeper’s numbers, here are the improvements in using VDSO over system calls in stock Red Hat Enterprise Linux 6:


Function         Overhead improvement   Speedup
gettimeofday     48ns                   54%
clock_gettime    49ns                   60%


If VDSO is supported, it is selected transparently over a system call. There are no source changes to be made in applications.  As you can see, skipping that system call is very worthwhile - it cuts the time spent by more than 50%.
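
You can get a rough feel for the per-call cost of reading the clock on your own machine with a short loop. The Python sketch below is not how the figures above were measured; the function name and the call count are assumptions. The result includes interpreter and loop overhead, so it will be much larger than the raw cost of the underlying call (on Linux, CPython's `time.time()` typically ends up in `clock_gettime`). Treat it as a relative measure only.

```python
import time


def average_call_cost_ns(clock_fn, calls=100_000):
    """Average wall-clock cost of one call to clock_fn, in nanoseconds.
    Includes Python loop and call overhead, so only comparisons between
    runs on the same machine are meaningful."""
    start = time.perf_counter_ns()
    for _ in range(calls):
        clock_fn()
    elapsed = time.perf_counter_ns() - start
    return elapsed / calls


cost = average_call_cost_ns(time.time)
print(f"time.time(): about {cost:.0f} ns per call, interpreter overhead included")
```

A compiled program calling `clock_gettime` directly in a tight loop gives numbers closer to the ones in the table.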

When TimeKeeper starts, if VDSO or vsyscalls are supported, it transparently provides the more accurate time in place of Linux’s data. Again, there are no source changes needed. Applications don’t even need to be restarted. Once enabled, these numbers get even better:


Function         TimeKeeper improvement   Additional speedup on RHEL6’s VDSO
gettimeofday     9ns                      25%
clock_gettime    10ns                     30%


TimeKeeper never leaves the processor for data, and it doesn’t enter the OS - so it can take Red Hat 6’s improved performance, and still improve upon that by 25% to 30%. (In fact, TimeKeeper also provides an optional direct access function that can reduce overhead by another 15%.) Applications asking for time can have it, accurately, in less than 20 nanoseconds.

What does all this mean? Well, a number of things:

* Applications can trust the time they get from the OS isn’t stale, and doesn’t have built in inaccuracies.

* The whole system - OS and applications - can operate on the same time base without fear of internal drift or time going backwards. Knowing that data from different systems are tagged with the same time base is a huge benefit.

* Having faster access to time means that the application can spend fewer cycles accessing time, and more cycles doing real work. Or, it makes more cycles available to timestamp events that couldn’t be tagged before.

As a software client, TimeKeeper can get accurate time across your network, even where additional hardware is not an option.  It can serve PTP (version 1 or 2) or NTP (versions 1-4) or act as a client, to get accurate time to systems that need it.  We saw above that TimeKeeper solidly improves on how that time is driven all the way out to the application.

Is your application’s time data stale? Contact us and we can quantify how TimeKeeper could improve your system.

Why to be paranoid about time

01/20/2011

FSM’s Matt Sherer asks:
How important is time to your application? To many people, not very much - as long as they can reasonably trust that their system time is accurate to a few seconds, that’s good enough.

For them, a stock NTP client and server is sufficient - more than sufficient, actually, if it can get accuracy to under a second, or even to within a few milliseconds.

There are those of us, though, that obsess over time. We (and our applications) need to know that when we ask the operating system for time, the time we get is the actual time right now.  It can’t be 500 microseconds old, or even 100 microseconds old. 1 microsecond accuracy is getting a bit stale, actually. Sub-100 nanosecond accuracy is really what’s needed - whether it’s to satisfy regulatory requirements or to make sure that decisions being made are based on current reality and not the past.

Modern operating systems make it hard to get this kind of accuracy on their own.  If you need accurate time, make sure to ask yourself these questions, and be very sure of your answers. 

  1. Did my OS get delayed in getting the time? Did my hardware provide the time when the OS was off doing something else?
  2. How long did the OS take to process this time? Did it get hung up in the network stack? How long did that interrupt get deferred?
  3. Did the OS manipulate the given time to fit the various clocks it is trying to provide? For that matter, are all my applications getting data from the same clock under the hood?
  4. Did my application have to make a system call to get the time? How much time did I just spend while the OS did some housekeeping?
  5. How much variation in time is my application seeing depending on which processor it is on? Can time ‘go backwards’ if my application components get migrated between processors or while the OS is applying corrections to the time?
  6. If I’m getting variations in my application time by processor, is the OS getting the same variability? Can I trust that my OS timestamps are uniform?
  7. If I’m seeing these variations on a single host, what is happening across all of my machines? How can I correlate their data in any kind of meaningful way, if each machine’s time is wobbling, and there’s further wobble per processor?

Proper testing methods can answer many of these questions, but unfortunately there’s rarely time to perform good tests.  Given too much to do in too little time, it’s easy to think “We have PTP (or NTP) and that should solve our problem” or “We’ve got local timing hardware in the box, that means our application’s time is correct.”

The trouble is, even if you have PTP or NTP infrastructure, or if you have the space to put local timing hardware directly in the system, many of the above questions haven’t been answered. 

Remember, though, that some of us are obsessed with time, both its distribution and accuracy. This obsession with correct time is the reason FSMLabs’ TimeKeeper software exists - and it answers all of the above questions, while also reducing overhead. It gets more accurate time directly to the application, faster than was possible before.

TimeKeeper can act as a client or server for NTP or PTP. It can take in a GPS or CDMA feed and redistribute it over the network, taking advantage of the latest in hardware timestamping features found in common network hardware. Time delivery is a topic for another article, though. For now, let’s assume the time data that gets to your system is perfectly accurate, whether it was delivered locally via GPS, PTP, etc. How does TimeKeeper help alleviate all of those issues raised above?

Let’s step through the questions one by one.

  1. What was my OS doing when the time was delivered? How long was it sitting around before something was done with it? TimeKeeper digs down to the hardware so that when time is delivered, that time is handled immediately, whether it’s GPS data on a serial line or a PTP sync on the network. Time is not allowed to become stale.
  2. How long did it take to process the time data? Well, since TimeKeeper has a single focus, not very long. Every step possible is taken to get and act on time data as first priority, so that the system’s concept of time is always dead on.
  3. Did the OS manipulate the time? There are many ways of looking at time on a given system. Linux tries to provide the time via several clocks, each with different requirements. Not only does this add overhead, it also manipulates the time data provided. The system has been given very accurate time that should be used - so TimeKeeper intercepts all timing calls, from applications and the OS, so that everyone gets the same accurate and consistent time.
  4. Did my application have to make a system call to get the time? This is time consuming and expensive, and so TimeKeeper avoids that overhead.  Features like VDSO allow TimeKeeper to get accurate time to the application without leaving the core or calling into the OS, 25% faster than normal Linux VDSO usage, and in 15% of the time a normal Linux system call takes. This means you can spend fewer cycles waiting to get the time, and more cycles doing work.
  5. How much variation can occur between processors? As applications get time on different processors, there can be variations in time based on peculiarities of the processor. Thermal changes can cause one core to jump ahead slightly, or fall back. Time can go backwards, which makes for some dubious looking logs. TimeKeeper ensures that time always marches forwards, and that there is a uniform sense of time across the whole system.
  6. If applications are seeing variation in time per processor, what about the OS? If the time the OS provides varies, it follows that the time it is using internally is inconsistent too. Filesystem timestamps, network event logs, and any other time sensitive code may have unwanted variability, depending on when time was delivered to the OS, and which processor that OS code is running on. Fortunately, TimeKeeper intercepts all operating system time functionality, so that the same correct time given to applications is also given to the OS. Your application logs will match perfectly with the OS timestamps.
  7. If there are many local time variations, what about different machines? If each local core is off by a few microseconds from each other, and the whole system time may be drifting behind by a millisecond, correlating that machine’s data with another system that might have had the same amount of drift forward is an impossible task. TimeKeeper can either deliver the time to both machines or, if there is existing time distribution infrastructure, act as a controlling client on both systems, ensuring that both are always on the same time base. With this assurance, the data from both systems can be trusted and aligned easily for accurate reporting.
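
The last question is easy to put into numbers. The Python sketch below is a generic illustration with made-up values, not part of TimeKeeper: given two event timestamps from different machines and a worst-case clock error for each, it reports which event definitely happened first, or that the order cannot be known.

```python
def can_order(t_a, err_a, t_b, err_b):
    """Timestamps and worst-case clock errors must be in the same units.
    Returns "a" or "b" if that event certainly came first, or None when
    the two uncertainty windows overlap and the order is ambiguous."""
    if t_a + err_a < t_b - err_b:
        return "a"
    if t_b + err_b < t_a - err_a:
        return "b"
    return None


# Two events 500 microseconds apart (all values in microseconds).
loose = can_order(t_a=10_000, err_a=1_000, t_b=10_500, err_b=1_000)
tight = can_order(t_a=10_000, err_a=1, t_b=10_500, err_b=1)
print(loose, tight)
```

With a millisecond of possible drift on each machine the two events cannot be ordered at all; with microsecond-level error the same pair is unambiguous.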

These are important questions to answer - all are real dangers in time management that can affect system performance and reportability.  Testing can quantify how rampant these problems are in a given environment, if you’re given the time to do so. TimeKeeper can avoid the dangers and give your applications the correct time they can use to make the right decisions.

 
