01/20/2011
FSM’s Matt Sherer asks:
How important is time to your application? To many people, not very much - as long as they can reasonably trust that their system time is accurate to a few seconds, that’s good enough.
For them, a stock NTP client and server is sufficient - more than sufficient, actually, if it keeps the clock accurate to under a second, or even to within a few milliseconds.
There are those of us, though, who obsess over time. We (and our applications) need to know that when we ask the operating system for time, the time we get is the actual time right now. It can’t be 500 microseconds old, or even 100 microseconds old. Even 1 microsecond accuracy is getting a bit stale, actually. Sub-100 nanosecond accuracy is really what’s needed - whether it’s to satisfy regulatory requirements or to make sure that decisions being made are based on current reality and not the past.
Modern operating systems make it hard to get this kind of accuracy on their own. If you need accurate time, make sure to ask yourself these questions, and be very sure of your answers.
- Did my OS get delayed in getting the time? Did my hardware provide the time when the OS was off doing something else?
- How long did the OS take to process this time? Did it get hung up in the network stack? How long did that interrupt get deferred?
- Did the OS manipulate the given time to fit the various clocks it is trying to provide? For that matter, are all my applications getting data from the same clock under the hood?
- Did my application have to make a system call to get the time? How much time did I just spend while the OS did some housekeeping?
- How much variation in time is my application seeing depending on which processor it is on? Can time ‘go backwards’ if my application components get migrated between processors or while the OS is applying corrections to the time?
- If I’m getting variations in my application time by processor, is the OS getting the same variability? Can I trust that my OS timestamps are uniform?
- If I’m seeing these variations on a single host, what is happening across all of my machines? How can I correlate their data in any kind of meaningful way, if each machine’s time is wobbling, and there’s further wobble per processor?
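A couple of these questions can be probed from user space before reaching for anything more elaborate. The following is a minimal sketch in Python, not a rigorous test: it reads the wall clock in a tight loop, estimates the average cost of a read, and counts any reads that came back earlier than the one before. The function names `probe_clock` and `describe_clocks` are made up for illustration, and the per-read cost includes interpreter and loop overhead, so treat the numbers as an upper bound rather than a measurement of the OS itself.

```python
import time


def probe_clock(samples=100_000):
    """Read the wall clock repeatedly; report read cost and backward steps.

    Hypothetical helper for illustration. The mean cost includes Python
    loop overhead, so it overstates the cost of the clock read itself.
    """
    backward_steps = 0
    largest_gap_ns = 0
    previous = time.time_ns()
    started = time.perf_counter_ns()  # monotonic timer for the measurement
    for _ in range(samples):
        current = time.time_ns()
        delta = current - previous
        if delta < 0:
            # The wall clock returned an earlier time than the last read.
            backward_steps += 1
        elif delta > largest_gap_ns:
            # Large gaps hint at preemption or other delays between reads.
            largest_gap_ns = delta
        previous = current
    elapsed_ns = time.perf_counter_ns() - started
    return {
        "samples": samples,
        "mean_read_cost_ns": elapsed_ns / samples,
        "largest_gap_ns": largest_gap_ns,
        "backward_steps": backward_steps,
    }


def describe_clocks():
    """Show what backs each Python clock and whether it can be adjusted."""
    report = {}
    for name in ("time", "monotonic", "perf_counter"):
        info = time.get_clock_info(name)
        report[name] = {
            "implementation": info.implementation,
            "adjustable": info.adjustable,
            "monotonic": info.monotonic,
            "resolution_s": info.resolution,
        }
    return report


if __name__ == "__main__":
    print(probe_clock())
    for clock_name, details in describe_clocks().items():
        print(clock_name, details)
```

A run on a quiet machine will often show zero backward steps, which proves little: steps tend to appear only while the clock is being corrected or when the process migrates between processors, so a probe like this is only useful when run for a long time and under realistic load.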
Proper testing methods can answer many of these questions, but unfortunately there’s rarely time to perform good tests. Given too much to do in too little time, it’s easy to think “We have PTP (or NTP) and that should solve our problem” or “We’ve got local timing hardware in the box, that means our application’s time is correct.”
The trouble is, even if you have PTP or NTP infrastructure, or if you have the space to put local timing hardware directly in the system, many of the above questions haven’t been answered.
Remember, though, that some of us are obsessed with time, both its distribution and accuracy. This obsession with correct time is the reason FSMLabs’ TimeKeeper software exists - and it answers all of the above questions, while also reducing overhead. It gets more accurate time directly to the application, faster than was possible before.
TimeKeeper can act as a client or server for NTP or PTP. It can take in a GPS or CDMA feed and redistribute it over the network, taking advantage of the latest in hardware timestamping features found in common network hardware. Time delivery is a topic for another article, though. For now, let’s assume the time data that gets to your system is perfectly accurate, whether it was delivered locally via GPS, PTP, etc. How does TimeKeeper help alleviate all of those issues raised above?
Let’s step through the questions one by one.
- What was my OS doing when the time was delivered? How long was it sitting around before something was done with it? TimeKeeper digs down to the hardware so that when time is delivered, that time is handled immediately, whether it’s GPS data on a serial line or a PTP sync on the network. Time is not allowed to become stale.
- How long did it take to process the time data? Well, since TimeKeeper has a single focus, not very long. Every step possible is taken to get and act on time data as first priority, so that the system’s concept of time is always dead on.
- Did the OS manipulate the time? There are many ways of looking at time on a given system. Linux tries to provide the time via several clocks, each with different requirements. Not only does this add overhead, it also manipulates the time data provided. The system has been given very accurate time that should be used - so TimeKeeper intercepts all timing calls, from applications and the OS, so that everyone gets the same accurate and consistent time.
- Did my application have to make a system call to get the time? This is time consuming and expensive, and so TimeKeeper avoids that overhead. Features like VDSO allow TimeKeeper to get accurate time to the application without leaving the core or calling into the OS, 25% faster than normal Linux VDSO usage, and in 15% of the time a normal Linux system call takes. This means you can spend fewer cycles waiting to get the time, and more cycles doing work.
- How much variation can occur between processors? As applications get time on different processors, there can be variations in time based on peculiarities of the processor. Thermal changes can cause one core to jump ahead slightly, or fall back. Time can go backwards, which makes for some dubious looking logs. TimeKeeper ensures that time always marches forwards, and that there is a uniform sense of time across the whole system.
- If applications are seeing variation in time per processor, what about the OS? If the time the OS provides varies, then the time it uses internally is inconsistent too. Filesystem timestamps, network event logs, and any other time-sensitive code may have unwanted variability, depending on when time was delivered to the OS, and which processor that OS code is running on.
Fortunately, TimeKeeper intercepts all operating system time functionality, so that the same correct time given to applications is also given to the OS. Your application logs will match perfectly with the OS timestamps.
- If there are many local time variations, what about different machines? If each local core is off by a few microseconds from each other, and the whole system time may be drifting behind by a millisecond, correlating that machine’s data with another system that might have drifted forward by the same amount is an impossible task. TimeKeeper can deliver the time to both machines itself, or, if there is existing time distribution infrastructure, act as a controlling client on both systems, ensuring that both are always on the same time base. With this assurance, the data from both systems can be trusted and aligned easily for accurate reporting.
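To make the cross-machine problem concrete, here is a small sketch of the standard four-timestamp offset estimate that NTP uses. It is not a description of how TimeKeeper works internally; the function names and the sample numbers are illustrative assumptions. The client records when it sent a request (`t0`) and received the reply (`t3`) on its own clock, and the server records when it received the request (`t1`) and sent the reply (`t2`) on its clock.

```python
def offset_and_delay(t0, t1, t2, t3):
    """Estimate the server-minus-client clock offset and round-trip delay.

    t0: client send time      (client clock)
    t1: server receive time   (server clock)
    t2: server send time      (server clock)
    t3: client receive time   (client clock)
    All four values must use the same unit. The offset estimate assumes
    the network path takes equally long in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def to_server_time(client_timestamp, offset):
    """Map a client-clock timestamp onto the server's time base."""
    return client_timestamp + offset


if __name__ == "__main__":
    # Made-up example in microseconds: the client clock is 500 us behind
    # the server, each network leg takes 100 us, and the server spends
    # 20 us handling the request.
    offset, delay = offset_and_delay(t0=1000, t1=1600, t2=1620, t3=1220)
    print(offset, delay)                  # prints "500.0 200"
    print(to_server_time(1220, offset))   # prints "1720.0"
```

The estimate is only as good as the symmetry assumption: if one direction is slower than the other, the offset can be wrong by up to half the round-trip delay. That is one reason timestamps taken late, after the OS has sat on a packet, translate directly into error when correlating data between machines.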
These are important questions to answer - all are real dangers in time management that can affect system performance and reportability. Testing can quantify how rampant these problems are in a given environment, if you’re given the time to do so. TimeKeeper can avoid the dangers and give your applications the correct time they can use to make the right decisions.