We’ve seen some ‘interesting’ problems in deployed timing infrastructures. PTP Grandmasters that stop providing time on the network. Timing clients unable to determine that their timing source was completely wrong. Clients that are unable to fall back to another server, let alone another protocol. There really is a long list.
Failings of timing infrastructure are common, but they are also well understood. They can be avoided with some forethought, the right tools, and the right methods. Here are some quick examples of factors to keep in mind:
- Planned client failover scenarios, whether for a single failed appliance or a systemic failure. How do you fall back between servers? What protocols do you fall back to? How do you fail over between network paths?
- Pessimistic testing. Never trust self-reported sync accuracy numbers. Ever. Validate everything.
- Hardware selection - from network type (10Gb? 40Gb? InfiniBand?) to timestamping technology (like network cards with onboard hardware timestamping), make sure you’re selecting the right mix for your needs.
- Choose the right protocols and paths for the network - PTP and NTP can both provide timing accuracy better than 1 microsecond, even though many vendors will say otherwise in order to sell you newer hardware.
- Resilience in infrastructure is key - cross-validating grandmasters provide continuous service to clients and check each other for component failure. Likewise, clients should be able to detect and reject invalid sources.
- Boundary clocks and stratum servers can push high quality time throughout the network, without having to add even more systems to your deployment.
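As a concrete illustration of server failover combined with protocol fallback, a client can be configured with multiple independent time sources: a PTP hardware clock as the preferred reference, plus several NTP servers as fallbacks and cross-checks. A sketch in chrony’s configuration syntax (the hostnames and the `/dev/ptp0` device path are placeholders, not a recommendation):

```
# /etc/chrony/chrony.conf (sketch; hostnames and /dev/ptp0 are placeholders)

# PTP hardware clock (kept in sync by a PTP daemon such as ptp4l),
# used as the preferred local reference.
refclock PHC /dev/ptp0 poll 0 prefer

# Independent NTP servers as fallbacks and cross-checks.
server ntp1.example.net iburst
server ntp2.example.net iburst
server ntp3.example.net iburst

# Require agreement from more than one source before trusting the time.
minsources 2
```

With a layout like this, the client keeps running on NTP if the PTP path fails, and a single bad source can be outvoted by the others.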
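The cross-validation idea above can be sketched in a few lines: given the offsets measured against several time sources, flag any source that disagrees with the majority. This is a deliberately simplified illustration of the kind of falseticker detection NTP implementations perform; the function name and the 50-microsecond tolerance are our own choices, not from any particular implementation.

```python
from statistics import median

def flag_falsetickers(offsets, tolerance=50e-6):
    """Flag sources whose measured offset (in seconds) deviates from
    the median of all sources by more than `tolerance`.

    `offsets` maps a source name to its measured offset against the
    local clock. With three or more sources, a single wrong source
    stands out against the majority; with only two, you can detect a
    disagreement but not decide which source is wrong.
    """
    med = median(offsets.values())
    return {name for name, off in offsets.items()
            if abs(off - med) > tolerance}

# Three sources agree within a couple of microseconds; one is off by 2 ms.
readings = {
    "gm-a": 1.2e-6,
    "gm-b": -0.8e-6,
    "ntp-1": 2.0e-3,   # falseticker
    "ntp-2": 0.4e-6,
}
print(flag_falsetickers(readings))  # -> {'ntp-1'}
```

The same majority-vote logic is what lets a client reject a grandmaster that is confidently serving the wrong time, which self-reported accuracy numbers will never reveal.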
In this paper, we cover these topics and more, to help guide users in building a modern timing infrastructure that is accurate, verifiable, and resilient.