This weekend an aspiring embedded developer asked in a programming group how to handle time between his devices and the server. This is something I've got some experience with, so my answer was both passionate and emphatic.
This guide is for any developer who needs to write code that sends timestamped data, logs, chat messages, or anything else in that vein, especially if you end up distributing software across different computers. It may also help those working on large, monolithic enterprise systems; see the battle stories at the end.
So, here it is again, from a developer who's been there.
First, check out the Computerphile video about time. It should give you enough background to know where things risk going wrong. After that comes a summary of what you should be doing.
The paf short list of DO's
- Only use continuous, monotonic time
- Use UTC Everywhere
- At presentation, convert to desired timezone
- Synchronise all clocks using NTP
Only use continuous, monotonic time
This means counting seconds or milliseconds from a known point in time. This can be the year 2000, the year 1900, or most commonly, the Unix Epoch (1970-01-01 00:00:00 UTC). You pick a storage method and a precision, and count upwards from there.
Standard disclaimer: make sure you do not use a simple 32-bit integer counter, or you will run out of time in the future.
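To make that disclaimer concrete, here is a minimal Python sketch of where a signed 32-bit count of seconds since the Unix Epoch runs out. The names `INT32_MAX`, `fits_in_int32` and `now_ms` are made up for this example, and it assumes you count from the Unix Epoch.

```python
import time
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def fits_in_int32(counter: int) -> bool:
    """True if the value survives being stored in a signed 32-bit integer."""
    return -2**31 <= counter <= INT32_MAX


def now_ms() -> int:
    """Current time as integer milliseconds since the Unix Epoch."""
    return time.time_ns() // 1_000_000


# The last second a signed 32-bit counter of Unix seconds can represent:
# 2038-01-19 03:14:07 UTC.
last_representable = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
```

A millisecond counter like `now_ms()` is already far past the 32-bit range today, so a 64-bit integer is the safer storage choice.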
Use UTC Everywhere
You will need conversion methods from your counter to UTC and back again. Don't forget to take leap seconds into consideration, or you will get clock drift or gaps where you did not expect them.
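A conversion pair could look roughly like this in Python, assuming the counter is integer milliseconds since the Unix Epoch. The function names are made up for the example. It uses integer arithmetic only, so no precision is lost to floating point. Note the limitation: Unix-style counters and Python's `datetime` have no representation for a leap second, so this sketch ignores them. If your counter ticks in real elapsed seconds (as GPS time does), you need a leap second table on top of this.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def counter_to_utc(ms: int) -> datetime:
    """Milliseconds since the Unix Epoch -> timezone-aware UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=ms)


def utc_to_counter(dt: datetime) -> int:
    """Timezone-aware datetime -> whole milliseconds since the Unix Epoch."""
    if dt.tzinfo is None:
        # A naive datetime has no defined relation to UTC, so refuse to guess.
        raise ValueError("naive datetime: the timezone must be explicit")
    return (dt - UNIX_EPOCH) // timedelta(milliseconds=1)
```

Rejecting naive datetimes is a deliberate choice here: silently assuming local time is exactly the kind of mistake this guide is about.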
Conversion at Presentation
You now have solid, working UTC time. While it might be completely off (because you didn't have a proper clock), at least it can be correct. When you convert it for display, you need to take historical timezone changes into consideration.
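As a sketch of conversion at presentation time, Python's standard `zoneinfo` module (3.9 and later) looks up the rules that applied at the timestamp in question, using the system timezone database where one is available. The function name `present` and the millisecond counter are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def present(ms: int, tz_name: str) -> str:
    """Render a UTC millisecond counter as local time in the named timezone."""
    utc = UNIX_EPOCH + timedelta(milliseconds=ms)
    # astimezone() applies the offset that was in force at that instant,
    # which is where historical timezone changes are taken into account.
    return utc.astimezone(ZoneInfo(tz_name)).isoformat()
```

With an up-to-date database, noon UTC on 1 July 2014 rendered for `Europe/Moscow` comes out as 16:00 with a +04:00 offset, while noon UTC on 1 July 2015 comes out as 15:00 with +03:00, because Russia changed its offset in October 2014. The stored UTC value never changes; only the presentation does.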
Synchronise all clocks with NTP
NTP is one of the few proper ways of synchronising clocks. Use ntpd or a similar daemon. A daemon like this normally adjusts the clock gradually, so that it keeps going forwards and does not drift into the past. Time leaps (hard synchronisation) may break the first rule above and jumble your data.
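If you want to notice when a time leap has happened anyway, one possible approach is to compare how far the wall clock moved with how far a monotonic clock moved over the same interval. This is a minimal Python sketch of that idea, not something ntpd provides. The class name `ClockStepDetector` and the half-second tolerance are assumptions for the example.

```python
import time


class ClockStepDetector:
    """Reports wall-clock jumps by comparing against a monotonic clock."""

    def __init__(self, tolerance_s=0.5, wall=time.time, mono=time.monotonic):
        self._wall = wall
        self._mono = mono
        self._tolerance_s = tolerance_s
        self._last_wall = wall()
        self._last_mono = mono()

    def check(self) -> float:
        """Return the size of a detected step in seconds, or 0.0 if none."""
        wall_now, mono_now = self._wall(), self._mono()
        # If nobody stepped the clock, both clocks advanced by about the same amount.
        step = (wall_now - self._last_wall) - (mono_now - self._last_mono)
        self._last_wall, self._last_mono = wall_now, mono_now
        return step if abs(step) > self._tolerance_s else 0.0
```

A negative return value means the wall clock went backwards, which is the case that jumbles data. The clock functions are injectable so that the detector can be tested with fake clocks.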
Further Considerations Regarding Time Zones
Timezone databases can expire. Do not at any cost keep a local timezone database in your application that differs from the system one, and make sure the system can update its timezone database. Timezones sometimes change with only days or weeks of notice, and this may cause further trouble if you pass data in local time formats.
If you pass data in a localised timezone between machines, you need to make sure that the timezone databases are in sync between the two machines, or you will see funny disconnects and lost time: gaps in data, or duplicate or out-of-order data, even on NTP-synchronised machines. Debugging this can be an exercise in brain damage.
This means that all your servers should be running in UTC, at all times.
There is no way to automatically detect the proper timezone for an embedded system. GPS coordinates won't do it; the network time on most GSM/3G networks is often wrong or out of sync; and the DHCP Timezone Option will break in practice. Attempting to do this for anything other than presentation purposes will lead to corrupt data in the long run.
For systems like this, it may also be necessary to pass the suspected current timezone along with the data from an embedded device, especially if the device is mobile (car, boat, cellphone) and may migrate from timezone to timezone. This means that you report data relative to UTC, but also pass along the metadata of where the system thinks it is.
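A report from such a device could be shaped roughly like this. This is a Python sketch with made-up field names (`ts_utc_ms`, `tz_hint`) and a made-up JSON wire format. The point is that the timestamp stays in UTC and the timezone is only a hint riding along as metadata.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Reading:
    device_id: str
    ts_utc_ms: int                 # always UTC, never local time
    value: float
    tz_hint: Optional[str] = None  # where the device thinks it is, unverified


def encode(reading: Reading) -> str:
    """Serialise a reading for transport; the hint never alters the timestamp."""
    return json.dumps(asdict(reading), sort_keys=True)
```

On the server side the hint should only ever be used for presentation, for example to show the reading in the local time the device's user probably saw.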
Further Considerations Regarding Clocks
Most embedded devices have very poor clocks, which may drift a fair bit (seconds), even within minutes. They will still be monotonic (always going upwards), but an interval the clock reports as two seconds won't always be two real seconds.
This seldom causes problems on the device itself. However, when you report data from one of these devices to a server and then plot it, your data will have less precision than you expect.
Some devices will have GPS attached. Keep in mind that GPS can give exact time, but that time will not be UTC. GPS time does not follow leap seconds, so it slowly moves further and further away from UTC, and this has to be accounted for when using it for synchronisation.
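The correction itself is small. This Python sketch assumes you have GPS time as seconds since the GPS epoch (6 January 1980) and that you know the current GPS-to-UTC leap second offset. The offset has been 18 seconds since the start of 2017, but treat that as a number to look up or read from the receiver rather than to hard-code. The function name is made up for the example.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)


def gps_to_utc(gps_seconds: float, leap_seconds: int) -> datetime:
    """Convert seconds since the GPS epoch to UTC.

    leap_seconds is how far GPS time is ahead of UTC at that moment;
    the caller must supply it, since it changes whenever a leap second is added.
    """
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap_seconds)
```

Passing the offset in as a parameter keeps the stale-constant problem visible at the call site instead of burying it in the conversion.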
Further Considerations Regarding Precision
When there are no stable clocks or NTP to synchronise the time with, you want to discard some precision, because the clock won't be "good" anyhow. You can do this by truncating timestamps to 5-, 10- or 60-second intervals.
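Truncation is a one-liner. Here is a minimal Python sketch, assuming timestamps are integer seconds since the epoch; the function name is made up for the example.

```python
def truncate(ts_seconds: int, interval_seconds: int) -> int:
    """Round a timestamp down to the start of its interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return ts_seconds - (ts_seconds % interval_seconds)
```

For example, `truncate(1_700_000_123, 60)` gives `1_700_000_100`: the 23 seconds the clock could not be trusted on are dropped before anyone downstream can rely on them.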
If you then have data where the order matters, add a separate counter or ordering field. Do not trust unreliable clocks as a source of the order of events between systems. This will fail in more than one way.
It may feel wrong to discard data on a device, but the goal is to prevent other layers from assuming that the precision given is correct. If timestamps are handed on with nanosecond, millisecond or even second precision, front end developers, analysts, business intelligence systems and any other layers you may want to add in the future will assume that precision is real. They will be wrong, and it will cause pain.
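A separate ordering counter can be as simple as this Python sketch. The class and field names are made up, and a real device would also need to persist the counter across reboots, which is left out here.

```python
import itertools


class EventLog:
    """Stamps each event with a per-device sequence number."""

    def __init__(self, device_id: str):
        self.device_id = device_id
        self._next_seq = itertools.count()  # 0, 1, 2, ... never repeats or goes back
        self.events = []

    def record(self, ts_seconds: int, payload: str) -> dict:
        event = {
            "device_id": self.device_id,
            "seq": next(self._next_seq),
            "ts": ts_seconds,  # kept for humans and plots, not for ordering
            "payload": payload,
        }
        self.events.append(event)
        return event


def in_order(events):
    """Order events by device and sequence number, ignoring the clock."""
    return sorted(events, key=lambda e: (e["device_id"], e["seq"]))
```

If the device clock jumps backwards between two events, sorting by `ts` puts them in the wrong order, while sorting by `seq` still gives the order in which they were recorded. Note that this only orders events from the same device; ordering across devices needs something more than a local counter.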
If you cannot use NTP for various reasons (bandwidth, network access, or similar) but end up doing your own clock-jumping or smearing, you may need to use a precision calculation in order to discard an appropriate amount of data.
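What such a precision calculation looks like depends entirely on your hardware, so take this Python sketch as one possible shape and nothing more. It assumes you can estimate the worst-case drift of the clock in parts per million, the time since the last synchronisation, and the error of that synchronisation itself. All names, the bucket sizes, and the factor of two are assumptions for the example.

```python
BUCKETS_SECONDS = (1, 5, 10, 60, 300, 3600)  # candidate truncation intervals


def worst_case_error(seconds_since_sync: float, drift_ppm: float,
                     sync_error_seconds: float) -> float:
    """Upper bound on how wrong the clock may be right now, in seconds."""
    return sync_error_seconds + seconds_since_sync * drift_ppm * 1e-6


def pick_bucket(error_seconds: float) -> int:
    """Smallest interval at least twice the error, so the clock can be off
    in either direction; falls back to the largest bucket."""
    for bucket in BUCKETS_SECONDS:
        if bucket >= 2 * error_seconds:
            return bucket
    return BUCKETS_SECONDS[-1]
```

As a worked example under these assumptions: a clock drifting 100 ppm that was last set a day ago, to within half a second, may be about 9.1 seconds off, so timestamps would be truncated to 60-second intervals.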
Remember that the OS and hardware will lie to you about their precision: the OS will assume that the hardware is correct, and the hardware will claim to have precision it cannot deliver.
This is some of the somewhat hard-earned advice I have about time, and how to (not) do things. Some of the things I've seen in the past:
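One way to see the gap for yourself is to compare what the system claims with what you can observe. This Python sketch uses the standard `time.get_clock_info` for the claimed resolution and then measures the smallest step the wall clock actually takes. The function names are made up, and the measurement only shows granularity; it says nothing about whether the time is accurate.

```python
import time
from typing import Optional


def claimed_resolution(clock: str = "time") -> float:
    """Resolution in seconds, as reported by the platform."""
    return time.get_clock_info(clock).resolution


def observed_tick_ns(samples: int = 50) -> Optional[int]:
    """Smallest forward step seen between readings of time.time_ns()."""
    smallest = None
    for _ in range(samples):
        start = time.time_ns()
        now = time.time_ns()
        while now == start:  # spin until the clock value changes
            now = time.time_ns()
        step = now - start
        # Ignore backward steps; those are a different problem.
        if step > 0 and (smallest is None or step < smallest):
            smallest = step
    return smallest
```

If the observed tick is much coarser than the claimed resolution, you have a concrete number for how much precision to throw away.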
- Discovering that an application has a different timezone database from the operating system
- Discovering that two servers have different timezone databases
- Two servers with different timezone databases, in the same timezone, causing conversion failures
- Attempting to correlate data between multiple embedded systems and finding out precision woes
- Attempting to correlate data between systems, and finding different clock drifts between them
- Finding clocks jumping backwards in time, the same second repeating three times that minute
- Losing chronological order between events because they were sorted by time, not event number
And with this document I hope to save someone else from encountering these issues in the future.
By: D.S. Ljungmark