22 May 2022
Date-time values and formats
Date-time values are atomic
… and must always be stored and transmitted as such.
There are three components to every date-time (a.k.a. timestamp) value:
- Date - the day on which an event will/did occur
- Time - the time of day of the event
- Time zone - the locality of the event
In order to accurately record and represent a moment in time, you need all three. “5 May 2020, 11:00 AM”, “11:00 AM London”, nor “5 May 2020” tell you precisely when something happened. Likewise, the announcement “All of Facebook will shut down for maintenance on 5 May 2020 at 11:00 AM”, will lead users wondering why Facebook isn’t working for them when it isn’t 11 am where they are.
Date-time fields have an authority
Answering the question “Who or what owns the value?” largely determines what the authority is.
The fields in which date-time values are stored and transmitted must be set up in a way that preserves the authority of the value unambiguously.
Subjective / personal authority
Date-time values that have a subjective, or personal authority, are ones where the date-time value is owned / created by an end user. This ownership may be explicit (e.g., when adding an event to a calendaring application), or implicit (e.g., when tracking a workout), but the date-time value is always from the perspective of the end user.
It’s important for our systems to honor what our end users say, and not try to re-interpret things for our own sake. Preservation of data fidelity and information is of the utmost importance.
Personal date-time values must capture the end user’s local time zone, and this time-zone must be preserved as-is - at rest and in transit.
Why not just use UTC?
Converting a date-time value to UTC loses the source time zone (i.e., the “where”) information, and makes it impossible to convert back to the original time zone. Personal events by nature have a local reference - information which is important to maintain.
Using UTC alone means:
- It is impossible to display accurate “wall time” of events when viewed in different time zones from where the events occurred.
- The “wall calendar” date will be wrong for events that occur near midnight. E.g., any event between 12am and 4am in NY will have a different date in UTC from the local date.
Although you could store the time zone offset separately from the UTC value, this can lead to ambiguity if the two are ever separated.
Storing local time is also what the user said to do.
Past events
When talking about a system handling personal date-time values, it is important to consider if the system is mostly for recording when an event happened, or for scheduling future events.
With past events, the time zone portion of a date-time value should be the time zone offset. This is an offset from UTC (Universal Coordinated Time) - typically shown as something like “+01:00” - and provides a 100% accurate representation of when an event in the past has happened.
Why time zone offset and not an identifier?
Using a local time with a time zone identifier (e.g., “Europe/Paris”) on it’s own is ambiguous during a Daylight Saving “fall back” period.
For example, in New York on November 7th, 2021 at 2 AM the clocks were turned back to 1AM. This means that there are two versions of all local time values from 1 AM to 2 AM. If all you know is a time (e.g., 1:30 AM) and zone identifier (e.g., America/New_York), it is impossible to answer “which 1:30 am?”
This is ambiguous:
- “2021-11-07T01:30:00.000 America/New_York”.
These are not:
- “2021-11-07T01:30:00.000-0400” (the first 1:30am)
- “2021-11-07T01:30:00.000-0500” (the second 1:30am)
Future events
With future events, you will use the time zone identifier (e.g., “America/New_York”). The reason for this is because time zones do actually change sometimes - boundaries may be redrawn, and daylight savings rules may change. Storing the zone identifier allows you to present the correct wall-clock time (i.e., what the user will see when looking at a physical clock on their wall the moment the event starts) for an event scheduled in advance - even (and especially) when a government decides to change their time zone(s).
Note that the zone offset and zone identifier are both not 100% accurate, but the zone identifier is the least ambiguous, and can help you protect your data against future changes to time zone rules.
What about ambiguous times during a DST change?
We can either make an assumption (e.g., use the first instance of a “fall back” time), or ask the owner of the event to provide clarity while creating the event.
The RFC 5545 has some guidance around what to do when encountering these.
What about using both time zone offset and identifier?
It might be worth it to store both the time zone offset and the time zone identifier if there’s a risk that the system creating the events does not have an updated set of Daylight Saving rules, and needs to correct a mistaken offset.
While an event is still in the future, you would lean on the identifier, and once the event is in the past you would strictly use the offset. When updating the time zone configutration data, regenerate the offset-based date-time values from the identifier-based ones.
Objective / absolute authority
Date-time values that have objective, or absolute authority are ones that have a fixed reference point, such that the event happens at the same moment for everyone in the world, regardless of what their local time is.
For example, scheduling a server shutdown has a fixed point of reference - the server - and will happen at the same moment for end users around the world.
Absolute date-time values stored in a system must all be converted into a single time zone. Even though all stored values will have the same time zone value, it is advised to maintain that component.
Many people express absolute date-time values in UTC (Coordinated Universal Time), because it is convenient, and has special consideration in the standards - but note that that is not required. UTC is just another time zone. If it makes sense for you to express your absolute date-time values in somethng other than UTC, then go for it.
Date-time value storage
Choose a system that guarantees a faithful preservation of your date-time values - especially with regards to the time zone. Values must remain the same, even if you move your system from New York to Sydney, or the time zone of the operating system that hosts your system changes from UTC to Los Angeles.
In SQL databases, there is a TIMESTAMP WITH TIME ZONE
type that will do this. But, make sure the underlying database provider actually honors it. PostgreSQL fully supports this type (and has a timestampz
shorthand data type). MySQL does not (at the time of writing this) support this data type. SQLite has a very limited set of primitive storage types - date-time values are stored as TEXT
.
Can date, time and/or time zone offset be stored separately?
No.
The date, time and time zone offset values are parts of an atomic value, and shouldn’t be stored separately. The time value can not be reliably handled on it’s own, since certain time values do not exist in certain date + time zone combinations (e.g., 2:30am on March 13th, 2022 in New York does not exist). Therefore, it’s best to store the date + time + time zone offset as a single atomic value.
If there are performance reasons for splitting a date-time value (e.g., for collating events for a user based on the day they happened - “all the exercises I did per day”), and it is unreasonable to rely on the database system to provide these values on the fly, then it is recommended to create extra fields in addition to the atomic date-time value. Consider naming these derived fields to indicate as much - e.g., “derived_local_date”.
This is creeping into the realm of a premature optimization, so start with a single atomic value, and then build on top of that as performance concerns arise.
Date-time value formats
When transmitting date-time values in machine-readable format, pick one format and use it everywhere. Well, pick as few as humanly possible. These are the recommended formats to use for various circumstances - all are ISO 8601-based.
A litte bit about ISO 8601
It’s important to note that the ISO 8601 standard is a full body of formats - it’s ambiguous to just say “use ISO 8601”. Formats are generally in one of two flavors: extended or compact. Extended formats contain component separators (“2022-01-02”), whereas compact formats do not (“20220201”). Extended and compact can be applied to various parts of an overall date-time format string - it is valid to use a compact date, extended time, and compact time zone offset in a single ISO 8601 formatted string.
More ISO 8601 details
These are all valid ISO 8601 values:
2020-04-03T17:23:56.046-0400
- local date, time with milliseconds, compact tz offset2020-04-03T17:23:56.046-04:00
- local date, time with milliseconds, exteded tz offset2020-04-03T17:23:56-0400
- local date, time without milliseconds, compact tz offset2020-04-03T17:23:56-04:00
- local date, time without milliseconds, exteded tz offset2020-04-03T17:23:56.046Z
- UTC date, time with milliseconds2020-04-03T17:23:56Z
- UTC date, time without milliseconds20200329T172356
- compact date and time, without a time zone20200329
- compact date2020-03-29
- extended date
For more info, see:
Naming
I show the format names I like to use below. These might be completely unacceptable to you - that’s ok. Whatever you end up naming your formats, pick your names at the outset, and use those names everywhere - in documents, schemas, and conversations. Consistency is key to adoption.
For pretty much everything
… use ISO 8601 extended date + extended time (with milliseconds) + extended tz offset - also known as RFC 3339
These date-time strings look like this:
- “2020-03-15T08:15:00.000-03:00“
- “2020-03-15T08:15:00.000+00:00“
Note that a date-time in UTC uses the offset “+00:00”, rather than a common shorthand of “Z”.
For future, personal events
… use ISO 8601 extended date + extended time (no milliseconds) + tz identifier
These date-time value strings look like this:
- “2032-04-02T11:30:00 America/Los_Angeles“
- “2032-04-02T11:30:00 Europe/London“
For events strictly in UTC
… use ISO 8601 extended date + extended time (with milliseconds) + a literal Z
- for example:
- “2020-03-15T08:15:00.000Z“
The Z
on the end is shorthand for UTC, and is always present.
Platform documentation
- Kotlin: DateTimeFormatter patterns
- Swift: Unicode date format patterns
- JavaScript: Luxon formatting (don’t use moment.js)
References, further reading
- How to save datetimes for future events - (when UTC is not the right answer) (Lau Taarnskov, 2015)
Instead of saving the time in UTC along with the time zone, developers can save what the user expects us to save: the wall time. Ie. what the clock on the wall will say. In the example that would be 10:00. And we also save the timezone (Santiago/Chile). This way we can convert back to UTC or any other timezone.You could argue that converting to UTC and saving UTC you will never have an ambiguous or non-existing datetime. That is true, but you have to ask yourself whether it is better to tell the user something incorrect or tell the user the truth. If you want your software to provide the most correct answers, this method is superior to just saving UTC and a timezone.
- Daylight saving time and time zone best practices (Stack Overflow, ongoing)
Great collection of dos, don’ts, recommendations, errata, and even more references. - Storing UTC is not a silver bullet (Jon Skeet, 2019)
Great in-depth discussion around storing future date/time values from the author of NodaTime. Mentions the idea of storing both the tz offset and zone id.Option 3: preserve local time, using UTC as derived data to be recomputed Spoiler alert: this is my preferred option.…Principle of preserving supplied data: If you’re going to need the original information anyway, why not just store that?
- Why “Always use UTC” is bad advice (Q42 Engineering, 2019)
Well written discussion and guidance from implementing parcel tracking at the Dutch Postal Services. Mentions the idea of storing tz offset and zone id, like “2014-02-07T20:14:00+04:00 Europe/Moscow” for future events. - Always Use UTC Dates And Times (Kyle K, 2019)
Focused on PHP, the take is a little simplistic, but including here because it offers a counter-point. - The Proper Way to Handle Multiple Time Zones in MySQL (Vertabelo, 2016)
In addition to identifying strategies for MySQL, this blog post has a nice summary of the concerns involved, including the important notion of asking (and answering) the tough questions before designing a system. - Time Zones Aren’t Offsets – Offsets Aren’t Time Zones (Atomic Object, 2016)
- What is GMT? (greenwichmeantime.com)
Greenwich Mean Time, abbreviated as GMT, is nowadays a time zone designation rather than a time standard.GMT was replaced as the international time standard in 1972 by UTC (Universal Coordinated Time). UTC is based on atomic measurements.
- Your Calendrical Fallacy Is thinking… (Dave DeLong)
Common misconceptions about dates, times, time zones, etc. Also interesting to read:- New items to add to the above site
- More Falsehoods programmers believe about time
- How to: Resolve ambiguous times (Microsoft, 2021)
TL;DR: Make an assumption, or ask the user - RFC 5545 - internet calendar (IETF, 2009)
Industry wide calendrical specifications. Of special interest is how to handle ambiguous or invalid times for recurring events.If, based on the definition of the referenced time zone, the local time described occurs more than once (when changing from daylight to standard time), the DATE-TIME value refers to the first occurrence of the referenced time. Thus, TZID=America/ New_York:20071104T013000 indicates November 4, 2007 at 1:30 A.M. EDT (UTC-04:00). If the local time described does not occur (when changing from standard to daylight time), the DATE-TIME value is interpreted using the UTC offset before the gap in local times. Thus, TZID=America/New_York:20070311T023000 indicates March 11, 2007 at 3:30 A.M. EDT (UTC-04:00), one hour after 1:30 A.M. EST (UTC-05:00).