22 May 2022

Date-time values and formats

Date-time values are atomic

… and must always be stored and transmitted as such.

There are three components to every date-time (a.k.a. timestamp) value:

In order to accurately record and represent a moment in time, you need all three. “5 May 2020, 11:00 AM”, “11:00 AM London”, nor “5 May 2020” tell you precisely when something happened. Likewise, the announcement “All of Facebook will shut down for maintenance on 5 May 2020 at 11:00 AM”, will lead users wondering why Facebook isn’t working for them when it isn’t 11 am where they are.

Date-time fields have an authority

Answering the question “Who or what owns the value?” largely determines what the authority is.

The fields in which date-time values are stored and transmitted must be set up in a way that preserves the authority of the value unambiguously.

Subjective / personal authority

Date-time values that have a subjective, or personal authority, are ones where the date-time value is owned / created by an end user. This ownership may be explicit (e.g., when adding an event to a calendaring application), or implicit (e.g., when tracking a workout), but the date-time value is always from the perspective of the end user.

It’s important for our systems to honor what our end users say, and not try to re-interpret things for our own sake. Preservation of data fidelity and information is of the utmost importance.

Personal date-time values must capture the end user’s local time zone, and this time-zone must be preserved as-is - at rest and in transit.

Why not just use UTC?

Converting a date-time value to UTC loses the source time zone (i.e., the “where”) information, and makes it impossible to convert back to the original time zone. Personal events by nature have a local reference - information which is important to maintain.

Using UTC alone means:

  • It is impossible to display accurate “wall time” of events when viewed in different time zones from where the events occurred.
  • The “wall calendar” date will be wrong for events that occur near midnight. E.g., any event between 12am and 4am in NY will have a different date in UTC from the local date.

Although you could store the time zone offset separately from the UTC value, this can lead to ambiguity if the two are ever separated.

Storing local time is also what the user said to do.

Past events

When talking about a system handling personal date-time values, it is important to consider if the system is mostly for recording when an event happened, or for scheduling future events.

With past events, the time zone portion of a date-time value should be the time zone offset. This is an offset from UTC (Universal Coordinated Time) - typically shown as something like “+01:00” - and provides a 100% accurate representation of when an event in the past has happened.

Why time zone offset and not an identifier?

Using a local time with a time zone identifier (e.g., “Europe/Paris”) on it’s own is ambiguous during a Daylight Saving “fall back” period.

For example, in New York on November 7th, 2021 at 2 AM the clocks were turned back to 1AM. This means that there are two versions of all local time values from 1 AM to 2 AM. If all you know is a time (e.g., 1:30 AM) and zone identifier (e.g., America/New_York), it is impossible to answer “which 1:30 am?”

This is ambiguous:

  • “2021-11-07T01:30:00.000 America/New_York”.

These are not:

  • “2021-11-07T01:30:00.000-0400” (the first 1:30am)
  • “2021-11-07T01:30:00.000-0500” (the second 1:30am)

Future events

With future events, you will use the time zone identifier (e.g., “America/New_York”). The reason for this is because time zones do actually change sometimes - boundaries may be redrawn, and daylight savings rules may change. Storing the zone identifier allows you to present the correct wall-clock time (i.e., what the user will see when looking at a physical clock on their wall the moment the event starts) for an event scheduled in advance - even (and especially) when a government decides to change their time zone(s).

Note that the zone offset and zone identifier are both not 100% accurate, but the zone identifier is the least ambiguous, and can help you protect your data against future changes to time zone rules.

What about ambiguous times during a DST change?

We can either make an assumption (e.g., use the first instance of a “fall back” time), or ask the owner of the event to provide clarity while creating the event.

The RFC 5545 has some guidance around what to do when encountering these.

What about using both time zone offset and identifier?

It might be worth it to store both the time zone offset and the time zone identifier if there’s a risk that the system creating the events does not have an updated set of Daylight Saving rules, and needs to correct a mistaken offset.

While an event is still in the future, you would lean on the identifier, and once the event is in the past you would strictly use the offset. When updating the time zone configutration data, regenerate the offset-based date-time values from the identifier-based ones.

Objective / absolute authority

Date-time values that have objective, or absolute authority are ones that have a fixed reference point, such that the event happens at the same moment for everyone in the world, regardless of what their local time is.

For example, scheduling a server shutdown has a fixed point of reference - the server - and will happen at the same moment for end users around the world.

Absolute date-time values stored in a system must all be converted into a single time zone. Even though all stored values will have the same time zone value, it is advised to maintain that component.

Many people express absolute date-time values in UTC (Coordinated Universal Time), because it is convenient, and has special consideration in the standards - but note that that is not required. UTC is just another time zone. If it makes sense for you to express your absolute date-time values in somethng other than UTC, then go for it.

Date-time value storage

Choose a system that guarantees a faithful preservation of your date-time values - especially with regards to the time zone. Values must remain the same, even if you move your system from New York to Sydney, or the time zone of the operating system that hosts your system changes from UTC to Los Angeles.

In SQL databases, there is a TIMESTAMP WITH TIME ZONEtype that will do this. But, make sure the underlying database provider actually honors it. PostgreSQL fully supports this type (and has a timestampz shorthand data type). MySQL does not (at the time of writing this) support this data type. SQLite has a very limited set of primitive storage types - date-time values are stored as TEXT.

Can date, time and/or time zone offset be stored separately?

No.

The date, time and time zone offset values are parts of an atomic value, and shouldn’t be stored separately. The time value can not be reliably handled on it’s own, since certain time values do not exist in certain date + time zone combinations (e.g., 2:30am on March 13th, 2022 in New York does not exist). Therefore, it’s best to store the date + time + time zone offset as a single atomic value.

If there are performance reasons for splitting a date-time value (e.g., for collating events for a user based on the day they happened - “all the exercises I did per day”), and it is unreasonable to rely on the database system to provide these values on the fly, then it is recommended to create extra fields in addition to the atomic date-time value. Consider naming these derived fields to indicate as much - e.g., “derived_local_date”.

This is creeping into the realm of a premature optimization, so start with a single atomic value, and then build on top of that as performance concerns arise.

Date-time value formats

When transmitting date-time values in machine-readable format, pick one format and use it everywhere. Well, pick as few as humanly possible. These are the recommended formats to use for various circumstances - all are ISO 8601-based.

A litte bit about ISO 8601

It’s important to note that the ISO 8601 standard is a full body of formats - it’s ambiguous to just say “use ISO 8601”. Formats are generally in one of two flavors: extended or compact. Extended formats contain component separators (“2022-01-02”), whereas compact formats do not (“20220201”). Extended and compact can be applied to various parts of an overall date-time format string - it is valid to use a compact date, extended time, and compact time zone offset in a single ISO 8601 formatted string.

More ISO 8601 details

These are all valid ISO 8601 values:

  • 2020-04-03T17:23:56.046-0400 - local date, time with milliseconds, compact tz offset
  • 2020-04-03T17:23:56.046-04:00 - local date, time with milliseconds, exteded tz offset
  • 2020-04-03T17:23:56-0400 - local date, time without milliseconds, compact tz offset
  • 2020-04-03T17:23:56-04:00 - local date, time without milliseconds, exteded tz offset
  • 2020-04-03T17:23:56.046Z - UTC date, time with milliseconds
  • 2020-04-03T17:23:56Z - UTC date, time without milliseconds
  • 20200329T172356 - compact date and time, without a time zone
  • 20200329 - compact date
  • 2020-03-29 - extended date

For more info, see:

Naming

I show the format names I like to use below. These might be completely unacceptable to you - that’s ok. Whatever you end up naming your formats, pick your names at the outset, and use those names everywhere - in documents, schemas, and conversations. Consistency is key to adoption.

For pretty much everything

… use ISO 8601 extended date + extended time (with milliseconds) + extended tz offset - also known as RFC 3339

These date-time strings look like this:

Note that a date-time in UTC uses the offset “+00:00”, rather than a common shorthand of “Z”.

For future, personal events

… use ISO 8601 extended date + extended time (no milliseconds) + tz identifier

These date-time value strings look like this:

For events strictly in UTC

… use ISO 8601 extended date + extended time (with milliseconds) + a literal Z - for example:

The Z on the end is shorthand for UTC, and is always present.

Platform documentation


References, further reading