Schema Time Changes (Draft)

M. Davis

Last updated: 2000-12-11

This is a set of notes on the topic of time instants and durations, centered around the Schema and ISO 8601 discussions. There were two goals: make sure that there was no loss of data in the representation of dates and time, and make sure that the date and time datatypes were well-defined (which they were not, in the previous version).

The first goal is to represent the range of dateTimes that can be represented in major commercial systems. Otherwise there will be data corruption when encapsulating information in XML. This means representing a range of approximately 300,000,000,000 years, based on the maximum values in the following:

The simplest and most portable representation for an instant in time would be in seconds from a fixed start date. The Schema group is using an ISO 8601, Gregorian based representation. Although not optimal for internationalization and efficiency, this does work for time instants if the calendar is defined to be a  proleptic Gregorian calendar, e.g. one that is defined into the indefinite past and future, even before the calendar was adopted. This defines calendar years before year 1 differently than the AD/BC calendar system: year -2, year -1, year 0, year 1, year 2, ...; and leap years are defined into the past on that basis. So year 0 is a leap year, as is year -4, -8, etc. Year -100 is not a leap year. This is the same system used in the FDIS 8601.

The second goal is to be well-defined. Given an instant in time you can determine precisely which other instants are greater than, equal to, or less than that instant. Given a time period, you can determine which time instants are within that time period. Since the Schema group also wants durations patterned after ISO 8601, those have to be made well-defined (they are not in 8601). Thus a precise specification of the ordering of durations and time instants is required, and for the addition of dateTimes and durations. That also makes timePeriods (that is, the <dateTime, duration> pair) well defined.

The organization of these notes is a bit haphazard, since it reflects different topics that have come up. For the symbols used here, see Notation.

1. Comparing Durations

[Ed note: either add this to the durations section or as an appendix.]

Durations are only partially ordered, since there is no determinate relationship between certain durations such as one month (P1M) versus 30 days (P30D). The ordering between two durations D and C is defined as as follows, based on the addition operation over dateTime and durations as defined in Appendix X. This is a logical explanation of the process. Actual implementations are free to optimize as long as they produce the same results.

From this basic definition, all the other comparison relations can be derived, as in Comparison Relations. This includes the incomparable relation, D C.

The following table shows the strongest relationship that can be determined between example durations: Note that because of leap-seconds, a seconds field can vary from 59 to 60. However, because of the way that addition is defined in Appendix X, these are still totally ordered.

  Relation
P1Y > P364D P365D   P366D < P367D
P1M > P27D P28D P29D P30D P31D < P32D
P5M > P149D P150D P151D P152D P153D < P154D

Implementations are free to optimize the computation of the ordering relationship. For example, the following table can be used to compare durations of a small number of months against days.

  Months 1 2 3 4 5 6 7 8 9 10 11 12 13 ...
Days Minimum 28 59 89 120 150 181 212 242 273 303 334 365 393 ...
Maximum 31 62 92 123 153 184 215 245 276 306 337 366 397 ...

Total order durations

Certain derived datatypes of durations can be guaranteed have a total order. To do so, they cannot have fields from more than one column in the table below.

A B
year,
month
day,
hour,
minute,
second

For example, a monthDuration datatype could be defined that specified the month, but required all other fields to be unspecified. These monthDurations would have a total order.

If you are using partially ordered datatypes, see Partial Order Caution, and Boundaries.

2. Comparing DateTimes

[Ed note: either add this to the dateTime section or as an appendix.]

DateTimes are only partially ordered, since there is no determinate relationship between certain dateTimes. For example, there is no determinate ordering between (a) 2000-03 and (b) 2000-03-05, or between (c) 2000-01-20T12:00:00 and (d) 2000-01-20T12:00:00Z. Note that based on timezones currently in use, (c) could vary from 2000-01-20T12:00:00+12 to 2000-01-20T12:00:00-13. It is, however, possible for this range to expand or contract in the future, based on local laws. Because of this, the following definition uses a somewhat broader range of indeterminate values: +14..-14.

The following definition uses the notation S[year] to represent the year field of S, S[month] to represent the month field, and so on. The notation (S & "-14") means adding the timezone -14 to S, where S did not already have a timezone. This is a logical explanation of the process. Actual implementations are free to optimize as long as they produce the same results. From basic relations provided below, all the other comparison relations can be derived, as in Comparison Relations. This includes the incomparable relation: D C.

The ordering between two dateTimes P and Q is defined by the following process:

A. Normalize P and Q. That is, if there is a timezone present, but it is not Z, convert it to Z using the addition operation defined in Appendix X.

B. If P and Q either both have a timezone, or both don't have a timezone, compare P and Q field by field from the year field down to the second field, and return a result as soon as it can be determined. That is:

  1. For each i in {year, month, day, hour, minute, second}
    1. If P[i] and Q[i] are both not specified, continue to the next i
    2. If P[i] is not specified and Q[i] is, or vice versa, stop and return P Q
    3. If P[i] < Q[i], stop and return P < Q
    4. If P[i] > Q[i], stop and return P > Q
  2. If the steps in #1 did not terminate, stop and return P = Q

C. Otherwise, if P contains a timezone and Q does not, then compare them on this basis:

  1. if P (Q & "-14") then P Q
  2. else if P (Q & "+14") then P Q
  3. else P Q
D. Otherwise, P does not contain a timezone and Q does, so compare them on this basis:
  1. if (P & "+14") Q then P Q
  2. else (P & "-14") Q then P Q
  3. else P Q

Examples:

Determinate Indeterminate
2000-01 < 2000-02-15 2000-01-15T12 2000
2000-01-15= 2000-02-15 2000-01-01T12:00:00 1999-12-31T23:00:00Z
2000-01-15T12:00:00 < 2000-01-16T12:00:00Z 2000-01-16T12:00:00 2000-01-16T12:00:00Z
  2000-01-15 2000-01-16T12:00:00Z

Total order dateTimes

Certain derived types from dateTimes can be guaranteed have a total order. To do so, they must require that a specific set of fields are always specified, and that remaining fields (if any) are always unspecified. For example, the date datatype is defined to contain exactly year, month, and day. Thus dates have a total order among themselves.

Note: The date datatype doesn't include a timezone. For one thing, the intuitive notion of a date does not involve timezone, any more than the intuitive notion of a year does. For another, SQL dates do not support time zone, so there is a precedent, and this will make compatability with SQL easier. [from Fred Zemke]

If you are using partially ordered datatypes, see Partial Order Caution, and Boundaries.

3. Appendix X. Addition of dateTime and duration

Given a dateTime S and a duration D, this appendix specifies how to compute an dateTime E. Such computations are used, for example, to determine when a dateTime is within a duration. This is a logical explanation of the process. Actual implementations are free to optimize as long as they produce the same results. The calculation uses the notation S[year] to represent the year field of S, S[month] to represent the month field, and so on. It also depends on the following functions:

Note: yearValue and monthValue usually are some X[year] and X[month], but sometimes they are computed. So the M and Y computations account for over- and underflow.

Using these functions, compute E from S and D as follows.

Essentially, this calculation is equivalent to separating D into <year,month> and <day,hour,minute,second> fields. The <year,month> is added to S. If the day is out of range, it is pinned to be within range. Thus April 31 turns into April 30. Then the <day,hour,minute,second> is added. This latter addition can cause the year and month to change.

Note: Leap seconds are handled by the computation by treating them as overflows. Essentially, a value of 60 seconds in S is treated as if it were a duration of 60 seconds added to S (with a zero seconds field). All calculations thereafter use 60 seconds per minute. (This note simply explains the rationale; the actualy results are given by following the algorithm.)

Thus the addition of either PT1M or PT60S to any dateTime will always produce the same result. This is a special definition of addition which is designed to match common practice, and -- most importantly -- be stable over time.

A definition that attempted to take leap-seconds into account would need to be constantly updated, and could not predict the results of future implementation's additions. The decision to introduce a leap second in UTC is the responsibility of the International Earth Rotation Service (IERS). They make periodic announcements as to when leap seconds are to be added, but this is not known more than a year in advance. For more information on leap seconds, see U.S. Naval Observatory Time Service Department.

Because of this, the addition of seconds using durations is a specially-defined operation which is not guaranteed to match the real-life addition of seconds with UTC.

The following is the precise specification. These steps must be followed in the same order. If a field in D is not specified, it is treated as if it were zero. If a field in S is not specified, it is treated in the calculation as if it were the minimum allowed value in that field, however, after the calculation is concluded, the corresponding field in E is removed (set to unspecified).

Examples:

dateTime duration result
2000-01-12T12:13:14Z P1Y3M5DT7H10M3.3S 2001-04-17T19:23:17.3Z
2000-01 -P3M 1999-10
2000-01-12 PT33H 2000-01-13

Commutivity and Associativity

Durations are added by simply adding each of their fields, respectively, without overflow. Thus P1M3D + P32M51D = P33M54D.

The order of addition of durations to dateTimes is significant. For example, there are cases where:

((dateTime + duration1) + duration2) ((dateTime + duration2) + duration1)

Example:

(2000-03-30 + P1D) + P1M= 2000-03-31 + P1M= 2001-04-30

(2000-03-30 + P1M) + P1D= 2000-04-30 + P1D= 2000-05-01

4. Fixing Inheritance

The inheritance among time datatypes in the September Schema draft is incorrect and types such as date, time, month, are categorized incorrectly. In ISO 8601 and in database usage and common use the latter are not recurringDurations, they simply have unspecified fields. This is an important fact. The September Schema spec says that time is a recurring instant, occurring every day. That is clearly false. If my plane flight is at 13:43, that does not mean that it recurs every day throughout history, past and future. It simply means that the date fields are unspecified. This also has implications for comparison.

Here are the changes that I believe should be made:

  1. Make dateTime a primitive datatype.
  2. Remove recurringDuration.
  3. Make time, date, month, timeInstant descend directly from dateTime. The only differences are:
  4. Drop Century. The FDIS of 8601 is revising CCYY to be YYYY, as it should have been done originally. Also change occurrences of CCYY to YYYY where they occur, and change the wording for CC and YY appropriately.
  5. Allow zero and negative years, as in the FDIS. Explain that 1 BC= 0000, 2BC= -0000, etc.

If these are done, we will finally have a clear, well-defined set of datatypes for date and time, with a well-defined ordering among all descendents of the base types.

5. Partial Order Caution

Important: You can't treat as =. In fact, they are doubly contradictory:

Partial orders cannot be magically transformed into full orders. If you treat as = you corrupt the ordering completely!

Example:

If you treat as =, then

29 = 30, 1 = 2, right= wrong, Florida election officials = competent and unbiased, fire falls from the sky, ancient evils awaken, twinkies decay, civilization collapses, etc. Ok, I exaggerate slightly, but bottom line is, don't do it.

6. Boundaries

For testing boundaries, I suggest the following. It has the advantage of agreeing with the normal definition on fully ordered datatypes, and providing a mechanism for determining whether you want to include the incomparable items or not.

So for example:

However, if a "keepIncomparables" attirbute were added (perhaps better named as "includeIncomparables"), then that would work as well.

7. Distinguished DateTimes

Distinguished dateTimes have been chosen to represent points in time that cause the greatest deviations in the addition of durations. That is, starting from these dates, adding (or subtracting) N months will have the greatest or least number of days. They are derived as follows:

Suppose we are adding N months. Within a year, the least number of days in N months starts with February 1 in a non-leap year. The greatest number of days in N months takes two dates. Starting at July 1, you have two months of 31 days in a row. This gives you the largest number of days per N months for a while. However, once you hit February (even assuming a leap year), you drop below what you would get if you started counting from March 1.

Across years, for the least number of days in N months, you want to start just before the longest stretch of time that doesn't have leap years. Because of the 400 year cycle in the Gregorian calendar, we can restrict ourselves to just the period at or after 1600. During this span, every 4 years there is a leap year, except 1700, 1800, 1900; 2100, 2200, 2300;.... So we start at 1697. For the greatest number of days in N months, you want to start just before the longest stretch of time that does have leap years: that works out to 1903. These can be combined into three of the distinguished dates.

If we are subtracting N months, then the least number of days in -N months starts with March 1 in a non-leap year. That is because March 1 - Feb 1 = 28 days. Similarly, the greatest number of days in -N months starts either September 1 (since the two months before it each have 31 days), or Feb 1 (since there is the longest span just before the short month).

Across years, for the least number of days in -N months, you want to start just after the longest stretch of time that doesn't have leap years. That is 1903 (March). For the greatest number of days in -N months, you want to start just after the longest stretch of time that does have leap years: that works out to 1696/7. Two of the resulting dates are the same as for adding months. The one addition is September of 1696.

Editorial Note: I don't think the days in the months matter for these calculations, but I need to verify this. Also, for the September and June dates, there is probably more freedom in choosing the years, but these years will work.

8 Comparison Relations

From the basic definition of less than or equal (P Q), all the other comparison operations can be derived. For information on the symbols here, see Notation.

Alternatively, one could start from the two relations <  and =, and derive , then derive the other operations from there.

Partial vs. Total Orders

Definition (partial order): A binary relation R on a set A is a partial order if and only it is:

(1) reflexive (e.g. for all a in A: a R a)
(2) antisymmetric (e.g. for all a, b in A: a R b and b R a implies a = b)
(3) transitive (e.g. for all a, b in A: a R b and b R c implies a R c)

Definition (total order): A binary relation R on a set A is a total order if and only if it is

(1) a partial order, and
(2) total (e.g. for all a, b in A: either a R b or b R a)

The most common example of a partial order is the subset relation: it is reflexive, antisymmetric, and transitive, but not total. For example,

For a total order, the relations and are unnecessary, since in that case:

For a partial order, however, these relations must be carefully distinguished. See Partial Order Caution.

9 Notation

The above discussion uses images for the ordering relations, since not all browsers may be set up to use the more standard mathematical characters provided in Unicode. To set up your browser to display these characters properly, see Display Problems? The following table lists those characters, the images, and an ASCII representation for use in plain text.

Relation Normal Negated
ASCII Unicode Image ASCII Unicode Image
equal to x= y x = y x = y x != y x ≠ y x y
less than x < y x < y x < y x !< y x ≮ y x y
greater than x > y x > y x > y x !> y x ≯ y x y
comparable to* x <=> y x ⋚ y x y x !<=> y x ⋚̸ y x y
less than or equal to x <= y x ≤ y x y x !<= y x ≰ y x y
greater than or equal to x >= y x ≥ y x y x !>= y x ≱ y x y
less than or greater than x <> y x ≶ y x y x !<> y x ≸ y x y

 * aka less than, equal to, or greater than

10 Canonical Duration

A duration D can be put into a canonical form D' using the following process.

In canonical form, all the fields are bound to their "normal" values except for the day. The days can be arbitrarily large, since there is no set relation to months.