Schema Time Changes (Draft)

M. Davis

Last updated: 2000-12-11

This is a set of notes on the topic of time instants and durations, centered around the Schema and ISO 8601 discussions. There were two goals: make sure that there was no loss of data in the representation of dates and time, and make sure that the date and time datatypes were well-defined (which they were not, in the previous version).

The first goal is to represent the range of dateTimes that can be represented in major commercial systems. Otherwise there will be data corruption when encapsulating information in XML. This means representing a range of approximately 300,000,000,000 years, based on the maximum values in the following:

Windows File Time: 64 bits of elapsed time since Jan 1, 1601, 00:00:00 UTC, in 100 nanosecond units.
Java Time: 64 bits of elapsed time since January 1, 1970, 00:00:00 UTC, in 1 millisecond units.
Mac Long Time: 64 bits of elapsed time since January 1, 1904, 00:00:00 UTC, in 1 second units.

The simplest and most portable representation for an instant in time would be in seconds from a fixed start date. The Schema group is using an ISO 8601, Gregorian based representation. Although not optimal for internationalization and efficiency, this does work for time instants if the calendar is defined to be a proleptic Gregorian calendar, e.g. one that is defined into the indefinite past and future, even before the calendar was adopted. This defines calendar years before year 1 differently than the AD/BC calendar system: year -2, year -1, year 0, year 1, year 2, ...; and leap years are defined into the past on that basis. So year 0 is a leap year, as is year -4, -8, etc. Year -100 is not a leap year. This is the same system used in the FDIS 8601.

The second goal is to be well-defined. Given an instant in time you can determine precisely which other instants are greater than, equal to, or less than that instant. Given a time period, you can determine which time instants are within that time period. Since the Schema group also wants durations patterned after ISO 8601, those have to be made well-defined (they are not in 8601). Thus a precise specification of the ordering of durations and time instants is required, and for the addition of dateTimes and durations. That also makes timePeriods (that is, the <dateTime, duration> pair) well defined.

The organization of these notes is a bit haphazard, since it reflects different topics that have come up. For the symbols used here, see Notation.

1. Comparing Durations

[Ed note: either add this to the durations section or as an appendix.]

Durations are only partially ordered, since there is no determinate relationship between certain durations such as one month (P1M) versus 30 days (P30D). The ordering between two durations D and C is defined as as follows, based on the addition operation over dateTime and durations as defined in Appendix X. This is a logical explanation of the process. Actual implementations are free to optimize as long as they produce the same results.

D C iff for each distinguished dateTime P, (P + D) (P + C)
The distinguished dateTimes are defined to be the following four.
(For information on the origin for these, see Distinguished DateTimes)
- 1696-09-01T00:00:00Z
- 1697-02-01T00:00:00Z
- 1903-03-01T00:00:00Z
- 1903-07-01T00:00:00Z

From this basic definition, all the other comparison relations can be derived, as in Comparison Relations. This includes the incomparable relation, D C.

The following table shows the strongest relationship that can be determined between example durations: Note that because of leap-seconds, a seconds field can vary from 59 to 60. However, because of the way that addition is defined in Appendix X, these are still totally ordered.

	Relation
P1Y	> P364D	P365D				P366D	< P367D
P1M	> P27D	P28D	P29D		P30D	P31D	< P32D
P5M	> P149D	P150D	P151D	P152D		P153D	< P154D

Implementations are free to optimize the computation of the ordering relationship. For example, the following table can be used to compare durations of a small number of months against days.

	Months	1	2	3	4	5	6	7	8	9	10	11	12	13	...
Days	Minimum	28	59	89	120	150	181	212	242	273	303	334	365	393	...
Days	Maximum	31	62	92	123	153	184	215	245	276	306	337	366	397	...

Total order durations

Certain derived datatypes of durations can be guaranteed have a total order. To do so, they cannot have fields from more than one column in the table below.

A	B
year, month	day, hour, minute, second

For example, a monthDuration datatype could be defined that specified the month, but required all other fields to be unspecified. These monthDurations would have a total order.

If you are using partially ordered datatypes, see Partial Order Caution, and Boundaries.

2. Comparing DateTimes

[Ed note: either add this to the dateTime section or as an appendix.]

DateTimes are only partially ordered, since there is no determinate relationship between certain dateTimes. For example, there is no determinate ordering between (a) 2000-03 and (b) 2000-03-05, or between (c) 2000-01-20T12:00:00 and (d) 2000-01-20T12:00:00Z. Note that based on timezones currently in use, (c) could vary from 2000-01-20T12:00:00+12 to 2000-01-20T12:00:00-13. It is, however, possible for this range to expand or contract in the future, based on local laws. Because of this, the following definition uses a somewhat broader range of indeterminate values: +14..-14.

The following definition uses the notation S[year] to represent the year field of S, S[month] to represent the month field, and so on. The notation (S & "-14") means adding the timezone -14 to S, where S did not already have a timezone. This is a logical explanation of the process. Actual implementations are free to optimize as long as they produce the same results. From basic relations provided below, all the other comparison relations can be derived, as in Comparison Relations. This includes the incomparable relation: D C.

The ordering between two dateTimes P and Q is defined by the following process:

A. Normalize P and Q. That is, if there is a timezone present, but it is not Z, convert it to Z using the addition operation defined in Appendix X.

Thus 2000-03-04T23:00:00+03 normalizes to 2000-03-05T02:00:00Z

B. If P and Q either both have a timezone, or both don't have a timezone, compare P and Q field by field from the year field down to the second field, and return a result as soon as it can be determined. That is:

For each i in {year, month, day, hour, minute, second}

If P[i] and Q[i] are both not specified, continue to the next i

If P[i] is not specified and Q[i] is, or vice versa, stop and return P Q

If P[i] < Q[i], stop and return P < Q

If P[i] > Q[i], stop and return P > Q

If the steps in #1 did not terminate, stop and return P = Q

C. Otherwise, if P contains a timezone and Q does not, then compare them on this basis:

if P (Q & "-14") then P Q

else if P (Q & "+14") then P Q

else P Q

D. Otherwise, P does not contain a timezone and Q does, so compare them on this basis:

if (P & "+14") Q then P Q

else (P & "-14") Q then P Q

else P Q

Examples:

Determinate Indeterminate

2000-01 < 2000-02-15 2000-01-15T12 2000

2000-01-15= 2000-02-15 2000-01-01T12:00:00 1999-12-31T23:00:00Z

2000-01-15T12:00:00 < 2000-01-16T12:00:00Z 2000-01-16T12:00:00 2000-01-16T12:00:00Z

2000-01-15 2000-01-16T12:00:00Z

Determinate	Indeterminate
2000-01 < 2000-02-15	2000-01-15T12 2000
2000-01-15= 2000-02-15	2000-01-01T12:00:00 1999-12-31T23:00:00Z
2000-01-15T12:00:00 < 2000-01-16T12:00:00Z	2000-01-16T12:00:00 2000-01-16T12:00:00Z
	2000-01-15 2000-01-16T12:00:00Z

Total order dateTimes

Certain derived types from dateTimes can be guaranteed have a total order. To do so, they must require that a specific set of fields are always specified, and that remaining fields (if any) are always unspecified. For example, the date datatype is defined to contain exactly year, month, and day. Thus dates have a total order among themselves.

Note: The date datatype doesn't include a timezone. For one thing, the intuitive notion of a date does not involve timezone, any more than the intuitive notion of a year does. For another, SQL dates do not support time zone, so there is a precedent, and this will make compatability with SQL easier. [from Fred Zemke]

If you are using partially ordered datatypes, see Partial Order Caution, and Boundaries.

3. Appendix X. Addition of dateTime and duration

Given a dateTime S and a duration D, this appendix specifies how to compute an dateTime E. Such computations are used, for example, to determine when a dateTime is within a duration. This is a logical explanation of the process. Actual implementations are free to optimize as long as they produce the same results. The calculation uses the notation S[year] to represent the year field of S, S[month] to represent the month field, and so on. It also depends on the following functions:

fQuotient(a, b)= the greatest integer less than or equal to a/b
- fQuotient(-1,3)= -1
- fQuotient(0,3)...fQuotient(2,3)= 0
- fQuotient(3,3)= 1
- fQuotient(3.123,3)= 1
modulo(a, b)= a - fQuotient(a,b)*b
- modulo(-1,3)= 2
- modulo(0,3)...modulo(2,3)= 0...2
- modulo(3,3)= 0
- modulo(3.123,3)= 0.123
fQuotient(a, low, high)= fQuotient(a - low, high - low)
- fQuotient(0, 1, 13)= -1
- fQuotient(1, 1, 13) ... fQuotient(12, 1, 13)= 0
- fQuotient(13, 1, 13)= 1
- fQuotient(13.123, 1, 13)= 1
modulo(a, low, high)= modulo(a - low, high - low) + low
- modulo(0, 1, 13)= 12
- modulo(1, 1, 13) ... modulo(12, 1, 13)= 1...12
- modulo(13, 1, 13)= 1
- modulo(13.123, 1, 13)= 1.123
maximumDayInMonthFor(yearValue, monthValue) =
- M := modulo(monthValue, 1, 13)
- Y := yearValue + fQuotient(monthValue, 1, 13)
- Return a value based on M and Y:

31 M= January, March, May, July, August, October, or December

30 M= April, June, September, or November

29 M= February AND (modulo(Y, 400)= 0 OR
(modulo(Y, 100) 0) AND modulo(Y, 4)= 0)

28 Otherwise

Note: yearValue and monthValue usually are some X[year] and X[month], but sometimes they are computed. So the M and Y computations account for over- and underflow.

Using these functions, compute E from S and D as follows.

Essentially, this calculation is equivalent to separating D into <year,month> and <day,hour,minute,second> fields. The <year,month> is added to S. If the day is out of range, it is pinned to be within range. Thus April 31 turns into April 30. Then the <day,hour,minute,second> is added. This latter addition can cause the year and month to change.

Note: Leap seconds are handled by the computation by treating them as overflows. Essentially, a value of 60 seconds in S is treated as if it were a duration of 60 seconds added to S (with a zero seconds field). All calculations thereafter use 60 seconds per minute. (This note simply explains the rationale; the actualy results are given by following the algorithm.)

Thus the addition of either PT1M or PT60S to any dateTime will always produce the same result. This is a special definition of addition which is designed to match common practice, and -- most importantly -- be stable over time.

A definition that attempted to take leap-seconds into account would need to be constantly updated, and could not predict the results of future implementation's additions. The decision to introduce a leap second in UTC is the responsibility of the International Earth Rotation Service (IERS). They make periodic announcements as to when leap seconds are to be added, but this is not known more than a year in advance. For more information on leap seconds, see U.S. Naval Observatory Time Service Department.

Because of this, the addition of seconds using durations is a specially-defined operation which is not guaranteed to match the real-life addition of seconds with UTC.

The following is the precise specification. These steps must be followed in the same order. If a field in D is not specified, it is treated as if it were zero. If a field in S is not specified, it is treated in the calculation as if it were the minimum allowed value in that field, however, after the calculation is concluded, the corresponding field in E is removed (set to unspecified).

Months (may be modified additionally below)
- temp := S[month] + D[month]
- E[month] := modulo(temp, 1, 13)
- carry := fQuotient(temp, 1, 13)
Years (may be modified additionally below)
- E[year] := S[year] + D[year] + carry
Zone
- E[zone] := S[zone]
Seconds
- temp := S[second] + D[second]
- E[second] := modulo(temp, 60)
- carry := fQuotient(temp, 60)
Minutes
- temp := S[minute] + D[minute] + carry
- E[minute] := modulo(temp, 60)
- carry := fQuotient(temp, 60)
Hours
- temp := S[hour] + D[hour] + carry
- E[hour] := modulo(temp, 24)
- carry := fQuotient(temp, 24)
Days
- if S[day] > maximumDayInMonthFor(E[year], E[month])
  - tempDays := maximumDayInMonthFor(E[year], E[month])
- else if S[day] < 1
  - tempDays := 1
- else
  - tempDays := S[day]
- E[day] := tempDays + D[day] + carry
- START LOOP
  - IF E[day] < 1
    - E[day] := E[day] + maximumDayInMonthFor(E[year], E[month] - 1)
    - carry := -1
  - ELSE IF E[day] > maximumDayInMonthFor(E[year], E[month])
    - E[day] := E[day] - maximumDayInMonthFor(E[year], E[month])
    - carry := 1
  - ELSE EXIT LOOP
  - temp := E[month] + carry
  - E[month] := modulo(temp, 1, 13)
  - E[year] := E[year] + fQuotient(temp, 1, 13)
  - GOTO START LOOP

Examples:

dateTime duration result

2000-01-12T12:13:14Z P1Y3M5DT7H10M3.3S 2001-04-17T19:23:17.3Z

2000-01 -P3M 1999-10

2000-01-12 PT33H 2000-01-13

dateTime	duration	result
2000-01-12T12:13:14Z	P1Y3M5DT7H10M3.3S	2001-04-17T19:23:17.3Z
2000-01	-P3M	1999-10
2000-01-12	PT33H	2000-01-13

Commutivity and Associativity

Durations are added by simply adding each of their fields, respectively, without overflow. Thus P1M3D + P32M51D = P33M54D.

The order of addition of durations to dateTimes is significant. For example, there are cases where:

((dateTime + duration1) + duration2) ((dateTime + duration2) + duration1)

Example:

(2000-03-30 + P1D) + P1M= 2000-03-31 + P1M= 2001-04-30

(2000-03-30 + P1M) + P1D= 2000-04-30 + P1D= 2000-05-01

4. Fixing Inheritance

The inheritance among time datatypes in the September Schema draft is incorrect and types such as date, time, month, are categorized incorrectly. In ISO 8601 and in database usage and common use the latter are not recurringDurations, they simply have unspecified fields. This is an important fact. The September Schema spec says that time is a recurring instant, occurring every day. That is clearly false. If my plane flight is at 13:43, that does not mean that it recurs every day throughout history, past and future. It simply means that the date fields are unspecified. This also has implications for comparison.

Here are the changes that I believe should be made:

Make dateTime a primitive datatype.
- It may be either partially or fully specified.
- Partial specification uses ISO 8601 notation. The "T" is omitted if there is no year, month, or day specified, and there is at least one colon in the string.
  - 1985-03= March 1985
  - -03-05= September 5th
  - --05= the 5th day of the month
  - 05T23= the 23rd hour of the 5th day of the month
  - 23:15= 23 hours, 15 minutes
  - -15:30= 15 minutes, 30 seconds
  - T--30= 30 seconds
  - T23:15:30-04:30= 11:15:30 am in a timezone that is 4 hours and 30 minutes behind UTC.
- There must be at least one specified field.
- It is illegal to have unspecified fields between specified fields, except when the latter field is the zone.
  - Thus, you cannot omit the day unless you either omit year and month, or you omit hour, minute, second
  - Examples of illegal dateTimes: 1999T23, -05T-15, etc.
- It is illegal to have a partial specification where there is no setting of the unspecified fields that would be a valid dateTime according to the Gregorian calendar.
  - For example, -02-30 is illegal, since there is no year YYYY that would make YYYY-02-03 valid.
- It is illegal to specify a seconds field representing a leap second (e.g. greater than or equal to 60) unless the month and day could represent March 31, June 30, September 30, or December 31 in UTC. Because the leap second is added as the last second of the day in UTC time, the long (or short) minute could occur at other times in local time, and will be in the following day in half the world.
Remove recurringDuration.
- its definition is circular, since it depends on dateTimes. So it can't be the basetype.
- it is not in SQL.
Make time, date, month, timeInstant descend directly from dateTime. The only differences are:
- timeInstant must be fully specified: Y, M, D, TH, M, S, and Z
- date must have specified Y, M, and D, and unspecified TH,M,S, and Z
- year must have specified Y, and unspecified M,D,TH,M,S, and Z
- month must have specified M, and unspecified Y, D,TH,M,S, and Z [ASHOK -- do we really need month??]
  - Lexically, it does not require leading '-', that is: 03-05 instead of -03-05
- time must have specified TH, M, S, and Z, and unspecified Y,M, and D.
  - Lexically, it does not require leading 'T', that is: 23:15 instead of T23:15
Drop Century. The FDIS of 8601 is revising CCYY to be YYYY, as it should have been done originally. Also change occurrences of CCYY to YYYY where they occur, and change the wording for CC and YY appropriately.
Allow zero and negative years, as in the FDIS. Explain that 1 BC= 0000, 2BC= -0000, etc.

If these are done, we will finally have a clear, well-defined set of datatypes for date and time, with a well-defined ordering among all descendents of the base types.

5. Partial Order Caution

Important: You can't treat as =. In fact, they are doubly contradictory:

P = Q means that both P Q and P Q.
P Q means that neither P Q nor P Q.

Partial orders cannot be magically transformed into full orders. If you treat as = you corrupt the ordering completely!

Example:

1 Month 30 Days
1 Month 29 Days
29 Days < 30 Days

If you treat as =, then

1 Month = 29 Days < 30 Days = 1 Month, so 1 Month < 1 Month!!!
29 Days = 1 Month = 30 Days, so 29 Days = 30 Days!!!

29 = 30, 1 = 2, right= wrong, Florida election officials = competent and unbiased, fire falls from the sky, ancient evils awaken, twinkies decay, civilization collapses, etc. Ok, I exaggerate slightly, but bottom line is, don't do it.

6. Boundaries

For testing boundaries, I suggest the following. It has the advantage of agreeing with the normal definition on fully ordered datatypes, and providing a mechanism for determining whether you want to include the incomparable items or not.

P conforms to the boundaries iff all of the following are true:
- ! P < minInclusive (when minInclusive is set)
- P > minExclusive (when minExclusive is set)
- P < maxExclusive (when maxExclusive is set)
- ! P > maxInclusive (when maxInclusive is set)

So for example:

maxExclusive= 1 Month.
- P= 27 days, yes
- P= 28 days, no
- P= 29 days, no
- P= 30 days, no
- P= 31 days, no
- P= 32 days, no
- P= 1 month, no
maxInclusive= 1 Month.
- P= 1 month, yes
- P= 27 days, yes
- P= 28 days, yes
- P= 29 days, yes
- P= 30 days, yes
- P= 31 days, yes
- P= 32 days, no

However, if a "keepIncomparables" attirbute were added (perhaps better named as "includeIncomparables"), then that would work as well.

7. Distinguished DateTimes

Distinguished dateTimes have been chosen to represent points in time that cause the greatest deviations in the addition of durations. That is, starting from these dates, adding (or subtracting) N months will have the greatest or least number of days. They are derived as follows:

Suppose we are adding N months. Within a year, the least number of days in N months starts with February 1 in a non-leap year. The greatest number of days in N months takes two dates. Starting at July 1, you have two months of 31 days in a row. This gives you the largest number of days per N months for a while. However, once you hit February (even assuming a leap year), you drop below what you would get if you started counting from March 1.

Across years, for the least number of days in N months, you want to start just before the longest stretch of time that doesn't have leap years. Because of the 400 year cycle in the Gregorian calendar, we can restrict ourselves to just the period at or after 1600. During this span, every 4 years there is a leap year, except 1700, 1800, 1900; 2100, 2200, 2300;.... So we start at 1697. For the greatest number of days in N months, you want to start just before the longest stretch of time that does have leap years: that works out to 1903. These can be combined into three of the distinguished dates.

1697-02-01T00:00:00Z
1903-03-01T00:00:00Z
1903-06-01T00:00:00Z

If we are subtracting N months, then the least number of days in -N months starts with March 1 in a non-leap year. That is because March 1 - Feb 1 = 28 days. Similarly, the greatest number of days in -N months starts either September 1 (since the two months before it each have 31 days), or Feb 1 (since there is the longest span just before the short month).

Across years, for the least number of days in -N months, you want to start just after the longest stretch of time that doesn't have leap years. That is 1903 (March). For the greatest number of days in -N months, you want to start just after the longest stretch of time that does have leap years: that works out to 1696/7. Two of the resulting dates are the same as for adding months. The one addition is September of 1696.

1696-09-01T00:00:00Z

Editorial Note: I don't think the days in the months matter for these calculations, but I need to verify this. Also, for the September and June dates, there is probably more freedom in choosing the years, but these years will work.

8 Comparison Relations

From the basic definition of less than or equal (P Q), all the other comparison operations can be derived. For information on the symbols here, see Notation.

D = C iff D C and C D
D < C iff D C and not C D
D C iff neither D C nor C D
- When there is no determinate ordering relationship between D and C, this is expressed as D C (read as "D is incomparable to C").
D C iff D < C or D > C
- Note that the difference between D >< C and D C is that the latter also includes incomparables.
Reverse Order:
- D C iff C D
- D > C iff C < D
Negations
- D C iff not D= C
- D C iff not D < C
- etc.

Alternatively, one could start from the two relations < and =, and derive , then derive the other operations from there.

D C iff D < C or C= D

Partial vs. Total Orders

Definition (partial order): A binary relation R on a set A is a partial order if and only it is:

(1) reflexive (e.g. for all a in A: a R a)
(2) antisymmetric (e.g. for all a, b in A: a R b and b R a implies a = b)
(3) transitive (e.g. for all a, b in A: a R b and b R c implies a R c)

Definition (total order): A binary relation R on a set A is a total order if and only if it is

(1) a partial order, and
(2) total (e.g. for all a, b in A: either a R b or b R a)

The most common example of a partial order is the subset relation: it is reflexive, antisymmetric, and transitive, but not total. For example,

{1} {1, 3} {1, 2, 3} and
{2} {1, 2} {1, 2, 3}, but
neither {1, 2} {1, 3} nor {1, 3} {1, 2}

For a total order, the relations and are unnecessary, since in that case:

a b if and only if a b
a b if and only if a = b
a b is always true
a b is never true

For a partial order, however, these relations must be carefully distinguished. See Partial Order Caution.

9 Notation

The above discussion uses images for the ordering relations, since not all browsers may be set up to use the more standard mathematical characters provided in Unicode. To set up your browser to display these characters properly, see Display Problems? The following table lists those characters, the images, and an ASCII representation for use in plain text.

Relation	Normal			Negated
Relation	ASCII	Unicode	Image	ASCII	Unicode	Image
equal to	x= y	x = y	x = y	x != y	x ≠ y	x y
less than	x < y	x < y	x < y	x !< y	x ≮ y	x y
greater than	x > y	x > y	x > y	x !> y	x ≯ y	x y
comparable to*	x <=> y	x ⋚ y	x y	x !<=> y	x ⋚̸ y	x y
less than or equal to	x <= y	x ≤ y	x y	x !<= y	x ≰ y	x y
greater than or equal to	x >= y	x ≥ y	x y	x !>= y	x ≱ y	x y
less than or greater than	x <> y	x ≶ y	x y	x !<> y	x ≸ y	x y

* aka less than, equal to, or greater than

10 Canonical Duration

A duration D can be put into a canonical form D' using the following process.

m := D[years]*12 + D[months]
s := D[days]*86400 + D[hours]*1440 + D[minutes]*60 + D[seconds]
D'[years] := fQuotient(m, 12)
D'[months] := modulo(m, 12)
D'[days] := fQuotient(s, 86400)
secs := modulo(s, 86400)
D'[hours] := fQuotient(s, 1440)
secs := modulo(s, 1440)
D'[minutes] := fQuotient(s, 60)
D'[seconds] := modulo(s, 60)

In canonical form, all the fields are bound to their "normal" values except for the day. The days can be arbitrarily large, since there is no set relation to months.

31	M= January, March, May, July, August, October, or December
30	M= April, June, September, or November
29	M= February AND (modulo(Y, 400)= 0 OR (modulo(Y, 100) 0) AND modulo(Y, 4)= 0)
28	Otherwise