Extended Date Time Format

Analysis

The analysis below refers to the requirements listed in Problem, Requirements, and Basic Approach, so those requirements are repeated here.

Requirements Repeated

  1. The "basic" option in ISO 8601, i.e. YYYYMMDD without hyphens, and HHMMSS without colons.
  2. BC dates.
  3. Time zones.
  4. Year and month only (no day of month), or year only.
  5. Questionable dates. E.g. 1992? would mean "possibly" the year 1992, but not "definitely".
  6. Approximate dates. E.g. 1992~ would mean "approximately" the year 1992.
  7. Uncertain dates. E.g. 199? would mean some year in the 1990s, but it is not certain which year; 1999-?? would mean some month in 1999; 199901?? would mean some day in the month 1999-01.
  8. Date range (start and end).
  9. End date "open" in a date range.
  10. Start and/or end date "unknown" in a date range.

Analysis of Requirements

Requirement (1) is to allow raw data to be extracted and put into XML without conversion. It is not intended to apply to all data, only to data typically found in database records that must be converted to XML (data that conforms to ISO 8601 but omits the hyphens/colons). For data where conversion is not an issue and which can be handled by xs:date or xs:dateTime, these two built-in types are preferred, because they provide built-in validation.

For example, BC dates (2) are not often found in bibliographic metadata; where they occur they are mostly original data, so conversion is not a major issue and representing them with hyphens is acceptable. The same holds for time zones (3). BC dates and time zones are handled by xs:date and xs:dateTime ("-2004-01-01" is a valid xs:date; "2004-01-01T10:10:10Z", where Z indicates the UTC time zone, is a valid xs:dateTime). So neither needs any special schema logic; they can be entered as xs:date and xs:dateTime.

Year-and-month-only or year-only dates (4), although supported by ISO 8601, are not supported by xs:date. In fact, their use in conjunction with the additional requirements (questionable (5), approximate (6), or uncertain (7) dates) is not supported even by ISO 8601.

Similarly, ISO 8601 supports ranges (8) though xs:date does not, and ISO 8601 does not support ranges in conjunction with the additional requirements "open" (9) and "unknown" (10).

None of requirements (4) through (7) apply to dateTime; they apply only to date. So a pattern (a regular expression) developed for the date requirements will have very different features from one developed for date/time, and the special requirements for range impose yet more features. So (at least) three groups of patterns are proposed: one or more for date, one or more for date/time, and one or more for range.

The right number of patterns is, of course, a subjective design decision: one large complex pattern vs. several smaller, simpler patterns. Too few patterns increase the complexity of each individual pattern; more patterns increase overall complexity. In any case, simplicity comes at the expense of validation power.

Two patterns for date, one for date/time, and two for range are shown below. In the schema, these five patterns are alternative pattern facets of a single restricted string type, and that type is combined with xs:date and xs:dateTime as a union (via xs:union); a string validates if it conforms to any one of these member types.
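The union semantics can be illustrated with a minimal Python sketch. It assumes that a string validates against an XSD pattern exactly when Python's `re.fullmatch` accepts it (XSD patterns are implicitly anchored at both ends, and the constructs used here behave the same in both dialects). The xs:date and xs:dateTime members are replaced by hypothetical, lexical-only regexes (`XS_DATE`, `XS_DATETIME`) that skip the value checks a real schema validator performs; the five EDTF patterns are taken from the schema at the end of this section.

```python
import re

# Hypothetical, lexical-only stand-ins for the built-in types. They do not
# check value constraints (month 01-12, day fits month, and so on).
XS_DATE = r"-?\d{4,}-\d{2}-\d{2}(Z|[+-]\d{2}:\d{2})?"
XS_DATETIME = (r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?"
               r"(Z|[+-]\d{2}:\d{2})?")

# One side of the date/time range pattern, which the schema repeats verbatim.
_SIDE = (r"\d{4}((-)?\d{2}((-)?\d{2}(T\d{2}(:)?\d{2}((:)?\d{2}(\.\d*)?)?"
         r"((Z|(\+|-)\d{2}(:)?\d{2})?))?)?)?")

# The five pattern facets of edtfRegularExpressions. Several xs:pattern
# facets in one restriction act as alternatives.
EDTF_PATTERNS = [
    r"\d{2}(\d{2}|\?\?|\d(\d|\?))(-(\d{2}|\?\?))?~?\??",
    r"\d{6}(\d{2}|\?\?)~?\??",
    r"\d{8}T\d{6}",
    r"((\d{4}(-\d{2})?)|unknown)/((\d{4}(-\d{2})?)|unknown|open)",
    rf"({_SIDE}|unknown)/({_SIDE}|unknown|open)",
]

MEMBERS = [XS_DATE, XS_DATETIME, *EDTF_PATTERNS]


def is_edtf(value: str) -> bool:
    """Union semantics: valid if any member type accepts the whole string."""
    return any(re.fullmatch(p, value) is not None for p in MEMBERS)


for s in ["-2004-01-01", "2004-01-01T10:10:10Z", "2004-01", "1992~",
          "1995/open", "2004-01-01~"]:
    print(f"{s!r}: {is_edtf(s)}")
```

Every value except the last is accepted by at least one member. "2004-01-01~" is rejected by all of them: hyphenated year-month-day combined with a qualifier is deliberately left unsupported, as discussed under the date patterns.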

Patterns

 

Patterns for Date

  1. <xs:pattern value="\d{2}(\d{2}|\?\?|\d(\d|\?))(-(\d{2}|\?\?))?~?\??"/>

    Year (yyyy) or year-month (yyyy-mm), where the last digit or last two digits of the year may be '?' (199? means some year from 1990 to 1999; 19?? means some year from 1900 to 1999), or the month may be '??' (2004-?? means "some month in 2004"). The entire string may end with '~', '?', or '~?' for "approximate", "questionable", or both.

  2. <xs:pattern value="\d{6}(\d{2}|\?\?)~?\??"/>

    yearMonthDay: yyyymmdd, where the day may be '??', so '200412??' means "some day during the month of 12/2004". The whole string may be followed by '~', '?', or '~?'. Hyphens are not allowed for this pattern; year-month-day with hyphens will validate via xs:date. (It seems unnecessary to support year-month-day with hyphens along with the additional requirements; for year-month-day with the additional requirements, the non-hyphen form should suffice.)

    The key issue with dates is hyphens. ISO 8601 requires a hyphen for year-month (with no day); hyphens are optional for year-month-day. The fact that ISO 8601 requires hyphens for year-month is not a big problem with regard to requirement (1), because most if not all of the date data of concern (data needing conversion) is of the form year-month-day. To be as compatible with ISO 8601 as possible, all dates of the form year-month will include the hyphen. For dates that include the day, the form without hyphens is accommodated by pattern 2; the form with hyphens is supported via xs:date, so no special schema logic is needed.
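The two date patterns can be exercised with a short Python sketch, again assuming that `re.fullmatch` reproduces XSD's anchored pattern matching for these constructs. The sample values are illustrative, not drawn from real records.

```python
import re

# The two date patterns from the schema. XSD patterns are implicitly
# anchored at both ends, which re.fullmatch emulates.
YEAR_OR_YEAR_MONTH = r"\d{2}(\d{2}|\?\?|\d(\d|\?))(-(\d{2}|\?\?))?~?\??"
YEAR_MONTH_DAY = r"\d{6}(\d{2}|\?\?)~?\??"


def date_pattern_ok(value: str) -> bool:
    """True if either date pattern accepts the whole string."""
    return any(re.fullmatch(p, value) is not None
               for p in (YEAR_OR_YEAR_MONTH, YEAR_MONTH_DAY))


# Accepted: masked years, masked month, qualifiers, basic-format days.
for v in ["1992", "199?", "19??", "2004-??", "1992~?", "20041225", "200412??"]:
    assert date_pattern_ok(v), v

# Rejected: '?' before '~', a mask in the wrong digit, year-month without a
# hyphen, and hyphenated year-month-day (left to xs:date).
for v in ["1992?~", "1?92", "200412", "2004-12-25", "2004-12-25?"]:
    assert not date_pattern_ok(v), v
```

Note that "200412" (year-month without a hyphen) is rejected, consistent with the decision that year-month always includes the hyphen.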

Pattern for Date/Time

  1. <xs:pattern value="\d{8}T\d{6}"/>

    'yyyymmddThhmmss' (with T separator). Hyphens in the date and colons in the time are not allowed for this pattern, and neither a time zone nor fractional seconds is accepted.
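A small Python sketch of this pattern, under the same `re.fullmatch` assumption, also uses the standard `datetime` module to show that the basic form holds the same information as the extended form xs:dateTime expects. The conversion is shown only for illustration, since requirement (1) is precisely about not having to perform it.

```python
import re
from datetime import datetime

BASIC_DATETIME = r"\d{8}T\d{6}"  # the date/time pattern from the schema


def basic_datetime_ok(value: str) -> bool:
    return re.fullmatch(BASIC_DATETIME, value) is not None


print(basic_datetime_ok("20050705T071500"))      # True
print(basic_datetime_ok("2005-07-05T07:15:00"))  # False: extended form, left to xs:dateTime
print(basic_datetime_ok("20050705T071500Z"))     # False: no time zone allowed here

# The basic form carries the same information as the extended xs:dateTime form.
parsed = datetime.strptime("20050705T071500", "%Y%m%dT%H%M%S")
print(parsed.isoformat())  # 2005-07-05T07:15:00
```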

Patterns for Range

  1. <xs:pattern value="((\d{4}(-\d{2})?)|unknown)/((\d{4}(-\d{2})?)|unknown|open)"/>

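    For years: 'yyyy/yyyy'; for year/month: 'yyyy-mm/yyyy-mm' (each side may independently be a year or a year-month). The beginning or end of the range may be 'unknown'; the end may also be 'open'. These words must be lowercase, because the pattern is case-sensitive. Hyphens are mandatory when month is present.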
  2. <xs:pattern value="(\d{4}((-)?\d{2}((-)?\d{2}(T\d{2}(:)?\d{2}((:)?\d{2}(\.\d*)?)?((Z|(\+|-)\d{2}(:)?\d{2})?))?)?)?|unknown)/(\d{4}((-)?\d{2}((-)?\d{2}(T\d{2}(:)?\d{2}((:)?\d{2}(\.\d*)?)?((Z|(\+|-)\d{2}(:)?\d{2})?))?)?)?|unknown|open)"/>

    Extends range support to date/time ranges. For example:
    "20050705T0715-0500/20050705T0720-0500". Hyphens in the date and/or colons in the time may be included or excluded. The time zone is optional. Year-only, year-month, and year-month-day values (without a time) are also supported.

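Both range patterns can be tried with a short Python sketch, assuming as before that `re.fullmatch` mirrors XSD pattern matching. Range pattern 2 is built from its repeated half (`_SIDE`, a name introduced here for brevity) rather than copied out twice.

```python
import re

# Range pattern 1 from the schema: years or year-months, plus the words.
RANGE_YM = r"((\d{4}(-\d{2})?)|unknown)/((\d{4}(-\d{2})?)|unknown|open)"

# Range pattern 2 repeats one expression on each side of "/".
_SIDE = (r"\d{4}((-)?\d{2}((-)?\d{2}(T\d{2}(:)?\d{2}((:)?\d{2}(\.\d*)?)?"
         r"((Z|(\+|-)\d{2}(:)?\d{2})?))?)?)?")
RANGE_DT = rf"({_SIDE}|unknown)/({_SIDE}|unknown|open)"


def range_ok(value: str) -> bool:
    """True if either range pattern accepts the whole string."""
    return any(re.fullmatch(p, value) is not None for p in (RANGE_YM, RANGE_DT))


for v in ["1995/2004", "2004-06/open", "unknown/1995",
          "20050705T0715-0500/20050705T0720-0500",
          "open/1995", "1995/OPEN"]:
    print(v, range_ok(v))
```

The last two values are rejected: 'open' is allowed only at the end of a range, and the words are case-sensitive.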
Limitation
None of these patterns provides more than rudimentary validation. They enforce, for example, that the date has eight digits, that the time has six digits, where the masking characters may appear, and that a range uses the words "open" or "unknown". But a pattern cannot (without excessive complexity) validate that the month is between 01 and 12 and the day between 01 and 31, much less that the day is consistent with the month (e.g. that if the month is 04 then the day must be 30 or less), and so on.

These are things that xs:date does very well. So xs:date and xs:dateTime should be used whenever none of the special features they lack is needed.
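The gap between pattern checks and value checks can be seen in a short Python sketch. Python's `datetime.strptime` stands in for the calendar checks xs:date performs; it is only an approximation (for instance, Python's `datetime` supports only years 1 to 9999, so it cannot represent BC dates).

```python
import re
from datetime import datetime

YEAR_MONTH_DAY = r"\d{6}(\d{2}|\?\?)~?\??"  # from the schema


def pattern_ok(value: str) -> bool:
    return re.fullmatch(YEAR_MONTH_DAY, value) is not None


def calendar_ok(value: str) -> bool:
    """Stand-in for xs:date value checks: month 1-12, day fits the month."""
    try:
        datetime.strptime(value, "%Y%m%d")
        return True
    except ValueError:
        return False


for v in ["20040430", "20040431", "20041399"]:
    print(v, pattern_ok(v), calendar_ok(v))
```

All three strings satisfy the pattern, but only "20040430" is a real calendar date: April has 30 days, and there is no month 13.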

Schema

The sample schema below incorporates the above patterns. For usage, see Tools and Usage.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
<!--
Extended Date/Time Format: edtf

This schema is an "include" file. It does not define a root element; it defines the simple type edtfSimpleType (together with the helper type edtfRegularExpressions that edtfSimpleType uses). A schema may "include" this schema and then reference the type, for example as follows:

<xs:element name="dateOfBirth" type="edtfSimpleType"/>

************************* edtfSimpleType

edtfSimpleType is the union of three simple types: xs:date, xs:dateTime, and edtfRegularExpressions. ("Union" means that any string conforming to any one of the types in the union will validate.) xs:date and xs:dateTime are built-in W3C XML Schema types. edtfRegularExpressions is a string type restricted by five alternative regular expressions (pattern facets), which are described below. So any string that conforms to one of the two built-in types or to any of the five regular expressions will validate.
-->
<xs:simpleType name="edtfSimpleType">
<xs:union memberTypes="xs:date xs:dateTime edtfRegularExpressions"/>
</xs:simpleType>
<!--
******** edtfRegularExpressions
-->
<xs:simpleType name="edtfRegularExpressions">
<xs:restriction base="xs:string">
<!--
The following pattern is for year (yyyy) or year-month (yyyy-mm)
The last digit or last two digits of the year may be '?', meaning "some year in that range, but not certain which", for example 199? means some year from 1990 to 1999 and 19?? means some year from 1900 to 1999. Similarly, month may be '??', so that 2004-?? means "some month in 2004". The entire string may end with '~', '?', or '~?' for "approximate", "questionable", or both.
Hyphen must separate year and month.
-->
<xs:pattern value="\d{2}(\d{2}|\?\?|\d(\d|\?))(-(\d{2}|\?\?))?~?\??"/>
<!--
The following pattern is for yearMonthDay - yyyymmdd, where 'dd' may be '??' so '200412??' means "some day during the month of 12/2004".
The whole string may be followed by '~', '?', or '~?' to mean "approximate", "questionable", or both. Hyphens are not allowed for this pattern.
-->
<xs:pattern value="\d{6}(\d{2}|\?\?)~?\??"/>
<!--

The following pattern is for date and time with T separator: 'yyyymmddThhmmss'.
Hyphens in date and colons in time are not allowed for this pattern.
-->
<xs:pattern value="\d{8}T\d{6}"/>
<!--

The following pattern is for a date range in years: 'yyyy/yyyy'; or year/month: 'yyyy-mm/yyyy-mm'. The beginning or end of the range may be 'unknown'. The end of the range may be 'open'.
Hyphens are mandatory when month is present.
-->
<xs:pattern value="((\d{4}(-\d{2})?)|unknown)/((\d{4}(-\d{2})?)|unknown|open)"/>

<!-- The following pattern extends range support to date/time ranges. For example:
"20050705T0715-0500/20050705T0720-0500". Hyphens in the date and/or colons in the time may be included or excluded. The time zone is optional. Year-only, year-month, and year-month-day values (without a time) are also supported.
-->

<xs:pattern value="(\d{4}((-)?\d{2}((-)?\d{2}(T\d{2}(:)?\d{2}((:)?\d{2}(\.\d*)?)?((Z|(\+|-)\d{2}(:)?\d{2})?))?)?)?|unknown)/(\d{4}((-)?\d{2}((-)?\d{2}(T\d{2}(:)?\d{2}((:)?\d{2}(\.\d*)?)?((Z|(\+|-)\d{2}(:)?\d{2})?))?)?)?|unknown|open)"/>
<!-- -->
</xs:restriction>
</xs:simpleType>