Parsing quasi-standard date-time strings

Proposal: https://github.com/tc39/proposal-uniform-interchange-date-parsing

Technical Background

ISO 8601

http://dotat.at/tmp/ISO_8601-2004_E.pdf

ISO 8601 contains multitudes

This International Standard is applicable whenever representation of dates in the [proleptic] Gregorian calendar, times in the 24-hour timekeeping system, time intervals and recurring time intervals or of the formats of these representations are included in information interchange. It includes

— calendar dates expressed in terms of calendar year, calendar month and calendar day of the month;

— ordinal dates expressed in terms of calendar year and calendar day of the year;

— week dates expressed in terms of calendar year, calendar week number and calendar day of the week;

— local time based upon the 24-hour timekeeping system;

— Coordinated Universal Time of day;

— local time and the difference from Coordinated Universal Time;

— combination of date and time of day;

— time intervals;

— recurring time intervals.

ISO 8601:2004(E) §1

ISO 8601 Taxonomy

ISO 8601 formats

basic format

format of a date and time representation or date and time format representation comprising the minimum number of time elements necessary for the accuracy required

NOTE The basic format should be avoided in plain text.

extended format

extension of the basic format that includes additional separators

ISO 8601:2004(E) §2.3

ISO 8601 extended format date representations

Representation Group                          Example       Example (expanded year)
calendar date                                 2018-07-24    +002018-07-24
ordinal date                                  2018-205      +002018-205
week date                                     2018-W30-2    +002018-W30-2
calendar date (reduced to month precision)    2018-07       +002018-07
calendar date (reduced to year precision)     2018          +002018
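
The representation groups above differ only in shape, so they can be told apart by pattern. The following is a minimal sketch, not anything defined by ISO 8601: the name classifyIsoDate is hypothetical, expanded years are assumed to add exactly two digits (as in the examples), and no field ranges are checked.

```javascript
// Hypothetical helper: classify an ISO 8601 extended-format date string
// by representation group. Syntax only; field ranges are not checked.
function classifyIsoDate(text) {
  // Assumption: an expanded year is a sign followed by exactly six digits.
  const year = "(?:[+-]\\d{6}|\\d{4})";
  const groups = [
    ["calendar date", `^${year}-\\d{2}-\\d{2}$`],
    ["ordinal date", `^${year}-\\d{3}$`],
    ["week date", `^${year}-W\\d{2}-\\d$`],
    ["calendar date (reduced to month precision)", `^${year}-\\d{2}$`],
    ["calendar date (reduced to year precision)", `^${year}$`],
  ];
  for (const [name, pattern] of groups) {
    if (new RegExp(pattern).test(text)) return name;
  }
  return null; // not one of the extended-format shapes listed above
}
```

For example, classifyIsoDate("2018-205") reports an ordinal date, while the basic-format string "20180724" is not recognized.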

ISO 8601 extended format time of day representations

Representation Group               No Fraction    Comma Fraction    Dot Fraction
local time                         15:27:45       15:27:45,4        15:27:45.4
local time (minute precision)      15:27          15:27,75          15:27.75
UTC of day                         23:20:30Z      23:20:30,1Z       23:20:30.1Z
UTC of day (minute precision)      23:20Z         23:20,5Z          23:20.5Z
local and UTC offset               15:27+01:00    15:27,6+01:00     15:27.6+01:00
local and hours-only UTC offset    16:42:33-07    16:42:33,15-07    16:42:33.15-07
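
In every row above, the decimal fraction applies to the lowest-order element present (seconds, or minutes at minute precision), and either decimal sign is allowed. A rough sketch of reading such strings follows; parseIsoTimeOfDay is a hypothetical name, the result shape is invented for illustration, and no range checks are performed.

```javascript
// Hypothetical helper: read an ISO 8601 extended-format time of day.
// Accepts "," or "." as the decimal sign. Syntax only; no range checks.
function parseIsoTimeOfDay(text) {
  const pattern = /^(\d{2}):(\d{2})(?::(\d{2}))?(?:[.,](\d+))?(Z|[+-]\d{2}(?::\d{2})?)?$/;
  const m = pattern.exec(text);
  if (!m) return null;
  const [, hh, mm, ss, digits, zone] = m;
  const fraction = digits === undefined ? 0 : Number("0." + digits);
  // The fraction scales the last element present: minutes if seconds are absent.
  const unitMs = ss === undefined ? 60000 : 1000;
  const wholeMs = Number(hh) * 3600000 + Number(mm) * 60000 + Number(ss || 0) * 1000;
  return {
    msOfDay: wholeMs + Math.round(fraction * unitMs), // milliseconds since 00:00
    zone: zone === undefined ? null : zone,           // "Z", an offset, or null for local time
  };
}
```

Under these assumptions "15:27,75" and "15:27:45" describe the same instant of the day, which is the kind of equivalence a parser has to decide whether to honor.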

ISO 8601 profiles

A Profile of ISO 8601 is a specification developed by a particular community which explains how ISO 8601 is to be used, to carry out a particular function or group of functions relevant to that community.

1. It may list features of 8601 to be supported.

2. In cases where there are multiple methods specified in 8601 to support a particular function, the profile may select a single method.

3. In cases where there are different interpretations of a particular function, the profile may select a single interpretation, or provide clarification.

4. It might list features that are not relevant and need not be supported.

5. It might specify several levels of support.

ISO/DIS 8601-2:2016(e) §B.3

http://dotat.at/tmp/ISO_8601-201x-2-DIS.pdf

RFC 3339

https://tools.ietf.org/html/rfc3339

Internet Date/Time Format

“The following profile of ISO 8601 dates SHOULD be used in new protocols on the Internet.”

YYYY-MM-DD ( T | t ) HH:mm:ss [ .s… ] ( Z | z | ±HH:mm )

YYYY is the decimal digits of the [proleptic Gregorian] year 0000 to 9999.
MM is the month of the year from 01 (January) to 12 (December).
DD is the day of the month from 01 to 31.
HH is the number of complete hours as two decimal digits from 00 to 23.
mm is the number of complete minutes as two decimal digits from 00 to 59.
ss is the number of complete seconds as two decimal digits from 00 to 59, or 60 for a positive leap second.
s… is fractional second digits.
Z or z is the zero-offset UTC designator.

“Specifications that use this format… MAY further limit the date/time syntax so that the letters 'T' and 'Z' used in the date/time syntax must always be upper case. Applications that generate this format SHOULD use upper case letters.”

https://tools.ietf.org/html/rfc3339#section-5.6
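
The profile above is small enough to approximate with a single regular expression. This is a sketch under stated assumptions rather than the RFC's ABNF: the names RFC3339_DATE_TIME and matchesRfc3339 are made up here, and the pattern checks per-field ranges only, so it still accepts February 30 and a leap second at any time of day.

```javascript
// Sketch only: approximates the RFC 3339 date-time shape summarized above.
// Checks syntax and per-field ranges, not calendar validity.
const RFC3339_DATE_TIME = new RegExp(
  "^\\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\\d|3[01])" + // YYYY-MM-DD
  "[Tt]" +                                                // either case of the time designator
  "(?:[01]\\d|2[0-3]):[0-5]\\d:(?:[0-5]\\d|60)" +         // HH:mm:ss, with 60 for a leap second
  "(?:\\.\\d+)?" +                                        // any number of fractional second digits
  "(?:[Zz]|[+-](?:[01]\\d|2[0-3]):[0-5]\\d)$"             // Z, z, or ±HH:mm
);

// Hypothetical helper name, not part of any standard API.
function matchesRfc3339(text) {
  return RFC3339_DATE_TIME.test(text);
}
```

Note that seconds and a UTC designator or offset are mandatory in this profile, so a string such as "2019-03-26T14:00Z" does not match.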

ECMAScript Date Time String Format

“ECMAScript defines a string interchange format for date-times based upon a simplification of the ISO 8601 Extended Format.”

[ ±YY ] YYYY [ -MM [ -DD ] ] [ THH:mm [ :ss [ .sss ] ] [ Z ] ]

YYYY is the decimal digits of the [proleptic Gregorian] year 0000 to 9999.
YY is two extra decimal digits for years after 9999 or before 0000.
MM is the month of the year from 01 (January) to 12 (December).
DD is the day of the month from 01 to 31.
HH is the number of complete hours… as two decimal digits from 00 to 24.
mm is the number of complete minutes… as two decimal digits from 00 to 59.
ss is the number of complete seconds… as two decimal digits from 00 to 59.
sss is the number of complete milliseconds… as three decimal digits.
Z is the UTC offset specified as "Z" for UTC, or either "+" or "-" followed by HH:mm.

Spec: https://tc39.github.io/ecma262/#sec-date-time-string-format

Pay careful attention to the range boundaries: unlike RFC 3339, HH extends to 24 and ss stops at 59 (no leap second).

ECMAScript Date Time String Format

Date-only forms:

  • [±YY]YYYY
  • [±YY]YYYY-MM
  • [±YY]YYYY-MM-DD

Date-time forms append time of day with optional UTC offset:

  • <date>THH:mm[Z]
  • <date>THH:mm:ss[Z]
  • <date>THH:mm:ss.sss[Z]

Note that a UTC offset is valid only after a time of day.
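
The date-only and date-time forms listed above can be sketched as one pattern. This is an illustration, not spec text: matchesEcmaScriptFormat is a hypothetical name, and the pattern tests shape only, leaving field ranges unchecked.

```javascript
// Sketch only: the shape of the ECMAScript Date Time String Format.
// Field ranges (month 01-12, hour 00-24, and so on) are not checked here.
function matchesEcmaScriptFormat(text) {
  const year = "(?:[+-]\\d{6}|\\d{4})";              // [±YY]YYYY
  const date = `${year}(?:-\\d{2}(?:-\\d{2})?)?`;    // optional -MM and -DD
  const time =
    "T\\d{2}:\\d{2}" +                               // THH:mm
    "(?::\\d{2}(?:\\.\\d{3})?)?" +                   // optional :ss and .sss
    "(?:Z|[+-]\\d{2}:\\d{2})?";                      // optional UTC offset
  return new RegExp(`^${date}(?:${time})?$`).test(text);
}
```

Because the offset lives inside the optional time group, "2019-03-26Z" fails the pattern while "2019-03-26T14:00Z" passes.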

ECMAScript Date parsing

“The [Date.parse] function first attempts to parse the format of the String according to the rules (including extended years) called out in Date Time String Format (20.3.1.15). If the String does not conform to that format the function may fall back to any implementation-specific heuristics or implementation-specific date formats. Unrecognizable Strings or dates containing illegal element values in the format String shall cause Date.parse to return NaN.”

“If x is any Date object whose milliseconds amount is zero… then all of the following expressions should produce the same numeric value:
x.valueOf()
Date.parse(x.toString())
Date.parse(x.toUTCString())
Date.parse(x.toISOString())”

Where is the line drawn between “illegal element values” (which result in NaN) and nonconformance (which results in implementation-specific fallback)?
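
The round-trip expectation quoted above is easy to check directly. A small sketch, with roundTripsExactly as a hypothetical helper name:

```javascript
// Hypothetical helper: does Date.parse recover the original time value
// from each of the three string conversions named in the spec text?
function roundTripsExactly(date) {
  const timeValue = date.valueOf();
  return [date.toString(), date.toUTCString(), date.toISOString()]
    .every((text) => Date.parse(text) === timeValue);
}

// A Date whose milliseconds amount is zero, as the spec text requires.
const wholeSecond = new Date(Date.UTC(2019, 2, 26, 14, 0, 0));
console.log(roundTripsExactly(wholeSecond));
```

The restriction to a zero milliseconds amount matters because toString and toUTCString do not include milliseconds, so a Date with a nonzero amount cannot round-trip through them.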

Comparative Landscape

Advancement

Draft spec text

  • Let msInputString be the String value that is the same as inputString except that each sequence of the code unit 0x002E (FULL STOP) followed by one or more consecutive DecimalDigits has been [normalized and truncated to . followed by three DecimalDigits].
  • Let inBoundsInputString be the String value that is the same as msInputString except that each occurrence of DecimalDigit has been replaced with 1.
  • If inBoundsInputString conforms to the interchange format described in Date Time String Format (20.3.1.16), including expanded years, then
    • [If the date and time represented by msInputString fails a bounds check], return NaN.
    • Return the time value corresponding with the date and time represented by msInputString.
  • If there is a nonempty set of finite time values such that invoking the initial value of Date.prototype.toString (20.3.4.41) with a Date instance having a [[DateValue]] internal slot equal to any of them would return inputString, then
    • Return the lowest-valued member of the set.
  • Else if there is a nonempty set of finite time values such that invoking the initial value of Date.prototype.toUTCString (20.3.4.43) with a Date instance having a [[DateValue]] internal slot equal to any of them would return inputString, then
    • Return the lowest-valued member of the set.
  • Else if the implementation includes a further facility for parsing dates and times, then
    • Return the implementation-dependent time value returned from using that facility to attempt parsing inputString.

https://tc39.github.io/proposal-uniform-interchange-date-parsing/
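
One reading of the first three draft steps: normalize fractional seconds, mask every digit with 1, then test the masked string against the format. Masking makes the conformance test depend on shape alone, so an out-of-range field leads to the bounds check (and NaN) instead of the implementation-specific fallback. The sketch below follows that reading; the function names and the INTERCHANGE_SHAPE pattern are assumptions for illustration, and padding short fractions with zeros is an interpretation of the bracketed summary, not quoted spec text.

```javascript
// Assumed pattern for the interchange format's shape (not spec text).
const INTERCHANGE_SHAPE =
  /^(?:[+-]\d{6}|\d{4})(?:-\d{2}(?:-\d{2})?)?(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d{3})?)?(?:Z|[+-]\d{2}:\d{2})?)?$/;

// Step 1 (as interpreted here): pad or truncate each ".digits" run to three digits.
function normalizeFraction(inputString) {
  return inputString.replace(/\.(\d+)/g, (match, digits) => "." + (digits + "00").slice(0, 3));
}

// Step 2: replace every decimal digit with "1" so field values cannot affect the syntax test.
function maskDigits(text) {
  return text.replace(/\d/g, "1");
}

// Steps 1-3 combined; the variable names mirror the draft's msInputString and inBoundsInputString.
function analyzeInput(inputString) {
  const msInputString = normalizeFraction(inputString);
  const inBoundsInputString = maskDigits(msInputString);
  return {
    msInputString,
    inBoundsInputString,
    conforms: INTERCHANGE_SHAPE.test(inBoundsInputString),
  };
}
```

With this sketch, "2019-02-30" is classified as conforming in shape (its masked form is "1111-11-11"), which routes it to the bounds check rather than to any fallback parser.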

Cases to explore

RFC 3339-conforming

// positive leap second
Date.parse("1972-06-30T23:59:60Z")

// too few fractional second digits
Date.parse("2019-03-26T14:00:00.9Z")

// too many fractional second digits
Date.parse("2019-03-26T14:00:00.4999Z")

// lowercase time designator
Date.parse("2019-03-26t14:00Z")

// lowercase UTC designator
Date.parse("2019-03-26T14:00z")

https://jsbin.com/kuyubexitu
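
To compare engines on cases like these, it helps to tabulate what Date.parse returns for each input. A minimal harness sketch follows; probeDateParse is a hypothetical name, and no expected results are shown because the outcome for these inputs is exactly what varies between implementations.

```javascript
// Hypothetical harness: record what the current engine's Date.parse
// returns for each input, as either "NaN" or an ISO string.
function probeDateParse(inputs) {
  return inputs.map((input) => {
    const timeValue = Date.parse(input);
    return {
      input,
      result: Number.isNaN(timeValue) ? "NaN" : new Date(timeValue).toISOString(),
    };
  });
}

// The RFC 3339 cases from this slide; run in each engine and compare.
console.table(probeDateParse([
  "1972-06-30T23:59:60Z",
  "2019-03-26T14:00:00.9Z",
  "2019-03-26T14:00:00.4999Z",
  "2019-03-26t14:00Z",
  "2019-03-26T14:00z",
]));
```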

Cases to explore

non-3339 ISO 8601 calendar date time

// comma as decimal sign
Date.parse("2019-03-26T14:00:00,999Z")

// hours-only offset
Date.parse("2019-03-26T10:00-04")

// fractional minutes
Date.parse("2019-03-26T14:00.9Z")

// ISO basic format date and time
Date.parse("20190326T1400Z")

https://jsbin.com/kuyubexitu

Cases to explore

context-sensitive bounds violation

// out-of-bounds day of month
Date.parse("2019-02-30")

// time past end of day
Date.parse("2019-03-25T24:01Z")

// UTC offset too large
Date.parse("2019-03-26T14:00+24:00")

// unused leap second opportunity
Date.parse("2018-06-30T23:59:60Z")

// bogus leap second (not at end of month)
Date.parse("2019-03-26T23:59:60Z")

// really bogus leap second (not even at end of UTC day)
Date.parse("2019-03-26T13:59:60Z")

https://jsbin.com/kuyubexitu
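
These cases all pass a purely syntactic test, so rejecting them needs a check that looks at the fields together. The following is one possible policy, sketched under assumptions: fieldsInBounds is a hypothetical name, hour 24 is accepted only as exactly 24:00:00.000, seconds stop at 59 as in the ECMAScript format (so every leap second case is rejected), and offsets are capped at 23:59.

```javascript
// Gregorian leap year rule.
function isLeapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Number of days in a month; month is 1 (January) to 12 (December).
function daysInMonth(year, month) {
  const lengths = [31, isLeapYear(year) ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  return lengths[month - 1];
}

// Hypothetical bounds check over already-parsed numeric fields.
function fieldsInBounds({ year, month, day, hour = 0, minute = 0, second = 0, ms = 0,
                          offsetHour = 0, offsetMinute = 0 }) {
  if (month < 1 || month > 12) return false;
  if (day < 1 || day > daysInMonth(year, month)) return false; // depends on year and month
  if (minute > 59 || second > 59 || ms > 999) return false;    // assumption: no leap seconds
  if (offsetHour > 23 || offsetMinute > 59) return false;      // assumption: offsets below 24:00
  // Assumption: hour 24 is valid only as the end-of-day instant 24:00:00.000.
  if (hour === 24) return minute === 0 && second === 0 && ms === 0;
  return hour <= 23;
}
```

Under this policy "2019-02-30" and "2019-03-25T24:01Z" are both out of bounds, while "2019-03-25T24:00Z" is accepted.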

Cases to explore

missing elements or characters

// zero UTC offset without time elements
Date.parse("2019-03-26Z")

// positive UTC offset without time elements
Date.parse("2019-03-26+01:00")

// negative UTC offset without time elements
Date.parse("2019-03-26-04:00")

// ISO basic format UTC offset
Date.parse("2019-03-26T10:00-0400")

https://jsbin.com/kuyubexitu

Cases to explore

other nonconforming

// too many expanded year digits
Date.parse("+0002019-03-26T14:00Z")

// too few expanded year digits
Date.parse("+2019-03-26T14:00Z")

// too many unsigned year digits
Date.parse("002019-03-26T14:00Z")

// too few unsigned year digits
// Note that ISO 8601 specifies interpretation of two year digits as a century
Date.parse("019-03-26T14:00Z")

// non-Z “military” designation letter offset
// Note the hazard with time-missing UTC offsets (e.g., "2019-03-26T")
Date.parse("2019-03-26T10:00Q")

// space as time designator
Date.parse("2019-03-26 14:00Z")

// no digits after decimal sign
Date.parse("2019-03-26T14:00:00.")

https://jsbin.com/kuyubexitu

Next steps

“There is a time for some things, and a time for all things; a time for great things, and a time for small things.”

—Miguel de Cervantes, Don Quixote de la Mancha

https://tc39.github.io/process-document/

Stage 2 Acceptance Signifies: The committee expects the feature to be developed and eventually included in the standard

Parsing quasi-standard date-time strings (March 2019)