XML Schema Part 2: Datatypes Second Edition
W3C Recommendation 28 October 2004
- This version:
  http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/
- Latest version:
  http://www.w3.org/TR/xmlschema-2/
- Previous version:
  http://www.w3.org/TR/2004/PER-xmlschema-2-20040318/
- Editors:
  Paul V. Biron, Kaiser Permanente, for Health Level Seven <Paul.V.Biron@kp.org>
  Ashok Malhotra, Microsoft (formerly of IBM) <ashokma@microsoft.com>
Please refer to the errata for this document, which may include some normative corrections.
This document is also available in these non-normative formats: XML; XHTML with visible change markup; an independent copy of the schema for schema documents; a schema for built-in datatypes only, in a separate namespace; and an independent copy of the DTD for schema documents. See also translations.
Copyright © 2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
Abstract
XML Schema: Datatypes is part 2 of the specification of the XML
Schema language. It defines facilities for defining datatypes to be used
in XML Schemas as well as other XML specifications.
The datatype language, which is itself represented in
XML 1.0, provides a superset of the capabilities found in XML 1.0
document type definitions (DTDs) for specifying datatypes on elements
and attributes.
Status of this Document
This section describes the status of this document at the
time of its publication. Other documents may supersede this document.
A list of current W3C publications and the latest
revision of this technical report can be found in the W3C technical reports index at
http://www.w3.org/TR/. This is a W3C
Recommendation, which forms part of the Second Edition of XML
Schema. This document has been reviewed by W3C Members and
other interested parties and has been endorsed by the Director as a
W3C Recommendation. It is a stable document and may be used as
reference material or cited as a normative reference
from another document.
W3C's role in making the Recommendation is to draw attention
to the specification and to promote its widespread deployment. This
enhances the functionality and interoperability of the Web.
This document has been produced by the W3C XML Schema Working Group
as part of the W3C XML
Activity. The goals of the XML Schema language are discussed in
the XML Schema
Requirements document. The authors of this document are the
members of the XML Schema Working Group. Different parts of this
specification have different editors.
This document was produced under the 24
January 2002 Current Patent Practice (CPP) as amended by the W3C Patent Policy
Transition Procedure. The Working Group maintains a public
list of patent disclosures relevant to this document;
that page also includes instructions for disclosing a patent.
An individual who
has actual knowledge of a patent which the individual believes
contains Essential Claim(s) with respect to this specification should
disclose the information in accordance with section
6 of the W3C Patent Policy.
The English version of this specification is the only normative
version. Information about translations of this document is available
at http://www.w3.org/2001/05/xmlschema-translations.
This second edition is not a new version;
it merely incorporates the changes dictated by the corrections to
errors found in the first
edition as agreed by the XML Schema Working Group, as a
convenience to readers. A separate list of all such corrections is
available at http://www.w3.org/2001/05/xmlschema-errata.
The errata list for this second edition is available at http://www.w3.org/2004/03/xmlschema-errata.
Please report errors in this document to www-xml-schema-comments@w3.org
(archive).
Note: Ashok Malhotra's
affiliation has changed since the completion of
editorial work on this second edition. He is now at Oracle, and can be
contacted at <ashok.malhotra@oracle.com>.
1 Introduction
1.1 Purpose
The [XML 1.0 (Second Edition)] specification defines limited
facilities for applying datatypes to document content in that documents
may contain or refer to DTDs that assign types to elements and attributes.
However, document authors, including authors of traditional
documents and those transporting data in XML,
often require a higher degree of type checking to ensure robustness in
document understanding and data interchange.
The two examples below are typical XML instances
in which datatypes are implicit: the first (data oriented)
represents a billing invoice, the second (document oriented)
a memo or perhaps an email message in XML.
Data oriented:
<invoice>
  <orderDate>1999-01-21</orderDate>
  <shipDate>1999-01-25</shipDate>
  <billingAddress>
    <name>Ashok Malhotra</name>
    <street>123 Microsoft Ave.</street>
    <city>Hawthorne</city>
    <state>NY</state>
    <zip>10532-0000</zip>
  </billingAddress>
  <voice>555-1234</voice>
  <fax>555-4321</fax>
</invoice>
Document oriented:
<memo importance='high'
      date='1999-03-23'>
  <from>Paul V. Biron</from>
  <to>Ashok Malhotra</to>
  <subject>Latest draft</subject>
  <body>
    We need to discuss the latest
    draft <emph>immediately</emph>.
    Either email me at <email>
    mailto:paul.v.biron@kp.org</email>
    or call <phone>555-9876</phone>
  </body>
</memo>
The invoice contains several dates and telephone numbers, the postal
abbreviation for a state
(which comes from an enumerated list of sanctioned values), and a ZIP code
(which takes a definable regular form). The memo contains many
of the same types of information: a date, telephone number, email address
and an "importance" value (from an enumerated
list, such as "low", "medium" or "high"). Applications which process
invoices and memos need to raise exceptions if something that was
supposed to be a date or telephone number does not conform to the rules
for valid dates or telephone numbers.
In both cases, validity constraints exist on the content of the
instances that are not expressible in XML DTDs. The limited datatyping
facilities in XML have prevented validating XML processors from supplying
the rigorous type checking required in these situations. The result
has been that individual application writers have had to implement type
checking in an ad hoc manner. This specification addresses
the needs of both document authors and application writers for a robust,
extensible datatype system for XML which could be incorporated into
XML processors. As discussed below, these datatypes could be used in other
XML-related standards as well.
1.2 Requirements
The [XML Schema Requirements] document spells out
concrete requirements to be fulfilled by this specification,
which state that the XML Schema Language must:
- provide for primitive data typing, including byte, date,
  integer, sequence, SQL and Java primitive datatypes, etc.;
- define a type system that is adequate for import/export
  from database systems (e.g., relational, object, OLAP);
- distinguish requirements relating to lexical data representation
  vs. those governing an underlying information set;
- allow creation of user-defined datatypes, such as
  datatypes that are derived from existing datatypes and which
  may constrain certain of their properties (e.g., range,
  precision, length, format).
1.3 Scope
This portion of the XML Schema Language discusses datatypes that can be
used in an XML Schema. These datatypes can be specified for element
content that would be specified as
#PCDATA and attribute
values of various
types in a DTD. It is the intention of this specification
that it be usable outside of the context of XML Schemas for a wide range
of other XML-related activities such as [XSL] and
[RDF Schema].
1.4 Terminology
The terminology used to describe XML Schema Datatypes is defined in the
body of this specification. The terms defined in the following list are
used in building those definitions and in describing the actions of a
datatype processor:
- [Definition:] for compatibility
  A feature of this specification included solely to ensure that schemas
  which use this feature remain compatible with [XML 1.0 (Second Edition)].
- [Definition:] may
  Conforming documents and processors are permitted to, but need
  not, behave as described.
- [Definition:] match
  (Of strings or names:) Two strings or names being compared must be
  identical. Characters with multiple possible representations in ISO/IEC 10646 (e.g.
  characters with both precomposed and base+diacritic forms) match only if they have
  the same representation in both strings. No case folding is performed. (Of strings and
  rules in the grammar:) A string matches a grammatical production if it belongs to the
  language generated by that production.
- [Definition:] must
  Conforming documents and processors are required to behave as
  described; otherwise they are in ·error·.
- [Definition:] error
  A violation of the rules of this specification; results are undefined.
  Conforming software ·may· detect and report an
  error and ·may· recover from it.
2 Type System
This section describes the conceptual framework behind the type system
defined in this specification. The framework has been influenced by the
[ISO 11404] standard on language-independent datatypes as
well as the datatypes for [SQL] and for programming
languages such as Java.
The datatypes discussed in this specification are computer
representations of well known abstract concepts such as
integer and date. It is not the place of this
specification to define these abstract concepts; many other publications
provide excellent definitions.
2.1 Datatype
[Definition:] In this specification,
a datatype is a 3-tuple, consisting of
a) a set of distinct values, called its ·value space·,
b) a set of lexical representations, called its
·lexical space·, and c) a set of ·facet·s
that characterize properties of the ·value space·,
individual values or lexical items.
2.2 Value space
[Definition:] A value
space is the set of values for a given datatype.
Each value in the value space of a datatype is denoted by
one or more literals in its ·lexical space·.
The ·value space· of a given datatype can
be defined in one of the following ways:
- defined axiomatically from fundamental notions (intensional definition)
  [see ·primitive·]
- enumerated outright (extensional definition)
  [see ·enumeration·]
- defined by restricting the ·value space· of
  an already defined datatype to a particular subset with a given set
  of properties [see ·derived·]
- defined as a combination of values from one or more already defined
  ·value space·(s) by a specific construction procedure
  [see ·list· and ·union·]
·value space·s have certain properties. For example,
they always have the property of ·cardinality·,
some definition of equality
and might be ·ordered·, by which individual
values within the ·value space· can be compared to
one another. The properties of ·value space·s that
are recognized by this specification are defined in
2.4.1 Fundamental facets.
2.3 Lexical space
In addition to its ·value space·, each datatype also
has a lexical space.
[Definition:] A
lexical space is the set of valid literals
for a datatype.
For example, "100" and "1.0E2" are two different literals from the
·lexical space· of float which both
denote the same value. The type system defined in this specification
provides a mechanism for schema designers to control the set of values
and the corresponding set of acceptable literals of those values for
a datatype.
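To make the many-to-one relation between literals and values concrete, the Python sketch below groups literals by the value they denote. The helper `group_by_value` is hypothetical, and Python's `float()` is only a stand-in for the float lexical mapping: it produces IEEE double rather than single precision and accepts spellings such as `inf` that are not legal literals here.

```python
from collections import defaultdict

def group_by_value(literals):
    """Group literals that denote the same numeric value (hypothetical helper)."""
    groups = defaultdict(list)
    for lit in literals:
        # float() plays the role of the lexical-to-value mapping here.
        groups[float(lit)].append(lit)
    return dict(groups)

groups = group_by_value(["100", "1.0E2", "1e2", "100.0", "12.5"])
print(groups)  # {100.0: ['100', '1.0E2', '1e2', '100.0'], 12.5: ['12.5']}
```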
Note:
The literals in the ·lexical space·s defined in this specification
have the following characteristics:
- Interoperability:
  The number of literals for each value has been kept small; for many
  datatypes there is a one-to-one mapping between literals and values.
  This makes it easy to exchange the values between different systems.
  In many cases, conversion from locale-dependent representations will
  be required on both the originator and the recipient side, both for
  computer processing and for interaction with humans.
- Basic readability:
  Textual, rather than binary, literals are used.
  This makes hand editing, debugging, and similar activities possible.
- Ease of parsing and serializing:
  Where possible, literals correspond to those found in common
  programming languages and libraries.
2.3.1 Canonical Lexical Representation
While the datatypes defined in this specification have, for the most part,
a single lexical representation, i.e., each value in the datatype's
·value space· is denoted by a single literal in its
·lexical space·, this is not always the case. The
example in the previous section showed two literals for the datatype
float which denote the same value. Similarly, there
·may· be
several literals for one of the date or time datatypes that denote the
same value using different timezone indicators.
[Definition:] A canonical lexical representation
is a set of literals from among the valid set of literals
for a datatype such that there is a one-to-one mapping between literals
in the canonical lexical representation and
values in the ·value space·.
2.4 Facets
[Definition:] A facet is a single
defining aspect of a ·value space·. Generally
speaking, each facet characterizes a ·value space·
along independent axes or dimensions.
The facets of a datatype serve to distinguish those aspects of
one datatype which differ from other datatypes.
Rather than being defined solely in terms of a prose description,
the datatypes in this specification are defined in terms of
the synthesis of facet values which together determine the
·value space· and properties of the datatype.
Facets are of two types: fundamental facets that define
the datatype and non-fundamental or constraining
facets that constrain the permitted values of a datatype.
2.5 Datatype dichotomies
It is useful to categorize the datatypes defined in this specification
along various dimensions, forming a set of characterization dichotomies.
2.5.1 Atomic vs. list vs. union datatypes
The first distinction to be made is that between
·atomic·, ·list· and ·union·
datatypes.
For example, a single token which ·match·es
Nmtoken from
[XML 1.0 (Second Edition)] could be the value of an ·atomic·
datatype (NMTOKEN); while a sequence of such tokens
could be the value of a ·list· datatype
(NMTOKENS).
2.5.1.1 Atomic datatypes
·atomic· datatypes can be either
·primitive· or ·derived·. The
·value space· of an ·atomic· datatype
is a set of "atomic" values, which for the purposes of this specification,
are not further decomposable. The ·lexical space· of
an ·atomic· datatype is a set of literals
whose internal structure is specific to the datatype in question.
2.5.1.2 List datatypes
Several type systems (such as the one described in
[ISO 11404]) treat ·list· datatypes as
special cases of the more general notions of aggregate or collection
datatypes.
·list· datatypes are always ·derived·.
The ·value space· of a ·list·
datatype is a set of finite-length sequences of ·atomic·
values. The ·lexical space· of a
·list· datatype is a set of literals whose internal
structure is a space-separated
sequence of literals of the
·atomic· datatype of the items in the
·list·.
[Definition:]
The ·atomic· or ·union·
datatype that participates in the definition of a ·list· datatype
is known as the itemType of that ·list· datatype.
<simpleType name='sizes'>
<list itemType='decimal'/>
</simpleType>
<cerealSizes xsi:type='sizes'> 8 10.5 12 </cerealSizes>
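A minimal Python sketch of the value mapping for the `sizes` list above: the literal is split into item literals, and each item is mapped by its itemType (decimal). The function name is hypothetical; `str.split()` splits on any Unicode whitespace (a superset of XML's space, tab, CR and LF), and `Decimal` performs no xs:decimal lexical validation (it also accepts forms such as `1E2`).

```python
from decimal import Decimal

def parse_decimal_list(literal: str) -> list[Decimal]:
    """Map a list literal to a sequence of decimal item values (hypothetical)."""
    # split() with no argument drops leading/trailing whitespace and splits
    # on runs of whitespace, approximating whiteSpace="collapse".
    return [Decimal(item) for item in literal.split()]

print(parse_decimal_list(" 8 10.5 12 "))  # [Decimal('8'), Decimal('10.5'), Decimal('12')]
```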
A ·list· datatype can be ·derived·
from an ·atomic· datatype whose
·lexical space· allows space
(such as string
or anyURI) or from a
·union· datatype any of whose {member type definitions}'s
·lexical space· allows space.
In such a case, regardless of the input, list items
will be separated at space boundaries.
<simpleType name='listOfString'>
<list itemType='string'/>
</simpleType>
<someElement xsi:type='listOfString'>
this is not list item 1
this is not list item 2
this is not list item 3
</someElement>
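The Python sketch below shows why the `listOfString` instance does not yield three items: items are maximal runs of non-whitespace characters, so each line contributes six items. The function name is hypothetical; "whitespace" is taken to be the four XML whitespace characters.

```python
import re

def split_list_items(literal: str) -> list[str]:
    """Split a list literal at XML whitespace (#x20, #x9, #xD, #xA)."""
    return re.findall(r"[^ \t\r\n]+", literal)

content = """
this is not list item 1
this is not list item 2
this is not list item 3
"""
items = split_list_items(content)
print(len(items))  # 18
print(items[:6])   # ['this', 'is', 'not', 'list', 'item', '1']
```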
When a datatype is ·derived· from a
·list· datatype, the following
·constraining facet·s apply:
For each of ·length·, ·maxLength·
and ·minLength·, the unit of length is
measured in number of list items. The value of ·whiteSpace·
is fixed to the value collapse.
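As a sketch of how the length facets count items rather than characters, the Python helpers below (hypothetical names) measure a list literal after splitting at XML whitespace and compare the count with optional `length`, `minLength` and `maxLength` values.

```python
import re

def list_length(literal: str) -> int:
    # whiteSpace is collapse, so only non-empty items separated by XML
    # whitespace count; the unit of length is one list item.
    return len(re.findall(r"[^ \t\r\n]+", literal))

def check_length_facets(literal, length=None, min_length=None, max_length=None):
    """Return True if the list literal satisfies the given length facets."""
    n = list_length(literal)
    if length is not None and n != length:
        return False
    if min_length is not None and n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    return True

print(list_length("  8   10.5 12 "))                 # 3
print(check_length_facets("8 10.5 12", max_length=2))  # False
```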
For ·list· datatypes the ·lexical space·
is composed of space-separated
literals of its ·itemType·. Hence, any
·pattern· specified when a new datatype is
·derived· from a ·list· datatype is matched against
each literal of the ·list· datatype and
not against the literals of the datatype that serves as its
·itemType·.
<xs:simpleType name='myList'>
<xs:list itemType='xs:integer'/>
</xs:simpleType>
<xs:simpleType name='myRestrictedList'>
<xs:restriction base='myList'>
<xs:pattern value='123 (\d+\s)*456'/>
</xs:restriction>
</xs:simpleType>
<someElement xsi:type='myRestrictedList'>123 456</someElement>
<someElement xsi:type='myRestrictedList'>123 987 456</someElement>
<someElement xsi:type='myRestrictedList'>123 987 567 456</someElement>
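The pattern is applied to the whole (collapsed) list literal. In Python this can be approximated with `re.fullmatch`, since XML Schema patterns are implicitly anchored at both ends. One caveat: Python's `\s` also matches characters such as form feed that the XML Schema `\s` escape does not, which makes no difference for these ASCII inputs. The extra fourth literal is added here for contrast.

```python
import re

PATTERN = r"123 (\d+\s)*456"

instances = ["123 456", "123 987 456", "123 987 567 456", "123 456 789"]
results = {lit: re.fullmatch(PATTERN, lit) is not None for lit in instances}
for lit, ok in results.items():
    print(f"{lit!r}: {'valid' if ok else 'invalid'}")
```

All three instances from the example match; `123 456 789` does not, because the literal must end with `456`.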
The canonical-lexical-representation for the
·list· datatype is defined as the lexical form in which
each item in the ·list· has the canonical lexical
representation of its ·itemType·.
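For a list whose itemType is boolean, this can be sketched in Python as follows. The helper name is hypothetical, and joining the canonical items with a single space is our assumption about the separator (it is what whiteSpace="collapse" produces).

```python
import re

# boolean literals mapped to their canonical forms: 1 -> true, 0 -> false.
BOOLEAN_CANONICAL = {"true": "true", "1": "true", "false": "false", "0": "false"}

def canonical_boolean_list(literal: str) -> str:
    """Canonical form of a list of booleans (hypothetical helper)."""
    items = re.findall(r"[^ \t\r\n]+", literal)
    # Each item takes its itemType's canonical form; a KeyError signals
    # an item that is not a boolean literal.
    return " ".join(BOOLEAN_CANONICAL[item] for item in items)

print(canonical_boolean_list("  1 0\n true "))  # true false true
```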
2.5.1.3 Union datatypes
The ·value space· and ·lexical space·
of a ·union· datatype are the union of the
·value space·s and ·lexical space·s of
its ·memberTypes·.
·union· datatypes are always ·derived·.
Currently, there are no ·built-in· ·union·
datatypes.
A prototypical example of a ·union· type is the
maxOccurs attribute on the
element element
in XML Schema itself: it is a union of nonNegativeInteger
and an enumeration with the single member, the string "unbounded", as shown below.
<attributeGroup name="occurs">
<attribute name="minOccurs" type="nonNegativeInteger"
use="optional" default="1"/>
<attribute name="maxOccurs" use="optional" default="1">
<simpleType>
<union>
<simpleType>
<restriction base='nonNegativeInteger'/>
</simpleType>
<simpleType>
<restriction base='string'>
<enumeration value='unbounded'/>
</restriction>
</simpleType>
</union>
</simpleType>
</attribute>
</attributeGroup>
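A minimal Python sketch of validating a maxOccurs value against this union, trying the member types in order. The function name is hypothetical and whitespace normalization is omitted. The first branch approximates the nonNegativeInteger lexical space (digits with an optional "+", plus a "-" sign on zero).

```python
import re
from typing import Union

def parse_max_occurs(literal: str) -> Union[int, str]:
    """Validate a maxOccurs literal against nonNegativeInteger | {'unbounded'}."""
    # Member 1: nonNegativeInteger.
    if re.fullmatch(r"\+?[0-9]+|-0+", literal):
        return int(literal)
    # Member 2: string restricted to the single enumerated value.
    if literal == "unbounded":
        return "unbounded"
    raise ValueError(f"{literal!r} is not a valid maxOccurs value")

print(parse_max_occurs("5"), parse_max_occurs("unbounded"))  # 5 unbounded
```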
Any number (greater than 1) of ·atomic· or ·list·
·datatype·s can participate in a ·union· type.
[Definition:]
The datatypes that participate in the
definition of a ·union· datatype are known as the
memberTypes of that ·union· datatype.
The order in which the ·memberTypes· are specified in the
definition (that is, the order of the <simpleType> children of the <union>
element, or the order of the QNames in the memberTypes
attribute) is significant.
During validation, an element or attribute's value is validated against the
·memberTypes· in the order in which they appear in the
definition until a match is found. The evaluation order can be overridden
with the use of xsi:type.
For example, given the definition below, the first instance of the <size> element
validates correctly as an integer, the second and third as a
string.
<xsd:element name='size'>
<xsd:simpleType>
<xsd:union>
<xsd:simpleType>
<xsd:restriction base='integer'/>
</xsd:simpleType>
<xsd:simpleType>
<xsd:restriction base='string'/>
</xsd:simpleType>
</xsd:union>
</xsd:simpleType>
</xsd:element>
<size>1</size>
<size>large</size>
<size xsi:type='xsd:string'>1</size>
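The order-dependent validation can be sketched in Python. Each member type is modelled as a (name, validator) pair; the validators are simplified stand-ins for integer and string, every name is hypothetical, and `xsi_type` is matched simply by member name.

```python
import re
from typing import Optional

def as_integer(lit: str) -> Optional[int]:
    return int(lit) if re.fullmatch(r"[+-]?[0-9]+", lit) else None

def as_string(lit: str) -> Optional[str]:
    return lit

SIZE_MEMBERS = [("integer", as_integer), ("string", as_string)]

def validate_union(lit, members, xsi_type=None):
    """Return (member name, value) for the first member type accepting lit.
    If xsi_type names a member, only that member is tried."""
    if xsi_type is not None:
        members = [m for m in members if m[0] == xsi_type]
    for name, validator in members:
        value = validator(lit)
        if value is not None:
            return name, value
    raise ValueError(f"{lit!r} matches no member type")

print(validate_union("1", SIZE_MEMBERS))                     # ('integer', 1)
print(validate_union("large", SIZE_MEMBERS))                 # ('string', 'large')
print(validate_union("1", SIZE_MEMBERS, xsi_type="string"))  # ('string', '1')
```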
The canonical-lexical-representation for a
·union· datatype is defined as the lexical form in which
the values have the canonical lexical representation
of the appropriate ·memberTypes·.
Note:
A datatype which is ·atomic· in this specification
need not be an "atomic" datatype in any programming language used to
implement this specification. Likewise, a datatype which is a
·list· in this specification need not be a "list"
datatype in any programming language used to implement this specification.
Furthermore, a datatype which is a ·union· in this
specification need not be a "union" datatype in any programming
language used to implement this specification.
2.5.3 Built-in vs. user-derived datatypes
Conceptually there is no difference between the
·built-in· ·derived· datatypes
included in this specification and the ·user-derived·
datatypes which will be created by individual schema designers.
The ·built-in· ·derived· datatypes
are those which are believed to be so common that if they were not
defined in this specification many schema designers would end up
"reinventing" them. Furthermore, including these
·derived· datatypes in this specification serves to
demonstrate the mechanics and utility of the datatype generation
facilities of this specification.
Note:
A datatype which is ·built-in· in this specification
need not be a "built-in" datatype in any programming language used
to implement this specification. Likewise, a datatype which is
·user-derived· in this specification need not
be a "user-derived" datatype in any programming language used to
implement this specification.
3 Built-in datatypes
Each built-in datatype in this specification (both
·primitive· and
·derived·) can be uniquely addressed via a
URI Reference constructed as follows:
- the base URI is the URI of the XML Schema namespace
- the fragment identifier is the name of the datatype
For example, to address the int datatype, the URI is:
http://www.w3.org/2001/XMLSchema#int
Additionally, each facet definition element can be uniquely
addressed via a URI constructed as follows:
- the base URI is the URI of the XML Schema namespace
- the fragment identifier is the name of the facet
For example, to address the maxInclusive facet, the URI is:
http://www.w3.org/2001/XMLSchema#maxInclusive
Additionally, each facet usage in a built-in datatype definition
can be uniquely addressed via a URI constructed as follows:
- the base URI is the URI of the XML Schema namespace
- the fragment identifier is the name of the datatype, followed
by a period (".") followed by the name of the facet
For example, to address the usage of the maxInclusive facet in
the definition of int, the URI is:
http://www.w3.org/2001/XMLSchema#int.maxInclusive
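The three URI construction rules amount to simple string assembly, as in this Python sketch (the function names are hypothetical):

```python
XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"

def datatype_uri(datatype: str) -> str:
    return f"{XSD_NAMESPACE}#{datatype}"

def facet_uri(facet: str) -> str:
    return f"{XSD_NAMESPACE}#{facet}"

def facet_usage_uri(datatype: str, facet: str) -> str:
    # datatype name, a period, then the facet name
    return f"{XSD_NAMESPACE}#{datatype}.{facet}"

print(datatype_uri("int"))                     # http://www.w3.org/2001/XMLSchema#int
print(facet_uri("maxInclusive"))               # http://www.w3.org/2001/XMLSchema#maxInclusive
print(facet_usage_uri("int", "maxInclusive"))  # http://www.w3.org/2001/XMLSchema#int.maxInclusive
```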
3.1 Namespace considerations
The ·built-in· datatypes defined by this specification
are designed to be used with the XML Schema definition language as well as other
XML specifications.
To facilitate usage within the XML Schema definition language, the ·built-in·
datatypes in this specification have the namespace name:
- http://www.w3.org/2001/XMLSchema
To facilitate usage in specifications other than the XML Schema definition language,
such as those that do not want to know anything about aspects of the
XML Schema definition language other than the datatypes, each ·built-in·
datatype is also defined in the namespace whose URI is:
- http://www.w3.org/2001/XMLSchema-datatypes
This applies to both
·built-in· ·primitive· and
·built-in· ·derived· datatypes.
Each ·user-derived· datatype is also associated with a
unique namespace. However, ·user-derived· datatypes
do not come from the namespace defined by this specification; rather,
they come from the namespace of the schema in which they are defined
(see XML Representation of
Schemas in [XML Schema Part 1: Structures]).
3.2 Primitive datatypes
The ·primitive· datatypes defined by this specification
are described below. For each datatype, the
·value space· and ·lexical space·
are defined, ·constraining facet·s which apply
to the datatype are listed and any datatypes ·derived·
from this datatype are specified.
·primitive· datatypes can only be added by revisions
to this specification.
3.2.2 boolean
[Definition:] boolean has the
·value space· required to support the mathematical
concept of binary-valued logic: {true, false}.
3.2.2.1 Lexical representation
An instance of a datatype that is defined as ·boolean·
can have the following legal literals {true, false, 1, 0}.
3.2.2.2 Canonical representation
The canonical representation for boolean is the set of
literals {true, false}.
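In Python, the lexical mapping and the canonical representation for boolean can be sketched as below (hypothetical names). Four literals denote two values, and the canonical set maps one-to-one onto those values. Because ·match· performs no case folding, a literal such as `True` is rejected.

```python
LEXICAL_TO_VALUE = {"true": True, "1": True, "false": False, "0": False}

def parse_boolean(literal: str) -> bool:
    try:
        return LEXICAL_TO_VALUE[literal]
    except KeyError:
        raise ValueError(f"{literal!r} is not a boolean literal") from None

def canonical_boolean(value: bool) -> str:
    return "true" if value else "false"

# Four legal literals, two values.
assert {parse_boolean(lit) for lit in LEXICAL_TO_VALUE} == {True, False}
print(canonical_boolean(parse_boolean("1")))  # true
```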
3.2.3 decimal
[Definition:] decimal
represents a subset of the real numbers, which can be represented by decimal numerals.
The ·value space· of decimal
is the set of
numbers that can be obtained by multiplying an integer by a non-positive
power of ten, i.e., expressible as i × 10^-n
where i and n are integers
and n >= 0.
Precision is not reflected in this value space;
the number 2.0 is not distinct from the number 2.00.
The ·order-relation· on decimal
is the order relation on real numbers, restricted
to this subset.
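The i × 10^-n form can be illustrated in Python with the `decimal` and `fractions` modules. The helper `to_i_n` is hypothetical and assumes a valid decimal literal (no exponent part). The literals "2.0" and "2.00" yield different (i, n) pairs but the same value, which is why precision is not part of the value space.

```python
from decimal import Decimal
from fractions import Fraction

def to_i_n(literal: str) -> tuple[int, int]:
    """Return (i, n) such that the literal's value is i * 10**-n, n >= 0."""
    sign, digits, exponent = Decimal(literal).as_tuple()
    i = int("".join(map(str, digits)))
    return (-i if sign else i), -exponent

print(to_i_n("2.0"), to_i_n("2.00"))  # (20, 1) (200, 2)
print(Fraction(Decimal("2.0")) == Fraction(Decimal("2.00")))  # True
```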
Note:
All ·minimally conforming· processors ·must·
support decimal numbers with a minimum of 18 decimal digits (i.e., with a
·totalDigits· of 18). However,
·minimally conforming· processors ·may·
set an application-defined limit on the maximum number of decimal digits
they are prepared to support, in which case that application-defined
maximum number ·must· be clearly documented.
3.2.3.1 Lexical representation
decimal has a lexical representation
consisting of a finite-length sequence of decimal digits (#x30-#x39) separated
by a period as a decimal indicator.
An optional leading sign is allowed.
If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional.
If the fractional part is zero, the period and following zeroes can
be omitted.
For example: -1.23, 12678967.543233, +100000.00, 210.
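The prose above can be captured by a regular expression, sketched here in Python. The expression is our reading of the description (optional sign, digits with an optional period, at least one digit overall); it assumes that forms such as `.5` with no integer digits are allowed.

```python
import re

DECIMAL_LEXICAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def is_decimal_literal(s: str) -> bool:
    return DECIMAL_LEXICAL.fullmatch(s) is not None

valid = ["-1.23", "12678967.543233", "+100000.00", "210.", ".5"]
invalid = ["1.0E2", "1,5", "+", ".", ""]
assert all(is_decimal_literal(s) for s in valid)
assert not any(is_decimal_literal(s) for s in invalid)
```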
3.2.3.2 Canonical representation
The canonical representation for decimal is defined by
prohibiting certain options from the
3.2.3.1 Lexical representation. Specifically, the preceding
optional "+" sign is prohibited. The decimal point is required. Leading and
trailing zeroes are prohibited subject to the following: there must be at least
one digit to the right and to the left of the decimal point which may be a zero.
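A Python sketch of these rules (the helper name is hypothetical; the input is assumed to be a valid decimal literal). It drops the "+" sign, strips leading and trailing zeroes, and keeps at least one digit on each side of the decimal point. Zero is written without a sign, since "-0" and "0" denote the same value.

```python
from decimal import Decimal

def canonical_decimal(literal: str) -> str:
    """Canonical lexical form of a valid xs:decimal literal."""
    d = Decimal(literal)
    if d == 0:
        return "0.0"
    sign = "-" if d < 0 else ""
    text = format(abs(d), "f")  # positional notation, no exponent
    integer, _, fraction = text.partition(".")
    integer = integer.lstrip("0") or "0"
    fraction = fraction.rstrip("0") or "0"
    return f"{sign}{integer}.{fraction}"

print(canonical_decimal("+100000.00"))  # 100000.0
print(canonical_decimal("210."))        # 210.0
```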
3.2.4 float
[Definition:] float
is patterned after the IEEE single-precision 32-bit floating point type
[IEEE 754-1985]. The basic ·value space· of
float consists of the values
m × 2^e, where m
is an integer whose absolute value is less than
2^24, and e is an integer
between -149 and 104, inclusive. In addition to the basic
·value space· described above, the
·value space· of float also contains the
following
three
special values:
positive and negative infinity and not-a-number
(NaN).
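The bounds on m and e reproduce the IEEE single-precision extremes: the largest finite value is (2^24 - 1) × 2^104 and the smallest positive (subnormal) value is 1 × 2^-149. The Python check below reinterprets IEEE bit patterns with `struct`. Python compares int and float exactly, so the assertions are exact.

```python
import struct

def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

largest = float32_from_bits(0x7F7FFFFF)   # largest finite single
smallest = float32_from_bits(0x00000001)  # smallest positive single

assert largest == (2**24 - 1) * 2**104
assert smallest == 2.0**-149
print(largest)  # 3.4028234663852886e+38
```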
The ·order-relation· on float
is: x < y iff y - x is positive
for x and y in the value space.
Positive infinity is greater than all other non-NaN values.
NaN equals itself but is ·incomparable· with (neither greater than nor less than)
any other value in the ·value space·.
Note:
"Equality" in this Recommendation is defined to be "identity" (i.e., values that
are identical in the ·value space· are equal and vice versa).
Identity must be used for the few operations that are defined in this Recommendation.
Applications using any of the datatypes defined in this Recommendation may use different
definitions of equality for computational purposes; [IEEE 754-1985]-based computation systems
are examples. Nothing in this Recommendation should be construed as requiring that
such applications use identity as their equality relationship when computing.
Any value ·incomparable· with the value used for the four bounding facets
(·minInclusive·, ·maxInclusive·,
·minExclusive·, and ·maxExclusive·) will be
excluded from the resulting restricted ·value space·. In particular,
when "NaN" is used as a facet value for a bounding facet, since no other
float values are ·comparable· with it, the result is a ·value space·
either having NaN as its only member (the inclusive cases) or that is empty
(the exclusive cases). If any other value is used for a bounding facet,
NaN will be excluded from the resulting restricted ·value space·;
to add NaN back in requires union with the NaN-only space.
This datatype differs from that of [IEEE 754-1985] in that there is only one
NaN and only one zero. This makes the equality and ordering of values in the data
space differ from that of [IEEE 754-1985] only in that for schema purposes NaN = NaN.
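A Python sketch of two of the bounding facets under these rules (hypothetical names; maxInclusive and maxExclusive are symmetric). Schema equality is identity, so NaN equals NaN, while any comparison involving NaN and another value fails.

```python
import math

def _identical(x: float, y: float) -> bool:
    # Identity: NaN equals NaN; there is only one zero, so -0.0 == 0.0.
    return (math.isnan(x) and math.isnan(y)) or x == y

def satisfies_min_inclusive(value: float, bound: float) -> bool:
    if math.isnan(value) or math.isnan(bound):
        return _identical(value, bound)  # only NaN survives a NaN bound
    return value >= bound

def satisfies_min_exclusive(value: float, bound: float) -> bool:
    if math.isnan(value) or math.isnan(bound):
        return False  # incomparable values are excluded
    return value > bound

nan = float("nan")
assert satisfies_min_inclusive(nan, nan)      # value space {NaN}
assert not satisfies_min_exclusive(nan, nan)  # empty value space
assert not satisfies_min_inclusive(nan, 0.0)  # NaN excluded by a numeric bound
```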
A literal in the ·lexical space· representing a
decimal number d maps to the normalized value
in the ·value space· of float that is
closest to d in the sense defined by
[Clinger, WD (1990)]; if d is
exactly halfway between two such values then the even value is chosen.
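This mapping can be sketched exactly in Python with `fractions.Fraction`, which parses literals such as `1.0E2` and `12.78e-2` (but not `INF` or `NaN`). With no digits argument, its `round()` rounds half to even. The helper `round_to_float32` is hypothetical and returns the nearest m × 2^e, with |m| < 2^24 and -149 <= e <= 104, as an exact fraction. The paragraph does not say what happens to literals beyond the largest finite value; this sketch simply raises an error.

```python
from fractions import Fraction

def round_to_float32(d: Fraction) -> Fraction:
    """Nearest m * 2**e (|m| < 2**24, -149 <= e <= 104) to d, ties to even m."""
    if d == 0:
        return Fraction(0)
    sign = -1 if d < 0 else 1
    a = abs(d)
    # Find k with 2**k <= a < 2**(k + 1).
    k = a.numerator.bit_length() - a.denominator.bit_length()
    if a < Fraction(2) ** k:
        k -= 1
    # Keep 24 significant bits, but never go below e = -149 (subnormals).
    e = max(k - 23, -149)
    m = round(a / Fraction(2) ** e)  # round half to even
    if m == 2**24:  # rounding carried into a new binary digit
        m //= 2
        e += 1
    if e > 104:
        raise OverflowError("literal is beyond the largest finite float")
    return sign * m * Fraction(2) ** e

print(float(round_to_float32(Fraction("0.1"))))  # 0.10000000149011612
```

For example, 2^24 + 1 lies exactly halfway between 2^24 and 2^24 + 2, and the sketch returns 2^24 because its m is even.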
3.2.4.1 Lexical representation
float values have a lexical representation
consisting of a mantissa followed, optionally, by the character
"E" or "e", followed by an exponent. The exponent ·must·
be an integer. The mantissa must be a decimal number. The representations
for exponent and mantissa must follow the lexical rules for
integer and decimal. If the "E" or "e" and
the following exponent are omitted, an exponent value of 0 is assumed.
The special values
positive
and negative infinity and not-a-number have lexical representations
INF, -INF and
NaN, respectively.
Lexical representations for zero may take a positive or negative sign.
For example, -1E4, 1267.43233E12, 12.78e-2, 12,
-0, 0 and INF are all legal literals for float.
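A Python sketch of a validator for these literals (hypothetical names). The regular expression is our reading of the rules above: a decimal mantissa, an optional E/e with an integer exponent, or one of `INF`, `-INF` and `NaN`. `+INF` is not among the listed special literals. Python's `float()` is applied only after validation, because on its own it also accepts spellings such as `inf` and `Infinity`. It yields a double, so mapping to the nearest single-precision value is a separate step.

```python
import re

FLOAT_LEXICAL = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?|INF|-INF|NaN"
)

def is_float_literal(s: str) -> bool:
    return FLOAT_LEXICAL.fullmatch(s) is not None

def parse_float_literal(s: str) -> float:
    if not is_float_literal(s):
        raise ValueError(f"{s!r} is not a legal float literal")
    return float(s)  # float() also understands "INF", "-INF" and "NaN"

for lit in ["-1E4", "1267.43233E12", "12.78e-2", "12", "-0", "0", "INF"]:
    print(lit, parse_float_literal(lit))
```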
3.2.4.2 Canonical representation
The canonical representation for float is defined by
prohibiting certain options from the
3.2.4.1 Lexical representation. Specifically, the exponent
must be indicated by "E". Leading zeroes and the preceding optional "+" sign
are prohibited in the exponent.
If the exponent is zero, it must be indicated by "E0".
For the mantissa, the preceding optional "+" sign is prohibited
and the decimal point is required.
Leading and trailing zeroes are prohibited subject to the following:
number representations must
be normalized such that there is a single digit
which is non-zero
to the left of the decimal point and at least a single digit to the
right of the decimal point
unless the value being represented is zero. The canonical
representation for zero is 0.0E0.
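The Python sketch below (hypothetical helper) rewrites a valid float literal into this canonical shape. It assumes the decimal number the literal spells is exactly the value it denotes, as with `100` or `0.5`. For literals that must be rounded, it does not try to decide which digit string to emit.

```python
from decimal import Decimal

def canonical_float_literal(lit: str) -> str:
    """Canonical form of a valid float literal, assuming no rounding occurs."""
    if lit in ("INF", "-INF", "NaN"):
        return lit
    d = Decimal(lit)
    if d == 0:
        return "0.0E0"  # covers "0", "-0", "0.000E5", ...
    sign, digits, exp = d.as_tuple()
    digits = list(digits)  # coefficient digits, no leading zeroes
    while digits[-1] == 0:  # drop trailing zeroes, adjusting the exponent
        digits.pop()
        exp += 1
    sci_exp = exp + len(digits) - 1  # exponent with one digit before the point
    tail = "".join(map(str, digits[1:])) or "0"
    return f"{'-' if sign else ''}{digits[0]}.{tail}E{sci_exp}"

print(canonical_float_literal("1267.43233E12"))  # 1.26743233E15
```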
3.2.5 double
[Definition:] The double
datatype
is patterned after the
IEEE double-precision 64-bit floating point
type [IEEE 754-1985]. The basic ·value space·
of double consists of the values
m × 2^e, where m
is an integer whose absolute value is less than
2^53, and e is an
integer between -1074 and 971, inclusive. In addition to the basic
·value space· described above, the
·value space· of double also contains
the following
three
special values:
positive and negative infinity and not-a-number
(NaN).
The ·order-relation· on double
is: x < y iff y - x is positive
for x and y in the value space.
Positive infinity is greater than all other non-NaN values.
NaN equals itself but is ·incomparable· with (neither greater than nor less than)
any other value in the ·value space·.
Note:
"Equality" in this Recommendation is defined to be "identity" (i.e., values that
are identical in the ·value space· are equal and vice versa).
Identity must be used for the few operations that are defined in this Recommendation.
Applications using any of the datatypes defined in this Recommendation may use different
definitions of equality for computational purposes; [IEEE 754-1985]-based computation systems
are examples. Nothing in this Recommendation should be construed as requiring that
such applications use identity as their equality relationship when computing.
Any value ·incomparable· with the value used for the four bounding facets
(·minInclusive·, ·maxInclusive·,
·minExclusive·, and ·maxExclusive·) will be
excluded from the resulting restricted ·value space·. In particular,
when "NaN" is used as a facet value for a bounding facet, since no other
double values are ·comparable· with it, the result is a ·value space·
either having NaN as its only member (the inclusive cases) or that is empty
(the exclusive cases). If any other value is used for a bounding facet,
NaN will be excluded from the resulting restricted ·value space·;
to add NaN back in requires union with the NaN-only space.
This datatype differs from that of [IEEE 754-1985] in that there is only one
NaN and only one zero. This makes the equality and ordering of values in the data
space differ from that of [IEEE 754-1985] only in that for schema purposes NaN = NaN.
A literal in the ·lexical space· representing a
decimal number d maps to the normalized value
in the ·value space· of double that is
closest to d; if d is
exactly halfway between two such values then the even value is chosen.
This is the best approximation of d
([Clinger, WD (1990)], [Gay, DM (1990)]), which is more
accurate than the mapping required by [IEEE 754-1985].
3.2.5.1 Lexical representation
double values have a lexical representation
consisting of a mantissa followed, optionally, by the character "E" or
"e", followed by an exponent. The exponent ·must· be
an integer. The mantissa must be a decimal number. The representations
for exponent and mantissa must follow the lexical rules for
integer and decimal. If the "E" or "e"
and the following exponent are omitted, an exponent value of 0 is assumed.
The special values positive and negative infinity and not-a-number have
lexical representations INF, -INF and NaN, respectively.
Lexical representations for zero may take a positive or negative sign.
For example, -1E4, 1267.43233E12, 12.78e-2, 12, -0, 0 and INF
are all legal literals for double.
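A regular expression is one way to check the lexical rules above. This sketch assumes that the lexical rules of decimal can be approximated as "digits with an optional fractional part, or a bare fractional part", and that whitespace has already been collapsed. The names are hypothetical. Following this (1.0) version, "+INF" is not accepted.

```python
import re

# Mantissa: an optionally signed decimal; exponent: an optionally signed integer.
# The decimal sub-pattern is an assumed approximation of decimal's lexical rules.
_DOUBLE_LEXICAL = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?"
    r"|INF|-INF|NaN"
)

def is_double_literal(s: str) -> bool:
    # fullmatch anchors the pattern at both ends, as schema patterns are anchored.
    return _DOUBLE_LEXICAL.fullmatch(s) is not None
```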
3.2.5.2 Canonical representation
The canonical representation for double is defined by
prohibiting certain options from the
3.2.5.1 Lexical representation. Specifically, the exponent
must be indicated by "E". Leading zeroes and the preceding optional "+" sign
are prohibited in the exponent.
If the exponent is zero, it must be indicated by "E0".
For the mantissa, the preceding optional "+" sign is prohibited
and the decimal point is required.
Leading and trailing zeroes are prohibited subject to the following:
number representations must
be normalized such that there is a single digit
which is non-zero
to the left of the decimal point and at least a single digit to the
right of the decimal point
unless the value being represented is zero. The canonical
representation for zero is 0.0E0.
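The canonical rules can be sketched as a formatter for Python floats. This (1.0) text does not say how many mantissa digits to emit, so the sketch assumes the shortest digit string that round-trips, as produced by Python's `repr`; `canonical_double` is a hypothetical name.

```python
import math
from decimal import Decimal

def canonical_double(x: float) -> str:
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "INF" if x > 0 else "-INF"
    if x == 0:
        return "0.0E0"  # single zero in the value space; also covers -0.0
    # Assumption: use the shortest round-tripping digits (Python's repr).
    sign, digit_tuple, exponent = Decimal(repr(x)).as_tuple()
    digits = "".join(str(d) for d in digit_tuple)
    stripped = digits.rstrip("0")                 # drop trailing zeroes...
    exponent += len(digits) - len(stripped)       # ...and compensate in the exponent
    sci_exponent = exponent + len(stripped) - 1   # one non-zero digit left of the point
    mantissa = stripped[0] + "." + (stripped[1:] or "0")  # at least one digit after it
    # str() of an int has no '+' sign and no leading zeroes; 0 becomes "E0".
    return ("-" if sign else "") + mantissa + "E" + str(sci_exponent)
```

For example, the literals -1E4, 12 and 12.78e-2 map to "-1.0E4", "1.2E1" and "1.278E-1".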
3.2.6 duration
[Definition:]
duration represents a duration of time.
The ·value space· of duration is
a six-dimensional space where the coordinates
designate the Gregorian year, month, day, hour, minute, and second components defined in
§ 5.5.3.2 of [ISO 8601],
respectively. These components are ordered
in their significance by their order of appearance i.e. as year, month, day,
hour, minute, and second.
Note:
All ·minimally conforming· processors ·must·
support year values with a minimum of 4 digits (i.e., YYYY) and a minimum fractional second precision of milliseconds or three decimal digits (i.e. s.sss). However,
·minimally conforming· processors ·may·
set an application-defined limit on the maximum number of digits
they are prepared to support in these two cases, in which case that application-defined
maximum number ·must· be clearly documented.
3.2.6.1 Lexical representation
The lexical representation for duration is the
[ISO 8601] extended format PnYnMnDTnHnMnS, where
nY represents the number of years, nM the
number of months, nD the number of days, 'T' is the
date/time separator, nH the number of hours,
nM the number of minutes and nS the
number of seconds. The number of seconds can include decimal digits
to arbitrary precision.
The values of the Year, Month, Day, Hour and Minutes components are not
restricted but allow an arbitrary unsigned integer, i.e., an integer that
conforms to the pattern [0-9]+.
Similarly, the value of the Seconds component
allows an arbitrary unsigned decimal.
Following [ISO 8601], at least one digit must
follow the decimal point if it appears. That is, the value of the Seconds component
must conform to the pattern [0-9]+(\.[0-9]+)?.
Thus, the lexical representation of
duration does not follow the alternative
format of § 5.5.3.2.1 of [ISO 8601].
An optional preceding minus sign ('-') is
allowed, to indicate a negative duration. If the sign is omitted a
positive duration is indicated. See also D ISO 8601 Date and Time Formats.
For example, to indicate a duration of 1 year, 2 months, 3 days, 10
hours, and 30 minutes, one would write: P1Y2M3DT10H30M.
One could also indicate a duration of minus 120 days as:
-P120D.
Reduced precision and truncated representations of this format are allowed
provided they conform to the following:
- If the number of years, months, days, hours, minutes, or seconds in any
expression equals zero, the number and its corresponding designator ·may·
be omitted. However, at least one number and its designator ·must·
be present.
- The seconds part ·may· have a decimal fraction.
- The designator 'T' must be absent if and only if all of the time items are absent.
- The designator 'P' must always be present.
For example, P1347Y, P1347M and P1Y2MT2H are all allowed;
P0Y1347M and P0Y1347M0D are allowed. P-1347M is not allowed although
-P1347M is allowed. P1Y2MT is not allowed.
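The lexical rules and examples above can be captured by a regular expression plus two explicit checks. This is a sketch; `parse_duration` and its tuple result (negative flag, years, months, days, hours, minutes, seconds) are hypothetical. The "at least one component" and "'T' only with time items" rules are checked after matching rather than encoded in the pattern, which keeps the pattern readable.

```python
import re
from decimal import Decimal

_DURATION = re.compile(
    r"(?P<neg>-)?P"
    r"(?:(?P<years>[0-9]+)Y)?(?:(?P<months>[0-9]+)M)?(?:(?P<days>[0-9]+)D)?"
    r"(?P<t>T(?:(?P<hours>[0-9]+)H)?(?:(?P<minutes>[0-9]+)M)?"
    r"(?:(?P<seconds>[0-9]+(?:\.[0-9]+)?)S)?)?"
)
_FIELDS = ("years", "months", "days", "hours", "minutes", "seconds")
_TIME_FIELDS = ("hours", "minutes", "seconds")

def parse_duration(lit: str):
    """Return (negative, years, months, days, hours, minutes, seconds) or None."""
    m = _DURATION.fullmatch(lit)
    if m is None:
        return None
    g = m.groupdict()
    if all(g[f] is None for f in _FIELDS):
        return None  # at least one number and its designator must be present
    if g["t"] is not None and all(g[f] is None for f in _TIME_FIELDS):
        return None  # 'T' must be absent if all of the time items are absent
    ints = [int(g[f] or 0) for f in _FIELDS[:5]]
    return (g["neg"] is not None, *ints, Decimal(g["seconds"] or 0))
```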
3.2.6.2 Order relation on duration
In general, the ·order-relation· on duration
is a partial order since there is no determinate relationship between certain
durations such as one month (P1M) and 30 days (P30D).
The ·order-relation·
of two duration values x and
y is x < y iff s+x < s+y
for each qualified dateTime s
in the list below. These values for s cause the greatest deviations in the addition of
dateTimes and durations. Addition of durations to time instants is defined
in E Adding durations to dateTimes.
- 1696-09-01T00:00:00Z
- 1697-02-01T00:00:00Z
- 1903-03-01T00:00:00Z
- 1903-07-01T00:00:00Z
The following table shows the strongest relationship that can be determined
between example durations. The symbol <> means that the order relation is
indeterminate. Note that because of leap-seconds, a seconds field can vary
from 59 to 60. However, because of the way that addition is defined in
E Adding durations to dateTimes, they are still totally ordered.
| Duration | Relation | | | | | |
|---|---|---|---|---|---|---|
| P1Y | > P364D | <> P365D | <> P366D | < P367D | | |
| P1M | > P27D | <> P28D | <> P29D | <> P30D | <> P31D | < P32D |
| P5M | > P149D | <> P150D | <> P151D | <> P152D | <> P153D | < P154D |
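The order relation can be sketched by adding both durations to each of the four reference dateTimes. Assumptions: a duration is collapsed to a hypothetical (total months, total seconds) pair, and Python's leap-second-free `datetime` stands in for the addition algorithm. Because every reference point falls on day 1 at midnight, adding months never needs day-of-month clamping. Returning "=" when all four sums coincide (e.g. P1D and PT24H) is an additional assumption of this sketch.

```python
from datetime import datetime, timedelta

# The four reference dateTimes listed above (all UTC, day 1, midnight).
REFERENCE_POINTS = [
    datetime(1696, 9, 1),
    datetime(1697, 2, 1),
    datetime(1903, 3, 1),
    datetime(1903, 7, 1),
]

def add_duration(s: datetime, months: int, seconds: int) -> datetime:
    """Add a duration, given as (total months, total seconds), to s."""
    total = s.year * 12 + (s.month - 1) + months
    shifted = s.replace(year=total // 12, month=total % 12 + 1)  # day stays 1
    return shifted + timedelta(seconds=seconds)

def compare_durations(x, y) -> str:
    """Return '<', '>', '=' or '<>' (indeterminate) for two (months, seconds) pairs."""
    results = set()
    for s in REFERENCE_POINTS:
        a, b = add_duration(s, *x), add_duration(s, *y)
        results.add("<" if a < b else ">" if a > b else "=")
    # x < y only if s+x < s+y for every s (likewise for >); any mix is indeterminate.
    return results.pop() if len(results) == 1 else "<>"
```

For example, P1Y against P365D gives equal sums from the 1696 and 1697 points but a larger P1Y sum from the 1903 points, so the result is "<>".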
Implementations are free to optimize the computation of the ordering relationship. For example, the following table can be used to
compare durations of a small number of months against days.
| Months | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | ... |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Minimum days | 28 | 59 | 89 | 120 | 150 | 181 | 212 | 242 | 273 | 303 | 334 | 365 | 393 | ... |
| Maximum days | 31 | 62 | 92 | 123 | 153 | 184 | 215 | 245 | 276 | 306 | 337 | 366 | 397 | ... |
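The minimum and maximum day counts in the table can be reproduced by brute force over one full 400-year Gregorian cycle, after which the leap-year pattern repeats. This is a sketch with a hypothetical helper name; it measures the days from the first of a month to the first of the month n months later, using Python's proleptic Gregorian `date`.

```python
from datetime import date

def month_span_bounds(n: int) -> tuple:
    """(minimum, maximum) number of days spanned by n consecutive months."""
    spans = []
    for year in range(2000, 2400):  # one full 400-year Gregorian cycle
        for month in range(1, 13):
            t = year * 12 + (month - 1) + n
            end = date(t // 12, t % 12 + 1, 1)
            spans.append((end - date(year, month, 1)).days)
    return min(spans), max(spans)
```

An implementation could precompute such bounds once and compare a month count against a day count without consulting the reference dateTimes, which is the kind of optimization the text permits.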
3.2.6.4 Totally ordered durations
Certain derived datatypes of durations can be guaranteed to have a total order. For
this, they must have fields from only one row in the list below and the time zone
must either be required or prohibited.
- year, month
- day, hour, minute, second
For example, a datatype could be defined to correspond to the
[SQL] datatype Year-Month interval that required a four-digit
year field and a two-digit month field but required all other fields to be unspecified. This datatype could be defined as below and would have a total order.
<simpleType name='SQL-Year-Month-Interval'>
<restriction base='duration'>
<pattern value='P\p{Nd}{4}Y\p{Nd}{2}M'/>
</restriction>
</simpleType>
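Why such a type is totally ordered can be seen by mapping each literal to a single integer, its month count. This Python sketch mirrors the pattern above; `year_month_key` is a hypothetical name. In Python 3, `\d` in a str pattern matches any Unicode decimal digit (category Nd), like `\p{Nd}`, and `int()` accepts those digits. `fullmatch` is used because schema patterns are implicitly anchored.

```python
import re

# Mirrors <pattern value='P\p{Nd}{4}Y\p{Nd}{2}M'/>.
SQL_YEAR_MONTH = re.compile(r"P(\d{4})Y(\d{2})M")

def year_month_key(lit: str) -> int:
    """Month count of a SQL-Year-Month-Interval literal; a total-order sort key."""
    m = SQL_YEAR_MONTH.fullmatch(lit)
    if m is None:
        raise ValueError(f"not a SQL-Year-Month-Interval literal: {lit!r}")
    years, months = (int(g) for g in m.groups())
    return 12 * years + months
```

Because only year and month fields occur, every pair of values reduces to comparing two integers, so no pair is indeterminate.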
3.2.7 dateTime
[Definition:]
dateTime values may be viewed as objects with integer-valued
year, month, day, hour and minute properties, a decimal-valued second property,
and a boolean timezoned property.
Each such object also has one decimal-valued
method or computed property, timeOnTimeline, whose value is always a decimal
number; the values are dimensioned in seconds, the integer 0 is
0001-01-01T00:00:00 and the value of timeOnTimeline for other dateTime
values is computed using the Gregorian algorithm as modified for leap-seconds.
The timeOnTimeline values form two related "timelines", one for timezoned
values and one for non-timezoned values.
Each timeline is a copy of the ·value space· of decimal,
with integers given units of seconds.
The ·value space· of
dateTime is closely related to the dates and times described in ISO 8601.
For clarity, the text above specifies a particular origin point for the
timeline.
It should be noted, however, that schema processors need not expose the
timeOnTimeline value to schema users, and there is no requirement that a
timeline-based implementation use the particular origin described here in
its internal representation.
Other interpretations of the ·value space· which lead to the
same results (i.e., are isomorphic) are of course acceptable.
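A timeOnTimeline computation can be sketched with Python's `datetime`, using the origin given above. Strong assumption: Python's `datetime` has no leap seconds, so this sketch ignores them, unlike the leap-second-adjusted algorithm described in the text. It also only covers years 1 to 9999. `time_on_timeline` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

ORIGIN = datetime(1, 1, 1)  # 0001-01-01T00:00:00 has timeOnTimeline 0

def time_on_timeline(dt: datetime) -> Decimal:
    """Seconds since ORIGIN as a Decimal (sketch: leap seconds are ignored)."""
    if dt.tzinfo is not None:
        # Timezoned values sit on the UTC timeline: shift to UTC, then drop tzinfo.
        dt = (dt - dt.utcoffset()).replace(tzinfo=None)
    delta = dt - ORIGIN
    whole = delta.days * 86400 + delta.seconds
    return whole + Decimal(delta.microseconds) / 1_000_000

# Usage: noon at -05:00 and 17:00Z are the same point on the UTC timeline.
noon_est = datetime(2002, 10, 10, 12, tzinfo=timezone(timedelta(hours=-5)))
assert time_on_timeline(noon_est) == time_on_timeline(
    datetime(2002, 10, 10, 17, tzinfo=timezone.utc))
```

The same number computed for a naive (untimezoned) value belongs to the other timeline, so callers should keep track of which timeline a result is on.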
All timezoned times are Coordinated Universal Time (UTC, sometimes called
"Greenwich Mean Time"). Other timezones indicated in lexical representations
are converted to UTC during conversion of literals to values.
"Local" or untimezoned times are presumed to be the time in the timezone of some
unspecified locality as prescribed by the appropriate legal authority;
currently there are no legally prescribed timezones which are durations
whose magnitude is greater than 14 hours. The value of each numeric-valued property
(other than timeOnTimeline) is limited to the maximum value within the interval
determined by the next-higher property. For example, the day value can never be 32,
and cannot even be 29 for month 02 and year 2002 (February 2002).
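The limit on the day property depends on the month and, for February, on the Gregorian leap-year rule. A minimal sketch for years of the Common Era (the helper names are hypothetical):

```python
def is_leap_year(year: int) -> bool:
    # Gregorian rule: every 4th year, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def max_day_in_month(year: int, month: int) -> int:
    if month == 2:
        return 29 if is_leap_year(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31
```

So a day value of 29 is rejected for February 2002 but accepted for February 2000.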
Note: The date and time datatypes described in this recommendation were inspired
by [ISO 8601]. '0001' is the lexical representation of the year 1 of the Common Era
(1 CE, sometimes written "AD 1" or "1 AD"). There is no year 0, and '0000' is not a valid lexical representation. '-0001' is the lexical representation of the year 1 Before
Common Era (1 BCE, sometimes written "1 BC"). Those using this (1.0) version of this Recommendation to
represent negative years should be aware that the interpretation of lexical
representations beginning with a '-' is likely to change in
subsequent versions.
[ISO 8601]
makes no mention of the year 0; in [ISO 8601:1998 Draft Revision]
the form '0000' was disallowed and this recommendation disallows it as well.
However, [ISO 8601:2000 Second Edition], which became available just as we were completing version
1.0, allows the form '0000', representing the year 1 BCE. A number of external commentators
have also suggested that '0000' be
allowed, as the lexical representation for 1 BCE, which is the normal usage in
astronomical contexts.
It is the intention of the XML Schema
Working Group to allow '0000' as a lexical representation in the
dateTime, date, gYear, and
gYearMonth datatypes in a subsequent version
of this Recommendation. '0000' will be the lexical representation of 1
BCE (which is a leap year), '-0001' will become the lexical representation of 2
BCE (not 1 BCE as in this (1.0) version), '-0002' of 3 BCE, etc.
Note: See the conformance note in 3.2.6 duration, which
applies to this datatype as well.
3.2.7.1 Lexical representation
The ·lexical space· of dateTime consists of
finite-length sequences of characters of the form:
'-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?,
where
- '-'? yyyy is a four-or-more digit optionally negative-signed
numeral that represents the year; if more than four digits, leading zeros
are prohibited, and '0000' is prohibited (see the Note above; also note that a plus sign is not permitted);
- the remaining '-'s are separators between parts of the date portion;
- the first mm is a two-digit numeral that represents the month;
- dd is a two-digit numeral that represents the day;
- 'T' is a separator indicating that time-of-day follows;
- hh is a two-digit numeral that represents the hour; '24' is permitted if the
minutes and seconds represented are zero, and the dateTime value so
represented is the first instant of the following day (the hour property of a
dateTime object in the ·value space· cannot have
a value greater than 23);
- ':' is a separator between parts of the time-of-day portion;
- the second mm is a two-digit numeral that represents the minute;
- ss is a two-integer-digit numeral that represents the
whole seconds;
- '.' s+ (if present) represents the
fractional seconds;
- zzzzzz (if present) represents the timezone (as described below).
For example, 2002-10-10T12:00:00-05:00 (noon on 10 October 2002, Central Daylight
Saving Time as well as Eastern Standard Time in the U.S.) is 2002-10-10T17:00:00Z,
five hours later than 2002-10-10T12:00:00Z.
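The lexical form above can be parsed into a Python `datetime` as a sketch. Assumptions: only years 1 to 9999 are handled (Python's `datetime` range; negative years are rejected rather than interpreted), fractional seconds beyond microseconds are truncated, and `parse_datetime` is a hypothetical name. Timezoned literals are converted to UTC, and '24:00:00' becomes the first instant of the following day.

```python
import re
from datetime import datetime, timedelta, timezone

# '-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?
_DATETIME = re.compile(
    r"(?P<year>-?(?:[1-9][0-9]{4,}|[0-9]{4}))-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})"
    r"T(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2}):(?P<second>[0-9]{2})(?P<frac>\.[0-9]+)?"
    r"(?P<tz>Z|[+-][0-9]{2}:[0-9]{2})?"
)

def parse_datetime(lit: str) -> datetime:
    """Timezoned literals return aware UTC datetimes; untimezoned ones are naive."""
    m = _DATETIME.fullmatch(lit)
    if m is None:
        raise ValueError(f"not a dateTime literal: {lit!r}")
    year = int(m["year"])
    if year <= 0:
        raise ValueError("'0000' is prohibited; negative years are not handled here")
    hour, minute, second = int(m["hour"]), int(m["minute"]), int(m["second"])
    frac = m["frac"][1:] if m["frac"] else ""
    end_of_day = hour == 24
    if end_of_day:
        if minute or second or (frac and int(frac)):
            raise ValueError("hour 24 is only allowed as 24:00:00")
        hour = 0
    micro = int((frac + "000000")[:6])  # digits past microseconds are truncated
    tz = None
    if m["tz"] == "Z":
        tz = timezone.utc
    elif m["tz"]:
        hh, mm = int(m["tz"][1:3]), int(m["tz"][4:6])
        if hh > 14 or mm > 59 or (hh == 14 and mm != 0):
            raise ValueError("timezone out of range")
        offset = timedelta(hours=hh, minutes=mm)
        tz = timezone(-offset if m["tz"][0] == "-" else offset)
    # datetime() itself rejects impossible dates such as 2002-02-29 and years > 9999.
    dt = datetime(year, int(m["month"]), int(m["day"]), hour, minute, second, micro, tzinfo=tz)
    if end_of_day:
        dt += timedelta(days=1)  # first instant of the following day
    return dt.astimezone(timezone.utc) if tz is not None else dt
```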
For further guidance on arithmetic with dateTimes and durations,
see E Adding durations to dateTimes.
3.2.7.2 Canonical representation
Except for trailing fractional zero digits in the seconds representation,
'24:00:00' time representations, and timezone (for timezoned values), the mapping
from literals to values is one-to-one. Where there is more than
one possible representation, the canonical representation is as follows:
- The 2-digit numeral representing the hour must not be '24';
- The fractional second string, if present, must not end in '0';
- for timezoned values, the timezone must be represented with 'Z'
(All timezoned dateTime values are UTC.).
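The three canonical rules can be sketched as a formatter for Python `datetime` values, assuming years 1 to 9999; `canonical_datetime` is a hypothetical name. The year is zero-padded with an f-string rather than `strftime("%Y")`, whose padding of small years varies by platform.

```python
from datetime import datetime, timedelta, timezone

def canonical_datetime(dt: datetime) -> str:
    """Canonical dateTime literal for a Python datetime (sketch, years 1..9999).

    - the hour is always 00..23 (datetime cannot hold 24, so '24' never appears);
    - trailing zeros of the fractional seconds are dropped (and '.' with them);
    - timezoned values are converted to UTC and written with 'Z'.
    """
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)
    text = (f"{dt.year:04d}-{dt.month:02d}-{dt.day:02d}"
            f"T{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}")
    if dt.microsecond:
        text += "." + f"{dt.microsecond:06d}".rstrip("0")
    if dt.tzinfo is not None:
        text += "Z"
    return text

# Usage: noon at -05:00 is written canonically as 17:00 UTC.
assert canonical_datetime(
    datetime(2002, 10, 10, 12, tzinfo=timezone(timedelta(hours=-5)))
) == "2002-10-10T17:00:00Z"
```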
3.2.7.3 Timezones
Timezones are durations with (integer-valued) hour and minute properties
(with the hour magnitude limited to at most 14, and the minute magnitude
limited to at most 59, except that if the hour magnitude is 14, the minute
value must be 0); they may be both positive or both negative.
The lexical representation of a timezone is a string of the form:
(('+' | '-') hh ':' mm) | 'Z',
where
- hh is a two-digit numeral (with leading zeros as required) that
represents the hours,
- mm is a two-digit numeral that represents the minutes,
- '+' indicates a nonnegative duration,
- '-' indicates a nonpositive duration.
The mapping so defined is one-to-one, except that '+00:00', '-00:00', and 'Z'
all represent the same zero-length duration timezone, UTC; 'Z' is its canonical
representation.
When a timezone is added to a UTC dateTime, the result is the date
and time "in that timezone". For example, 2002-10-10T12:00:00+05:00 is
2002-10-10T07:00:00Z and 2002-10-10T00:00:00+05:00 is 2002-10-09T19:00:00Z.
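A sketch of the timezone mapping and of conversion to UTC follows, with hypothetical helper names. Offsets are returned in signed minutes. '+00:00', '-00:00' and 'Z' all map to 0. The UTC value is the local date and time minus the offset, as in the examples above.

```python
import re
from datetime import datetime, timedelta

_TZ = re.compile(r"Z|(?P<sign>[+-])(?P<hh>[0-9]{2}):(?P<mm>[0-9]{2})")

def timezone_minutes(lit: str) -> int:
    """Map a timezone literal to its signed offset in minutes ('-05:00' -> -300)."""
    m = _TZ.fullmatch(lit)
    if m is None:
        raise ValueError(f"not a timezone literal: {lit!r}")
    if lit == "Z":
        return 0
    hh, mm = int(m["hh"]), int(m["mm"])
    if hh > 14 or mm > 59 or (hh == 14 and mm != 0):
        raise ValueError(f"timezone out of range: {lit!r}")
    minutes = hh * 60 + mm
    return -minutes if m["sign"] == "-" else minutes

def to_utc(local: datetime, tz_literal: str) -> datetime:
    """UTC date and time of a local (naive) value written with the given timezone."""
    return local - timedelta(minutes=timezone_minutes(tz_literal))
```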
3.2.7.4 Order relation on dateTime
dateTime value objects on either timeline are totally ordered by their timeOnTimeline
values; between the two timelines, dateTime value objects are ordered by their
timeOnTimeline values when their timeOnTimeline values differ by more than
fourteen hours, with those whose difference is a duration of 14 hours or less
being ·incomparable·.
In general, the ·order-relation· on dateTime
is a partial order since there is no determinate relationship between certain
instants. For example, there is no determinate ordering between
(a) 2000-01-20T12:00:00 and (b) 2000-01-20T12:00:00Z. Based on
timezones currently in use, (a) could vary from 2000-01-20T12:00:00+12:00 to
2000-01-20T12:00:00-13:00. It is, however, possible for this range to expand or
contract in the future, based on local laws. Because of this, the following
definition uses a somewhat broader range of indeterminate values: +14:00..-14:00. The following definition uses the notation S[year] to represent the year
field of S, S[month] to represent the month field, and so on. The notation
(Q with time zone -14:00) means adding the timezone -14:00 to Q, where Q did not
already have a timezone. This is a logical explanation of the process. Actual
implementations are free to optimize as long as they produce the same results.
The ordering between two dateTimes P and Q is defined by the following
algorithm:
A. Normalize P and Q. That is, if there is a timezone present, but
it is not Z, convert it to Z using the addition operation defined in
E Adding durations to dateTimes. Thus 2000-03-04T23:00:00+03:00 normalizes to 2000-03-04T20:00:00Z.
B. If P and Q either both have a time zone or both do not have a time
zone, compare P and Q field by field from the year field down to the
second field, and return a result as soon as it can be determined. That is:
  - For each i in {year, month, day, hour, minute, second}:
    - If P[i] and Q[i] are both not specified, continue to the next i.
    - If P[i] is not specified and Q[i] is, or vice versa, stop and return P <> Q.
    - If P[i] < Q[i], stop and return P < Q.
    - If P[i] > Q[i], stop and return P > Q.
  - Stop and return P = Q.
C. Otherwise, if P contains a time zone and Q does not, compare
as follows:
  - P < Q if P < (Q with time zone +14:00)
  - P > Q if P > (Q with time zone -14:00)
  - P <> Q otherwise, that is, if (Q with time zone +14:00) < P < (Q with time zone -14:00)
D. Otherwise, if P does not contain a time zone and Q does, compare
as follows:
  - P < Q if (P with time zone -14:00) < Q.
  - P > Q if (P with time zone +14:00) > Q.
  - P <> Q otherwise, that is, if (P with time zone +14:00) < Q < (P with time zone -14:00)
Examples:

| Determinate | Indeterminate |
|---|---|
| 2000-01-15T00:00:00 < 2000-02-15T00:00:00 | 2000-01-01T12:00:00 <> 1999-12-31T23:00:00Z |
| 2000-01-15T12:00:00 < 2000-01-16T12:00:00Z | 2000-01-16T12:00:00 <> 2000-01-16T12:00:00Z |
| | 2000-01-16T00:00:00 <> 2000-01-16T12:00:00Z |
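Steps A to D above can be sketched for fully specified dateTime values. Assumed representation (hypothetical): a pair of a naive Python `datetime` holding the year-to-second fields and a timezone offset in minutes, or None when untimezoned. Because every dateTime field is always present, step B's field-by-field loop reduces to comparing `datetime` objects. "Q with time zone +14:00" expressed in UTC is Q minus 14 hours, and "-14:00" is Q plus 14 hours.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

DateTimeValue = Tuple[datetime, Optional[int]]  # (fields, offset in minutes or None)
FOURTEEN_HOURS = timedelta(hours=14)

def normalize(v: DateTimeValue) -> DateTimeValue:
    """Step A: fold a non-Z timezone into the fields, so the result is in UTC."""
    fields, tz = v
    return (fields, None) if tz is None else (fields - timedelta(minutes=tz), 0)

def compare_datetimes(p: DateTimeValue, q: DateTimeValue) -> str:
    """Return '<', '>', '=' or '<>' (indeterminate)."""
    (pf, ptz), (qf, qtz) = normalize(p), normalize(q)
    if (ptz is None) == (qtz is None):
        # Step B: all fields are present, so this equals the field-by-field loop.
        return "<" if pf < qf else ">" if pf > qf else "="
    if ptz is not None:
        # Step C: P has a time zone, Q does not.
        if pf < qf - FOURTEEN_HOURS:  # Q with time zone +14:00
            return "<"
        if pf > qf + FOURTEEN_HOURS:  # Q with time zone -14:00
            return ">"
        return "<>"
    # Step D: P has no time zone, Q does.
    if pf + FOURTEEN_HOURS < qf:  # P with time zone -14:00
        return "<"
    if pf - FOURTEEN_HOURS > qf:  # P with time zone +14:00
        return ">"
    return "<>"
```

Applied to the examples in the table, the sketch gives "<" for both determinate pairs and "<>" for all three indeterminate ones.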
3.2.7.5 Totally ordered dateTimes
Certain derived types from dateTime
can be guaranteed to have a total order. To
do so, they must require that a specific set of fields is always specified, and
that the remaining fields (if any) are always unspecified. For example, the date
datatype without time zone is defined to contain exactly year, month, and day.
Thus dates without time zone have a total order among themselves.