
Re: [xsl] Parse a date - exslt:parse-date in Saxon 6


Subject: Re: [xsl] Parse a date - exslt:parse-date in Saxon 6
From: "Kerry, Richard richard.kerry@xxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 22 Oct 2014 16:35:02 -0000

Following a follow-up question about the if-condition-return idiom, I managed
to get something working that is sufficient for my requirements.

As promised, here's some XSLT 1 code that can take a date expressed in any one
of a number of formats and write the result in a given format.  (XSLT 1 because
it has to fit in a DocBook customization, which needs to be XSLT 1.)
The list of formats available is quite limited, but sufficient for my
purposes.

As it's XSL 1 there's no regular expression usage.  The result is more verbose
than I'd expect to be able to achieve in a language I'm more familiar with (eg
C++).
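
For anyone wondering how a pattern can be checked without regular expressions:
the trick used throughout the templates below is to reduce a string to its
"shape" with translate() and compare that. A minimal standalone sketch (the
template name 'shape.of' and the sample date are only for illustration, they
are not part of the code below):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: map every digit to '#', leaving separators
       untouched, so that the result describes the layout of the string. -->
  <xsl:template name="shape.of">
    <xsl:param name="string" select="''"/>
    <xsl:value-of select="translate(normalize-space($string),
                                    '0123456789', '##########')"/>
  </xsl:template>

  <xsl:template match="/">
    <xsl:variable name="shape">
      <xsl:call-template name="shape.of">
        <xsl:with-param name="string" select="'22/7/2010'"/>
      </xsl:call-template>
    </xsl:variable>
    <!-- An equality test on the shape stands in for a regex match. -->
    <xsl:if test="$shape = '##/#/####'">numeric date</xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should output "numeric date".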

The first template 'format.meta.date' calls the two templates for the two
general sets of formats supported.
'parse.date.1' handles dates with alphabetic months.
'parse.date.2' handles numeric months.
All dates are in UK/European format (d/m/y).  If you want US (m/d/y) it should
be straightforward enough to change.
The output is a date in one of epub3's required formats -  YYYY, YYYY-MM,
YYYY-MM-DD.
Two digit years are presumed to be in the past.  They are considered to be
21st century if lower than the current year mod 100, 20th century if higher.
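
The two-digit year rule on its own looks roughly like this. It is a sketch
rather than the code below: the current year is passed in as a parameter
('current-year', my name for it) so that it runs without extensions, whereas
the templates below call the EXSLT function date:year(), for which the 'date'
prefix has to be bound to http://exslt.org/dates-and-times on xsl:stylesheet.
A year equal to the current one is treated as 21st century here, which is an
assumption, since the rule above only covers "lower" and "higher".

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumption: supplied by the caller instead of date:year(). -->
  <xsl:param name="current-year" select="2014"/>

  <!-- Hypothetical helper: expand a two-digit year, leave others alone. -->
  <xsl:template name="expand.year">
    <xsl:param name="year" select="0"/>
    <xsl:variable name="pivot" select="$current-year mod 100"/>
    <xsl:choose>
      <!-- three or more digits: taken as a full year -->
      <xsl:when test="$year &gt;= 100">
        <xsl:value-of select="$year"/>
      </xsl:when>
      <!-- up to and including the current year: 21st century -->
      <xsl:when test="$year &lt;= $pivot">
        <xsl:value-of select="$year + 2000"/>
      </xsl:when>
      <!-- anything later would be in the future, so 20th century -->
      <xsl:otherwise>
        <xsl:value-of select="$year + 1900"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="expand.year">
      <xsl:with-param name="year" select="9"/>
    </xsl:call-template>
    <xsl:text> </xsl:text>
    <xsl:call-template name="expand.year">
      <xsl:with-param name="year" select="96"/>
    </xsl:call-template>
    <xsl:text> </xsl:text>
    <xsl:call-template name="expand.year">
      <xsl:with-param name="year" select="2003"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

With the default current-year of 2014 this should output "2009 1996 2003".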

The original had some xsl:message lines in to aid debugging - I may have
mis-edited them in the course of editing for this message, apologies if so.


Regards,
Richard.


<!-- EPUB3 meta date should be of the form:  YYYY, YYYY-MM or YYYY-MM-DD -->
<xsl:template name="format.meta.date">
  <xsl:param name="string" select="''"/>
  <xsl:param name="node" select="."/>

  <!--
   A quick search has shown the following formats in use: 28 April 2009, 19
November 2003, 10 December 2003,
   16/05/2012, 10/06/2014, 22/7/2010, 12/8/2010, 31 Mar 2011, 09 Dec 2010, 04
Nov. 09, 29 Oct. 09, 14 Oct. 09, Feb 09

   Categorizing as follows (after normalize-space):
   "dd? mmm(m{0,6}).? yy(yy)?"
   "dd?/mm?/yy(yy)?"
   "mmm(m{0,6}).? yy(yy)?"
   Though XSLT 1 doesn't include regular expressions, so they can't be used
   like this.
  -->
  <xsl:variable name="normalized"
                select="translate($string, '0123456789', '##########')"/>
  <xsl:variable name="date.ok">
    <xsl:choose>
      <xsl:when test="string-length($string) = 4 and
                      $normalized = '####'">1</xsl:when>
      <xsl:when test="string-length($string) = 7 and
                      $normalized = '####-##'">1</xsl:when>
      <xsl:when test="string-length($string) = 10 and
                      $normalized = '####-##-##'">1</xsl:when>
      <xsl:otherwise>0</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
 <!-- If it isn't already in one of the permitted formats, see if we can
      parse it as one of our own formats. -->
 <xsl:variable name="string.1">
   <xsl:call-template name="parse.date.1">
     <xsl:with-param name="string" select="$string" />
   </xsl:call-template>
 </xsl:variable>
 <xsl:variable name="string.2">
   <xsl:call-template name="parse.date.2">
     <xsl:with-param name="string" select="$string" />
   </xsl:call-template>
 </xsl:variable>
 <xsl:variable name="new.string">
   <xsl:choose>
    <!-- already YYYY, YYYY-MM or YYYY-MM-DD: pass it through unchanged -->
    <xsl:when test="$date.ok = 1">
      <xsl:value-of select="$string"/>
    </xsl:when>
    <xsl:when test="string-length( $string.1 ) > 0">
      <xsl:value-of select="$string.1"/>
    </xsl:when>
    <xsl:when test="string-length( $string.2 ) > 0">
      <xsl:value-of select="$string.2"/>
    </xsl:when>
    <xsl:otherwise>
     <xsl:message>
      <xsl:text>WARNING: wrong metadata date format: '</xsl:text>
      <xsl:value-of select="$string"/>
      <xsl:text>' in element </xsl:text>
      <xsl:value-of select="local-name($node/..)"/>
      <xsl:text>/</xsl:text>
      <xsl:value-of select="local-name($node)"/>
      <xsl:text>. It must be in one of these forms: </xsl:text>
      <xsl:text>YYYY, YYYY-MM, YYYY-MM-DD, </xsl:text>
      <xsl:text>DD MMM(...) (YY)YY, MMM(...) (YY)YY, DD/MM/(YY)YY, MM/(YY)YY.</xsl:text>
     </xsl:message>
     <xsl:value-of select="''" />
    </xsl:otherwise>
   </xsl:choose>
 </xsl:variable>
 <!-- return the reformatted date (empty if no format matched) -->
 <xsl:value-of select="$new.string"/>
</xsl:template>

<xsl:template name="parse.date.1">
  <xsl:param name="string" select="''"/>
  <!--   Parse the following formats.
   "dd? mmm(m{0,6}).? yy(yy)?"
   "mmm(m{0,6}).? yy(yy)?"
  -->
 <!--  Months have three (May) to nine (September) letters.  Optional dot. -->
 <xsl:variable name="normalized"  select="translate($string, '0123456789',
'##########')"/>
 <!-- normalize spaces. So "  Dec   96  ",  " 6  dec.   96  " etc all become
"Dec 96" or "6 dec. 96" -->
 <xsl:variable name="normalized2"
select="normalize-space($normalized)"/>
 <!-- force to lower case -->
 <xsl:variable name="normalized3"   select="translate($normalized2,
'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' )"/>
 <!-- strip numerics. Giving "may ", " dec. " etc. -->
 <xsl:variable name="normalized4"   select="translate($normalized3, '#', ''
)"/>
 <!-- normalize spaces again. Giving "may", "dec." -->
 <xsl:variable name="month-raw"   select="normalize-space($normalized4)"/>
 <!-- remove trailing dot, if present. Giving "may", "sept" etc -->
 <xsl:variable name="month-dotless" >
   <xsl:choose>
    <xsl:when test="substring($month-raw, string-length($month-raw), 1) =
'.'">
      <xsl:value-of select="substring($month-raw, 1, string-length($month-raw)
- 1)" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$month-raw" />
    </xsl:otherwise>
   </xsl:choose>
 </xsl:variable>

 <!-- categorize. alphabetics become '%' -->
 <xsl:variable name="normalized7" select="translate($month-dotless,
'abcdefghijklmnopqrstuvwxyz', '%%%%%%%%%%%%%%%%%%%%%%%%%%' )"/>
 <!-- By this point we have month names in isolation, without dots, length as
given.    So expecting '%' only, three to nine times. -->
 <xsl:variable name="normalized8"   select="translate($normalized7, '%', ''
)"/>
 <!-- cleared alphabetics, so expect nothing left. -->
 <!-- Held as a boolean (via select): a variable with element content is a
      result tree fragment, which always tests as true. -->
 <xsl:variable name="date.ok.1" select="string-length($normalized8) = 0" />
<!-- <xsl:if test="not($date.ok.1)">
   <xsl:message>
    <xsl:text>WARNING: unrecognized month (non-alphabetics): '</xsl:text>
    <xsl:value-of select="$month-dotless"/>
    <xsl:text>' in '</xsl:text>
    <xsl:value-of select="$string"/>
    <xsl:text>'.</xsl:text>
   </xsl:message>
 </xsl:if>-->
 <!-- check range of lengths. -->
 <!-- Held as a boolean (via select) so that it can be tested directly. -->
 <xsl:variable name="date.ok.2"
               select="string-length($normalized7) >= 3 and string-length($normalized7) &lt;= 9" />
 <!-- extract three letter prefix of month name.
   month-dotless has the month in lower case, whatever length it was given.
-->
 <xsl:variable name="normalized9" select="substring($month-dotless, 1, 3)" />
 <!-- check three letter version is valid.   Look it up in the reference set.
-->
 <xsl:variable
name="months">janfebmaraprmayjunjulaugsepoctnovdec</xsl:variable>
 <xsl:variable name="month-valid" select="contains($months, $normalized9)" />
 <!-- Now that we know it is in there, work out the index at which it was
      found. -->
 <xsl:variable name="month-before" select="substring-before($months,
$normalized9)" />
 <xsl:variable name="month-index" select="(string-length( $month-before ) div
3) +1" />
 <xsl:variable name="month-name">
   <xsl:choose>
    <xsl:when test="$month-index = 1">january</xsl:when>
    <xsl:when test="$month-index = 2">february</xsl:when>
    <xsl:when test="$month-index = 3">march</xsl:when>
    <xsl:when test="$month-index = 4">april</xsl:when>
    <xsl:when test="$month-index = 5">may</xsl:when>
    <xsl:when test="$month-index = 6">june</xsl:when>
    <xsl:when test="$month-index = 7">july</xsl:when>
    <xsl:when test="$month-index = 8">august</xsl:when>
    <xsl:when test="$month-index = 9">september</xsl:when>
    <xsl:when test="$month-index = 10">october</xsl:when>
    <xsl:when test="$month-index = 11">november</xsl:when>
    <xsl:when test="$month-index = 12">december</xsl:when>
   </xsl:choose>
 </xsl:variable>
 <!-- We now have the full name of the month.   Check that if a longer form
was given it matches the full name. -->
 <xsl:variable name="month-valid-full" select="$month-dotless = substring(
$month-name, 1, string-length( $month-dotless ) )" />
 <xsl:variable name="month-string-3" select="format-number( $month-index, '00'
)" />
 <!-- Now get the day and year. -->
 <!-- force to lower case -->
 <xsl:variable name="normalized10"    select="translate($string,
'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' )"/>
 <xsl:variable name="day-string" select="substring-before( $normalized10,
$month-dotless )" />
 <xsl:variable name="day-string-2" select="normalize-space( $day-string )" />
 <xsl:variable name="day-num" select="number( $day-string-2 )" />
 <xsl:variable name="day-string-3" select="format-number( $day-num, '00' )" />
 <xsl:variable name="year-string" select="substring-after( $normalized10,
$month-raw )" />
 <xsl:variable name="year-string-2" select="normalize-space( $year-string )"
/>
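 <!-- date:year() is an EXSLT function; the 'date' prefix must be bound to
      http://exslt.org/dates-and-times on xsl:stylesheet. -->
 <xsl:variable name="this-year" select="date:year()" />
 <xsl:variable name="this-year-in-century" select="$this-year mod 100" />
 <xsl:variable name="year-num" select="number( $year-string-2 )" />
 <xsl:variable name="year-num-2">
   <xsl:choose>
    <!-- two-digit year up to and including the current one: 21st century -->
    <xsl:when test="$year-num &lt;= $this-year-in-century">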
      <xsl:value-of select="$year-num + 2000" />
    </xsl:when>
    <xsl:when test="$year-num > $this-year-in-century and $year-num &lt; 100">
      <xsl:value-of select="$year-num + 1900" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$year-num" />
    </xsl:otherwise>
   </xsl:choose>
 </xsl:variable>
 <xsl:variable name="year-string-3" select="format-number( $year-num-2, '0000'
)" />
 <!-- Return something. -->
 <xsl:variable name="return" >
   <xsl:if test="$date.ok.1 and $date.ok.2 and $month-valid and
$month-valid-full">
    <xsl:choose>
      <!-- Day present: YYYY-MM-DD.  Test the space-normalized day so that
           leading spaces in the input don't count as a day. -->
      <xsl:when test="string-length( $day-string-2 ) > 0">
       <xsl:value-of select="concat( $year-string-3, '-', $month-string-3, '-', $day-string-3 )" />
      </xsl:when>
      <!-- No day: YYYY-MM. -->
      <xsl:otherwise>
       <xsl:value-of select="concat( $year-string-3, '-', $month-string-3 )" />
      </xsl:otherwise>
    </xsl:choose>
   </xsl:if>
 </xsl:variable>
 <xsl:value-of select="$return" />
</xsl:template>

<xsl:template name="parse.date.2">
  <xsl:param name="string" select="''"/>
  <!--
   Parse the following formats.
   "dd?/mm?/yy(yy)?"
   ie.
   dd/mm/yyyy
   mm/yyyy
     (where dd may be d, mm may be m, yyyy may be yy)
  -->
 <!-- Turn numbers to # and remove all spaces. -->
 <xsl:variable name="normalized"  select="translate($string, '0123456789 ',
'##########')"/>
 <!-- should now be '#/#/##' '#/#/####' '#/##/##' '#/##/####' '##/#/##'
'##/#/####' '##/##/##' '##/##/####' -->
 <!-- strip numerics. Giving "//" or "/" -->
 <xsl:variable name="normalized2"   select="translate($normalized, '#', ''
)"/>
 <!-- cleared numerics, so expect "//" (dd/mm/yy) or "/" (mm/yy). -->
 <xsl:variable name="date.check.1">
   <xsl:choose>
    <xsl:when test="$normalized2 = '//'">2</xsl:when>
    <xsl:when test="$normalized2 = '/'">1</xsl:when>
    <xsl:otherwise>0</xsl:otherwise>
   </xsl:choose>
 </xsl:variable>
<!-- <xsl:if test="$date.check.1 = 0">
   <xsl:message>
    <xsl:text>WARNING: unrecognized format (n/n/n or n/n): '</xsl:text>
    <xsl:value-of select="$normalized2"/>
    <xsl:text>' in '</xsl:text>
    <xsl:value-of select="$string"/>
    <xsl:text>'.</xsl:text>
   </xsl:message>
 </xsl:if>-->
 <!-- strip slashes. Giving "####" to "########".  -->
 <xsl:variable name="normalized3"   select="translate($normalized, '/', ''
)"/>
 <!-- check range of lengths. -->
 <!-- Held as a boolean (via select): a variable with element content is a
      result tree fragment, which always tests as true. -->
 <xsl:variable name="date.ok.2"
               select="string-length($normalized3) >= 4 and string-length($normalized3) &lt;= 8" />
 <xsl:variable name="before-slash-1" select="substring-before($string, '/')"
/>
 <xsl:variable name="after-slash-1" select="substring-after($string, '/')" />
 <xsl:variable name="before-slash-2" select="substring-before($after-slash-1,
'/')" />
 <xsl:variable name="after-slash-2" select="substring-after($after-slash-1,
'/')" />
  <!-- Work out which is which, ie dd/mm/yy(yy) or mm/yy(yy). -->
  <xsl:variable name="year-num" >
 <xsl:choose>
   <xsl:when test="($date.check.1 = 2) and $date.ok.2">
    <!-- dd/mm/yy(yy) -->
    <xsl:variable name="result">
      <xsl:value-of select="number( $after-slash-2 )" />
    </xsl:variable>
    <xsl:value-of select="$result" />
   </xsl:when>
   <xsl:when test="($date.check.1 = 1) and $date.ok.2">
    <!-- mm/yy(yy) -->
    <xsl:variable name="result">
      <xsl:value-of select="number( $after-slash-1 )" />
    </xsl:variable>
    <xsl:value-of select="$result" />
   </xsl:when>
 </xsl:choose>
  </xsl:variable>
  <xsl:variable name="month-num" >
   <xsl:choose>
     <xsl:when test="($date.check.1 = 2) and $date.ok.2">
      <!-- dd/mm/yy(yy) -->
      <xsl:variable name="result">
        <xsl:value-of select="number( $before-slash-2 )" />
      </xsl:variable>
      <xsl:value-of select="$result" />
     </xsl:when>
   <xsl:when test="($date.check.1 = 1) and $date.ok.2">
    <!-- mm/yy(yy) -->
    <xsl:variable name="result">
      <xsl:value-of select="number( $before-slash-1 )" />
    </xsl:variable>
    <xsl:value-of select="$result" />
   </xsl:when>
 </xsl:choose>
  </xsl:variable>
 <xsl:variable name="this-year" select="date:year()" />
 <xsl:variable name="this-year-in-century" select="$this-year mod 100" />
 <xsl:variable name="year-num-2">
   <xsl:choose>
    <!-- two-digit year up to and including the current one: 21st century -->
    <xsl:when test="$year-num &lt;= $this-year-in-century">
      <xsl:value-of select="$year-num + 2000" />
    </xsl:when>
    <xsl:when test="$year-num > $this-year-in-century and $year-num &lt; 100">
      <xsl:value-of select="$year-num + 1900" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$year-num" />
    </xsl:otherwise>
   </xsl:choose>
 </xsl:variable>
 <xsl:variable name="day-num" >
  <xsl:choose>
    <xsl:when test="($date.check.1 = 2) and $date.ok.2">
     <!-- dd/mm/yy(yy) -->
     <xsl:variable name="result">
       <xsl:value-of select="number( $before-slash-1 )" />
     </xsl:variable>
     <xsl:value-of select="$result" />
      </xsl:when>
      <xsl:when test="($date.check.1 = 1) and $date.ok.2">
     <!-- mm/yy(yy) -->
     <xsl:variable name="result" select="0" />
     <xsl:value-of select="$result" />
    </xsl:when>
  </xsl:choose>
 </xsl:variable>
 <xsl:variable name="day-string" select="format-number( $day-num, '00' )" />
 <xsl:variable name="month-string" select="format-number( $month-num, '00' )"
/>
 <xsl:variable name="year-string" select="format-number( $year-num-2, '0000'
)" />
 <!-- Return something. We've already worked out which number is the day,
      which the month and which the year. -->
 <xsl:variable name="return" >
   <xsl:choose>
    <xsl:when test="($date.check.1 = 2) and $date.ok.2 and $day-num > 0">
      <!-- dd/mm/yy(yy) -->
      <xsl:variable name="result">
     <xsl:value-of select="$year-string" />-<xsl:value-of
select="$month-string" />-<xsl:value-of select="$day-string" />
      </xsl:variable>
      <xsl:value-of select="$result" />
    </xsl:when>
    <xsl:when test="($date.check.1 = 1) and $date.ok.2 and $day-num = 0">
      <!-- mm/yy(yy) -->
      <xsl:variable name="result">
     <xsl:value-of select="$year-string" />-<xsl:value-of
select="$month-string" />
      </xsl:variable>
      <xsl:value-of select="$result" />
    </xsl:when>
   </xsl:choose>
 </xsl:variable>
 <xsl:value-of select="$return" />
</xsl:template>
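
The month lookup in 'parse.date.1' may be easier to follow in isolation. Here
is a cut-down sketch of the same substring-before() trick (the template name
'month.number' is made up for this example; unlike the full template it only
looks at the first three letters and does not check the rest of a longer
month name):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: returns '01'..'12' for a month name or
       abbreviation, or nothing if the first three letters are unknown. -->
  <xsl:template name="month.number">
    <xsl:param name="name" select="''"/>
    <xsl:variable name="lower"
                  select="translate($name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                                           'abcdefghijklmnopqrstuvwxyz')"/>
    <xsl:variable name="key" select="substring($lower, 1, 3)"/>
    <xsl:variable name="months"
                  select="'janfebmaraprmayjunjulaugsepoctnovdec'"/>
    <!-- Number of characters in front of the match: 0 for jan, 3 for feb... -->
    <xsl:variable name="offset"
                  select="string-length(substring-before($months, $key))"/>
    <!-- The offset must be a multiple of 3, otherwise the key straddles
         two months (eg 'anf' inside 'janfeb'). -->
    <xsl:if test="string-length($key) = 3 and contains($months, $key)
                  and $offset mod 3 = 0">
      <xsl:value-of select="format-number($offset div 3 + 1, '00')"/>
    </xsl:if>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="month.number">
      <xsl:with-param name="name" select="'Sept.'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should output "09".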





Richard Kerry
BNCS Engineer, SI SOL Telco & Media Vertical Practice
T: +44 (0)20 3618 2669
M: +44 (0)7812 325518
G300, Stadium House, Wood Lane, London, W12 7TA
richard.kerry@xxxxxxxx



________________________________________
From: Martin Honnen martin.honnen@xxxxxx
[xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx]
Sent: 12 June 2014 19:14
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Parse a date - exslt:parse-date in Saxon 6

Kerry, Richard richard.kerry@xxxxxxxx wrote:

> I can see that a suitable parser function (parse-date) is defined in
> Exslt but it isn't clear whether it is already available to me or how to
> get it into use if not.  Actually according to the exslt documentation
> it is definitely not available in any XSLT processor but there
> are JavaScript and Msxsl implementations available.
>
> Can someone advise how I can get this to work ?
>
> Can I get Saxon 6 to call a JavaScript function ?

As far as I know there is no way with Saxon 6 to use Javascript to
implement extension functions.


