
Re: [xsl] xsl:number question (XSLT 1.0)


Subject: Re: [xsl] xsl:number question (XSLT 1.0)
From: JBryant@xxxxxxxxx
Date: Fri, 15 Apr 2005 10:22:07 -0500

The relevant chunk of the spec (from http://www.w3.org/TR/xslt#number):

-------------------

The following attributes are used to control conversion of a list of 
numbers into a string. The numbers are integers greater than 0. The 
attributes are all optional.
The main attribute is format. The default value for the format attribute 
is 1. The format attribute is split into a sequence of tokens where each 
token is a maximal sequence of alphanumeric characters or a maximal 
sequence of non-alphanumeric characters. Alphanumeric means any character 
that has a Unicode category of Nd, Nl, No, Lu, Ll, Lt, Lm or Lo. The 
alphanumeric tokens (format tokens) specify the format to be used for each 
number in the list. If the first token is a non-alphanumeric token, then 
the constructed string will start with that token; if the last token is 
non-alphanumeric token, then the constructed string will end with that 
token. Non-alphanumeric tokens that occur between two format tokens are 
separator tokens that are used to join numbers in the list. The nth format 
token will be used to format the nth number in the list. If there are more 
numbers than format tokens, then the last format token will be used to 
format remaining numbers. If there are no format tokens, then a format 
token of 1 is used to format all numbers. The format token specifies the 
string to be used to represent the number 1. Each number after the first 
will be separated from the preceding number by the separator token 
preceding the format token used to format that number, or, if there are no 
separator tokens, then by . (a period character).

--------------------------------
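
To make the tokenizing rule concrete, here is a rough Python sketch of how
I read it. It is not taken from the spec or from any processor; the names
split_format and is_alnum are my own. It splits a format string into
maximal alphanumeric and non-alphanumeric runs, using the Unicode
categories listed above.

```python
import unicodedata
from itertools import groupby

# Unicode general categories the XSLT 1.0 text above calls "alphanumeric".
ALNUM_CATEGORIES = {"Nd", "Nl", "No", "Lu", "Ll", "Lt", "Lm", "Lo"}


def is_alnum(ch):
    """True if ch belongs to one of the alphanumeric categories."""
    return unicodedata.category(ch) in ALNUM_CATEGORIES


def split_format(fmt):
    """Split fmt into maximal runs; each item is (is_format_token, text)."""
    return [(key, "".join(run)) for key, run in groupby(fmt, key=is_alnum)]


print(split_format("(1)"))   # [(False, '('), (True, '1'), (False, ')')]
print(split_format("1.a)"))  # [(True, '1'), (False, '.'), (True, 'a'), (False, ')')]
```

For "(1)" this gives one format token with a non-alphanumeric token on
each side and no token that sits between two format tokens, which is
exactly the case in question.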

Reading this, I'd say Xalan has it right. The sentence "If the first token 
is a non-alphanumeric token, then the constructed string will start with 
that token; if the last token is non-alphanumeric token, then the 
constructed string will end with that token" makes it pretty clear that 
your output should start with "(" and end with ")". Likewise, "Each number 
after the first will be separated from the preceding number by the 
separator token preceding the format token used to format that number, or, 
if there are no separator tokens, then by . (a period character)" makes it 
pretty clear that your numbers should be separated by periods, since you 
specified no separator.

I can see where Mike Kay got his implementation in Saxon, though: 
"separated from the preceding number by the separator token preceding the 
format token used to format that number". However, the "after the first" 
part makes me think that the opening "(" should not be reused as a 
separator for numbers after the first. That, combined with the last 
sentence of that paragraph in the spec, makes me think that (1.2.1.1) is 
the right output.

As I read this, to get the output that Saxon produced, you'd have to 
specify "(1(1)", and the fully specified string for what Xalan produced 
would be "(1.1)".
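
Here is a small Python sketch of the two readings side by side. It is only
an illustration of the interpretations discussed in this thread, not how
Saxon or Xalan actually implement xsl:number; the function name and its
parameters are hypothetical, and it handles only the shape of format
string used here (one format token "1" between a leading and a trailing
non-alphanumeric token).

```python
def format_numbers(numbers, prefix, suffix, reuse_prefix_as_separator):
    """Format numbers for a format string of the shape prefix + "1" + suffix.

    reuse_prefix_as_separator=False: no separator token exists, so join
    with the default "." (the period reading).
    reuse_prefix_as_separator=True: treat the token preceding the reused
    format token as the separator (the other reading).
    """
    separator = prefix if reuse_prefix_as_separator else "."
    return prefix + separator.join(str(n) for n in numbers) + suffix


numbers = [1, 2, 1, 1]  # the values that appear in the reported output

print(format_numbers(numbers, "(", ")", False))  # (1.2.1.1)
print(format_numbers(numbers, "(", ")", True))   # (1(2(1(1)
```

The first line matches what Xalan produced and the second matches what
Saxon produced, so the whole disagreement comes down to that one choice of
separator.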

I think the spec would benefit from a clarifying sentence for the case 
where only one number is present in the format string but multiple numbers 
are to be formatted. Perhaps something like "When the format string 
contains only one numeric position but the output will be multiple numeric 
values, the separator should be . (a period character)."

By the way, I do not wish to imply that I think ill of the spec or its 
authors because of this problem. It's very hard to write something 
sufficiently generic and still anticipate every case. It's easy to say 
that a clarifying sentence would help after the problem has arisen. It's a 
much harder writing task to anticipate the problem in the first place and 
write the spec to cover it. It's a wonder these kinds of things don't pop 
up more often.

My $.02.

Jay Bryant
Bryant Communication Services
(presently consulting at Synergistic Solution Technologies)

From: Jack Matheson <jack@xxxxxxxxxxxxxx>
Date: 04/15/2005 09:31 AM
Please respond to: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] xsl:number question (XSLT 1.0)

According to the spec, when a sequence number contains more values than 
there are formatting tokens, the last formatting token is used for the 
excess values. Unfortunately, it is a little vague on which separator 
token to use with the excess values.

It says that a '.' is to be used if no separator token exists, but does 
this also apply to the case where the final formatting token is re-used 
with excess sequence values?

Here is a quick test I did to try and see how different processors are 
handling this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
version="1.0">
   <xsl:template match="a/b/c/d">
      <xsl:number level="multiple" count="*" format="(1)"/>
   </xsl:template>
</xsl:stylesheet>

If my input document is...

<?xml version="1.0"?>
<a><b><c><d/></c></b></a>


...then Saxon produces this:
(1(2(1(1)

...while Xalan produces this:
(1.2.1.1)

Both answers seem perfectly reasonable to me, given the lack of clarity 
in the 1.0 spec.
Can anyone help me figure out which is (more) correct?

