RE: [xsl] generating ID strings that are both readable and unique


Subject: RE: [xsl] generating ID strings that are both readable and unique
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 14 Oct 2008 10:44:03 +0100

Quite hard to do in "pure" XSLT 1.0 without a node-set() extension, because
I think any solution that is moderately efficient is going to involve some
temporary data.

I would create a temporary document containing all distinct ids/titles like
this

<xsl:variable name="allids">
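  <xsl:for-each-group select="//section" group-by="(@original-id, @title)[1]">
      <!-- braces make count an attribute value template, so the number is stored rather than the literal text -->
      <id id="{current-grouping-key()}" count="{count(current-group())}"/>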
  </xsl:for-each-group>
</xsl:variable>
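To make the shape of that temporary document concrete, here is a small made-up input (the element and attribute values are purely illustrative, and it assumes titles are held in a title attribute as in the code above):

```xml
<doc>
  <section original-id="intro" title="Introduction"/>
  <section title="Introduction"/>
  <section title="Introduction"/>
  <section title="Usage"/>
</doc>
```

Assuming count is computed through the attribute value template {count(current-group())}, $allids would then hold one id element per distinct key, in order of first appearance:

```xml
<id id="intro" count="1"/>
<id id="Introduction" count="2"/>
<id id="Usage" count="1"/>
```

The first section is grouped under its existing id rather than its title, which is why "Introduction" has a count of 2 and not 3.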

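Here's a function that derives an ID from a string ($input) and a
disambiguating suffix ($gid, e.g. a sequence number). It keeps appending the
suffix until the result no longer clashes with any entry in $allids: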

<xsl:function name="f:unique" as="xs:string">
  <xsl:param name="input" as="xs:string"/>
  <xsl:param name="gid" as="xs:string"/>
  <xsl:choose>
    <xsl:when test="exists($allids/id[@id=$input])">
      <!-- already taken: append the suffix and try again, passing $gid along -->
      <xsl:sequence select="f:unique(concat($input, '_', $gid), $gid)"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="$input"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

And then, when processing an individual section:

<xsl:attribute name="id">
  <xsl:choose>
    <xsl:when test="@id">
      <xsl:value-of select="@id"/>
    </xsl:when>
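    <!-- "=" rather than "eq": an untyped attribute compared to an integer with "eq" is a type error -->
    <xsl:when test="$allids/id[@id=current()/@title]/@count = 1">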
      <xsl:value-of select="@title"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="f:unique(@title, generate-id())"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:attribute>

Instead of using generate-id() for disambiguation, you could use the result
of xsl:number level="any". This would mean that if there are two sections
titled "Introduction", one gets the id "Introduction_1", the other
"Introduction_2". In the rare event that "Introduction_1" is already in use,
you would get "Introduction_1_1" etc.
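
Putting the pieces together with the xsl:number variant, a complete XSLT 2.0 stylesheet might look roughly like the sketch below. It is untested and rests on several assumptions that are not in the code above: existing ids live in @id (rather than @original-id), titles are in a title attribute, the f prefix is bound to a made-up namespace URI, and the xsl:number count pattern is restricted to sections with the same title so that the numbering restarts per title.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <!-- One entry per distinct existing id or title, with the number of
       sections that share it. Assumption: ids are in @id. -->
  <xsl:variable name="allids">
    <xsl:for-each-group select="//section" group-by="(@id, @title)[1]">
      <id id="{current-grouping-key()}" count="{count(current-group())}"/>
    </xsl:for-each-group>
  </xsl:variable>

  <!-- Append "_" and $gid to $input until the result is not in $allids. -->
  <xsl:function name="f:unique" as="xs:string">
    <xsl:param name="input" as="xs:string"/>
    <xsl:param name="gid" as="xs:string"/>
    <xsl:choose>
      <xsl:when test="exists($allids/id[@id = $input])">
        <xsl:sequence select="f:unique(concat($input, '_', $gid), $gid)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="$input"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <!-- Identity transform: copy everything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Sections without an id get one derived from the title. -->
  <xsl:template match="section[not(@id)]">
    <xsl:variable name="title" select="string(@title)"/>
    <!-- Position of this section among all sections with the same title. -->
    <xsl:variable name="seq" as="xs:string">
      <xsl:number level="any" count="section[@title = $title]"/>
    </xsl:variable>
    <xsl:copy>
      <xsl:attribute name="id">
        <xsl:choose>
          <xsl:when test="$allids/id[@id = $title]/@count = 1">
            <xsl:value-of select="$title"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="f:unique($title, $seq)"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With two id-less sections titled "Introduction" and nothing else using that name, $allids holds "Introduction" with a count of 2, so the first section goes through f:unique with suffix "1" and the second with suffix "2", giving "Introduction_1" and "Introduction_2". The title is bound to a variable before xsl:number because the count pattern is evaluated against each candidate node, not the section being numbered.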

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Trevor Nicholls [mailto:trevor@xxxxxxxxxxxxxxxxxx] 
> Sent: 14 October 2008 07:26
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] generating ID strings that are both readable and unique
> 
> Hi
> 
> In this particular application we have a set of XML documents 
> which are divided into nested sections; each section may 
> (down the track) give rise to a url. Currently that url is 
> generated by <xsl:number level="multiple"> but this produces 
> urls that change frequently. Some sections have been given an 
> ID attribute by the process which originally created the 
> documents, but most have not.
> Additionally, all sections must have exactly one title child, 
> along with their other content.
> 
> The requirement is to process an XML file and generate an ID 
> attribute for sections which lack them - deriving the ID 
> value from the title so that the url is comprehensible. 
> Providing we ignore the problem cases, this is a trivial exercise:
> 
> ----
> 
>  <xsl:variable name="upchars" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
>  <xsl:variable name="lochars" select="'abcdefghijklmnopqrstuvwxyz'" />
> 
>  <!-- catchall -->
>  <xsl:template match="*">
>    <xsl:copy>
>      <xsl:apply-templates select="@*" />
>      <xsl:apply-templates />
>    </xsl:copy>
>  </xsl:template>
> 
>  <xsl:template match="@*">
>    <xsl:copy-of select="." />
>  </xsl:template>
> 
>  <xsl:template match="section[@id]">
>    <xsl:copy>
>      <xsl:apply-templates select="@*" />
>      <xsl:apply-templates />
>    </xsl:copy>
>  </xsl:template>
> 
>  <xsl:template match="section">
>    <xsl:copy>
>      <xsl:attribute name="id">
>        <xsl:apply-templates select="title" mode="id" />
>      </xsl:attribute>
>      <xsl:apply-templates select="@*" />
>      <xsl:apply-templates />
>    </xsl:copy>
>  </xsl:template>
> 
>  <xsl:template match="title" mode="id">
>    <xsl:value-of select="translate(translate(., ' ', '_'), $upchars, $lochars)" />
>  </xsl:template>
> 
> ----
> 
> The problem cases are
> (a) duplicate titles (after the translations) which would 
> lead to duplicate IDs, and
> (b) existing IDs which might also duplicate a title.
> 
> If there were no IDs in the document to begin with, I think I 
> could have solved the first problem by using a key. But the 
> second problem complicates it, and I haven't got enough 
> experience with keys to figure out how to adjust the "id" 
> mode title template to take both issues into account.
> 
> Can anyone offer some helpful advice here?
> XSL 1.0 is preferred, although I would be interested to see 
> how XSL2 might handle this problem too. 
> 
> Thanks
> Trevor

