Re: [xsl] How to properly use Key elements


Subject: Re: [xsl] How to properly use Key elements
From: "G. T. Stresen-Reuter" <tedmasterweb@xxxxxxxxx>
Date: Wed, 16 Oct 2013 00:35:06 +0100

On Oct 14, 2013, at 2:58 PM, Wendell Piez <wapiez@xxxxxxxxxxxxxxx> wrote:

> Hi Ted,
>
> First, a caution. Before you find yourself thinking about Muenchian
> grouping in 2013 you should ask whether you really have no option but
> to use XSLT 1.0. XSLT 2.0 is far superior in many respects, including
> the availability of xsl:for-each-group. I acknowledge that there may
> be other reasons to explore Muenchian grouping besides having no
> choice, so I don't want to discourage your question. But why try to
> bake bread on an open fire when stoves are readily available, etc.

Indeed, were 2.0 an option, I'd definitely be using it! Unfortunately, I believe
I'm stuck with a 1.0 implementation, though I could be mistaken. We're doing the
transformations through PHP (5.3), and I've always assumed that libxslt (or
whatever it's called) was a 1.0-only implementation. Please: prove me wrong
and save me this extra work!
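
One way to settle the version question is to ask the processor itself. The
sketch below uses only the standard XSLT 1.0 system-property() function; run
against any input document, it prints the version and vendor that the
processor reports. With libxslt I would expect it to report 1.0, but treat
that as an expectation to verify rather than a guarantee.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Ignore the input entirely; just report what the processor says about itself. -->
  <xsl:template match="/">
    <xsl:text>XSLT version: </xsl:text>
    <xsl:value-of select="system-property('xsl:version')"/>
    <xsl:text>&#10;Vendor: </xsl:text>
    <xsl:value-of select="system-property('xsl:vendor')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```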

> So:
>
> <xsl:key
>        name="ports-by-ship"
>        match="td[position() = (count(.) - 1)]"
>        use="tr[count(td) = 4]/td[position() = 1]"
> />
>
> is legal, but wrong. It won't work because
>
> a. count(.) will always return '1' so your key will never match,
> because position() will never be 0.

That should have been obvious. Thanks for pointing that out!
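
For the record, what I was trying to say with "count(.) - 1" is "the
second-to-last cell of the row". A minimal sketch of that, assuming the cells
sit under tbody/tr as in my sample table: last() gives the number of td
siblings in the row, so last() - 1 is the position of the port cell.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Suppress all text that is not explicitly handled below. -->
  <xsl:template match="text()"/>

  <!-- position() and last() are counted among the td children of each tr,
       so this matches the second-to-last td of every body row. -->
  <xsl:template match="tbody/tr/td[position() = last() - 1]">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample table, this should list one port name per line,
whether the row has 2, 3 or 4 cells.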

> b. use='tr[etc]' will use the values of (certain) 'tr' children of
> your matched 'td' elements as key values, but 'td' never has 'tr'
> children, so even if the key matched, you'd have empty string key
> values.

So, to clarify, the USE expression must select a child element or attribute of
the nodes selected by the MATCH pattern. Is this correct?
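
As far as I can tell from the XSLT 1.0 Recommendation, the answer is "not
quite": USE is an ordinary XPath expression evaluated with each matched node as
the context node, so it may walk any axis (parent, siblings, and so on), not
only children and attributes. A small sketch of that, with a hypothetical key
name and the tbody/tr/td structure of my sample table assumed: each cell is
indexed by the number of cells in its parent row.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- The use expression climbs to the parent row and counts its cells;
       the resulting number is converted to a string key value. -->
  <xsl:key name="cells-by-row-size" match="tbody/tr/td" use="count(../td)"/>

  <xsl:template match="/">
    <!-- All cells that live in 4-cell rows, narrowed to the first cell of each row. -->
    <xsl:for-each select="key('cells-by-row-size', 4)[not(preceding-sibling::td)]">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```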

> To devise a correct solution, I suggest
>
> 1. Considering whether you can't use XSLT 2.0 for-each-group.

Can't. :-(

> 2. If not, consider whether doing this in two passes would simplify
> the problem. (In the first pass you would label the td elements with
> their information types, simplifying the declaration of the key for
> the second pass.)

Ideal, but in this particular case I'm doing this for a client, and such an
approach *might* imply a change to their base processing system. I do think,
though, that I could probably create a variable (via EXSLT extensions)
holding a fragment marked up as suggested and then operate on that
fragment.
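
A rough sketch of that single-stylesheet, two-pass idea follows. It assumes the
processor supports the EXSLT exsl:node-set() extension (libxslt does, as far as
I know), that rows sit under tbody/tr, and that every ship name is usable as an
element name. The "row" and "ships" elements and the key name are made up for
illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:output method="xml" indent="yes"/>

  <!-- This key only finds nodes in the temporary tree built below. -->
  <xsl:key name="rows-by-ship" match="row" use="@ship"/>

  <xsl:template match="/">
    <!-- Pass 1: flatten each body row and label it with its ship, i.e. the
         first cell of the nearest row (itself or earlier) that has 4 cells. -->
    <xsl:variable name="labelled">
      <xsl:for-each select="//tbody/tr">
        <row ship="{normalize-space((self::tr | preceding-sibling::tr)[count(td) = 4][last()]/td[1])}"
             port="{td[last() - 1]}"
             date="{td[last()]}"/>
      </xsl:for-each>
    </xsl:variable>

    <!-- Pass 2: plain Muenchian grouping over the labelled rows. -->
    <ships>
      <xsl:for-each select="exsl:node-set($labelled)/row
                            [generate-id() = generate-id(key('rows-by-ship', @ship)[1])]">
        <xsl:element name="{@ship}">
          <xsl:for-each select="key('rows-by-ship', @ship)">
            <port name="{@port}">
              <xsl:value-of select="@date"/>
            </port>
          </xsl:for-each>
        </xsl:element>
      </xsl:for-each>
    </ships>
  </xsl:template>

</xsl:stylesheet>
```

Nothing in the client's pipeline has to change with this approach, since the
intermediate markup never leaves the stylesheet.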

> 3. If neither of these, please clarify the logic whereby you know
> which td is of which type.

It may not have been clear in my sample markup so let me put it this way: we
are processing HTML tables of data. Each table contains "sections". The start
of each section is indicated by the presence of 4 TD elements in the first
row. Other rows only have 2 or 3 TD elements. The first TD element in the
first row has a ROWSPAN attribute running the length of the rows for that
section. This TD element has a value that represents the group name (this is
what we'd like to group by).

Given your clarification about how keys work, it sounds like I need something
like this:

<xsl:key
       name="ports-by-ship"
       match="tr"
       use="tr[count(td) = 4]/td[position() = 1]"
/>

but I have to think that this would only give me the first row of TD elements,
and not all of those that follow. I suspect that I might need something like
"following-sibling::td" in the MATCH attribute or maybe

<xsl:key
       name="ports-by-ship"
       match="td"
       use="td[ancestor::tr[count(td) = 4]][1]"
/>

and even then I suspect I'll get ALL the ancestors instead of just the most
recent one.
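
On the "most recent" worry, here is a sketch of one way to reach only the
nearest section header from any row. A parenthesized node-set is numbered in
document order, so [last()] on the union of the row itself and its preceding
siblings picks the closest 4-cell row. The tbody/tr structure is assumed from
my sample, and the lookup value 'Titanic' is simply taken from it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Index each body row by the text of the first cell of the nearest
       header row: the row itself if it has 4 cells, otherwise the closest
       preceding sibling row that does. -->
  <xsl:key name="ports-by-ship" match="tbody/tr"
           use="(self::tr | preceding-sibling::tr)[count(td) = 4][last()]/td[1]"/>

  <xsl:template match="/">
    <!-- Every row that belongs to the Titanic section, header row included. -->
    <xsl:for-each select="key('ports-by-ship', 'Titanic')">
      <xsl:value-of select="td[last() - 1]"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the key value is the ship name, two separate sections for the same
ship would be merged into one group; keying on generate-id() of the header row
instead would keep them apart.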

> At a higher level, I think the essence of the problem here is that you
> aren't accounting for evaluation context properly in devising your
> XPaths. Some review of the design and functionality of keys (apart
> from how to do Muenchian grouping) would be effort well spent. Keys
> are extremely useful in XSLT 2.0 as well! though no longer so
> necessary for grouping.

I've read and re-read everything I could find about keys, but it's just one of
those things that, for me, takes a while to sink in, especially if I haven't
used them in a while. I fully understand their power and have used them
successfully in the past, but I'm just a bit lost on this one.

> A sample of required output for your input might be helpful in
> presenting your problem to us.

Correct. I had forgotten to provide the desired output. Let me try this
again.

Input:
<table>
  <thead>
    <tr>
      <th>Ship</th>
      <th>Route</th>
      <th>Port</th>
      <th>Date</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="3">Titanic</td>
      <td rowspan="2">Pacific South</td>
      <td>San Francisco</td>
      <td>dd/mm/yyyy</td>
    </tr>
    <tr>
      <td>San Diego</td>
      <td>dd/mm/yyyy</td>
    </tr>
    <tr>
      <td>Acapulco</td>
      <td>dd/mm/yyyy</td>
    </tr>
    <tr>
      <td rowspan="2">Pacific Central</td>
      <td>Acapulco</td>
      <td>dd/mm/yyyy</td>
    </tr>
    <tr>
      <td>Punteras Cantsn</td>
      <td>dd/mm/yyyy</td>
    </tr>
    <tr>
      <td>Panama</td>
      <td>dd/mm/yyyy</td>
    </tr>
  </tbody>
</table>

Output:
<Titanic>
    <port name="San Francisco">dd/mm/yyyy</port>
    <port name="San Diego">dd/mm/yyyy</port>
    <port name="Acapulco">dd/mm/yyyy</port>
    <port name="Acapulco">dd/mm/yyyy</port>
    <port name="Punteras Cantsn">dd/mm/yyyy</port>
    <port name="Panama">dd/mm/yyyy</port>
</Titanic>

The actual input is a bit more complex than what I've shown here, but I think
I've presented the problem faithfully. It is important to note, though, that I
can't key off of the ROWSPAN attribute as there are other TD elements with
this attribute set that are NOT part of the section header and the value of
the ROWSPAN attribute can vary significantly (including being only 1 row). The
only thing that makes the section header unique is the number of TD elements
in the row.
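
To make the requirement concrete, here is a sketch of a complete XSLT 1.0
stylesheet that groups purely on "a row with 4 TD elements starts a section".
It is untested against the real data and rests on several assumptions: rows
sit under tbody/tr with no namespace, the port and date are always the last
two cells, and ship names are usable as element names. The "ships" wrapper and
the key name are hypothetical; a wrapper is only needed so that several
sections can share one output root.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index each body row by the generated id of the header row (4 cells)
       that opens its section: itself, or the nearest preceding sibling. -->
  <xsl:key name="rows-by-section" match="tbody/tr"
           use="generate-id((self::tr | preceding-sibling::tr)[count(td) = 4][last()])"/>

  <xsl:template match="/">
    <ships>
      <!-- Drive the output from the section header rows only. -->
      <xsl:apply-templates select="//tbody/tr[count(td) = 4]"/>
    </ships>
  </xsl:template>

  <xsl:template match="tr[count(td) = 4]">
    <!-- The first cell of the header row names the group. -->
    <xsl:element name="{normalize-space(td[1])}">
      <!-- All rows of this section, header row included. -->
      <xsl:for-each select="key('rows-by-section', generate-id())">
        <port name="{td[last() - 1]}">
          <xsl:value-of select="td[last()]"/>
        </port>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

ROWSPAN is never consulted here, so its varying values should not matter.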

And to be clear, I've managed to get the output I want, but not by using keys.
Fortunately my input is tiny, so processing time really isn't an issue, but
as we all know, that could always change (and it's better to just do things
right to begin with).

Thanks so much for your considered response.

Sincerely,

Ted Stresen-Reuter