RE: [xsl] text() and not()


Subject: RE: [xsl] text() and not()
From: Joerg Pietschmann <joerg.pietschmann@xxxxxx>
Date: Tue, 08 Jan 2002 18:21:50 +0100

"Andrew Welch" <andrew@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >//text()[not(parent::title)]
> This is close :) however it fails to match the trailing close bracket ')'
> for some reason.

You could probably profit from working through an XML tutorial.
Some material to start with:
There is an underlying tree model to XML, on which XML-aware
software (like XSLT processors) operates. Assume you have an XML
document

<?xml version="1.0"?>
<!-- some XML comment -->
<mydoc>
  <section>
    <title>
Renew LP Piston Seal (Fig 5.5.1
      <xref xrefid="F5.5.1" xidtype="FIGURE">) </xref>
    </title>
    <para>some text</para>
  </section>
</mydoc>

The associated tree could be roughly visualized as

root node
   +- comment node "some XML comment"
   +- element node "mydoc"
       +- text node "&#10;&#32;&#32;"
       +- element node "section"
       |   +- text node "&#10;&#32;&#32;&#32;&#32;"
       |   +- element node "title"
       |   |   +- text node "&#10;Renew LP Piston Seal (Fig 5.5.1&#10;&#32;&#32;&#32;&#32;&#32;&#32;"
       |   |   +- element node "xref"
       |   |   |   +- text node ")&#32;"
       |   |   +- text node "&#10;&#32;&#32;&#32;&#32;"
       |   +- text node "&#10;&#32;&#32;&#32;&#32;"
       |   +- element node "para"
       |   |   +- text node "some text"
       |   +- text node "&#10;&#32;&#32;"
       +- text node "&#10;"

I used XML character references &#10; and &#32; to draw attention
to otherwise unremarkable linefeed and space characters (hope I got
it right...). This may be important later.
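
One way to check such a tree by hand is to let the processor list the
text nodes itself. The following is only a minimal sketch (it is not
taken from the original question): for every text node it prints the
name of the parent element, the string length, and a marker if the node
consists of whitespace only. It assumes the parser hands whitespace-only
text nodes to the processor; some setups strip them before the
stylesheet ever sees them.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Start at the root node and visit every text node in the document -->
  <xsl:template match="/">
    <xsl:for-each select="//text()">
      <!-- name(..) is the name of the element that contains the text node -->
      <xsl:value-of select="name(..)"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="string-length(.)"/>
      <xsl:text> chars</xsl:text>
      <!-- normalize-space() yields an empty string for whitespace-only text -->
      <xsl:if test="not(normalize-space(.))"> (whitespace only)</xsl:if>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against the example document, this should print one line per text
node in the tree sketched above.
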
Nodes in this tree can be addressed by XPath expressions. The root
node is denoted by a single slash "/". You can walk down to element
nodes by using the element names: the "mydoc" element is addressed by
"/mydoc", the "para" element by "/mydoc/section/para" etc. These child
nodes are said to reside in an "axis" relative to the node where you
are, the child axis. This axis is the default when using XPath for
navigating the tree; you can make it explicit by writing for example
"/child::mydoc/child::section/child::para", which is the same as
"/mydoc/section/para".
There are other axes which allow you to walk up: the parent axis,
which holds the parent node as its only node, and the ancestor axis,
which holds all ancestors up to the root node, starting with the
parent. There are more; you can look them up in a good book, an XPath
online tutorial or reference, for example on http://www.zvon.org,
or in the spec.
You see that in this tree only element nodes have names. Since text
nodes have no names, you have to use the special notation text()
in order to address text nodes. A similar notation is used to retrieve
XML comments: comment(), and you can use node() to address any node,
regardless of whether it is an element node, a text node or something
else. If you want to retrieve element nodes regardless of the element
name you can use an asterisk "*"; this means you can write
"/mydoc/*/para" or "/*/*/para" and still get the para element node.
So far I have used "absolute" access paths, starting at the root node.
In XSLT there is always a context node which you are currently
processing. This allows you to use "relative" paths, which simply
don't start with a slash. If you are processing the section node
in the example above, you can get to the title node simply by
writing "title".
Until now, XPath expressions have been used to address nodes in the
tree, or "selecting" them, which is how they are used in the "select"
attribute of the xsl:value-of and xsl:apply-templates elements.
It should be noted that the result of such a selection can be more
than one node: if there were more para elements in the section element,
"/mydoc/section/para" would get all of them. Actually, if there were
also more than one section element in the mydoc element, the expression
would address all para elements in all these section elements. In
contrast, while processing a certain section element, the relative
path "para" will only collect the para elements which are children of
the current section element and not any of the para elements which
reside elsewhere in the tree.
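
The difference between the absolute and the relative path can be made
visible with a small stylesheet. This is a sketch only, assuming the
element names of the example document above: it first counts all para
elements in the whole document, then the para children of each section
separately.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Absolute path: all para elements in all section elements -->
    <xsl:text>all paras: </xsl:text>
    <xsl:value-of select="count(/mydoc/section/para)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Make each section the context node in turn -->
    <xsl:apply-templates select="/mydoc/section"/>
  </xsl:template>

  <xsl:template match="section">
    <!-- Relative path: only para children of the current section -->
    <xsl:text>paras in this section: </xsl:text>
    <xsl:value-of select="count(para)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With only one section in the document the numbers coincide; add a second
section with some para elements and the first count grows while the
per-section counts stay local.
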
Another use of XPath expressions is in the "match" attribute of the
xsl:template elements in the style sheet (actually, you can only use a
certain subset here). By default, the XSLT processor walks the tree
for you and tries to deduce which template to apply at a certain node,
which is controlled by the "match" attribute. In the simplest case the
match attribute is set to an element name; the template will then be
used every time the processor encounters an element with this name.
A match="*" will be applied to any element node, but not to text nodes
or other non-element nodes. A match="text()" will be applied to any
text node. You can also have more complex patterns: a match="para/text()"
will be applied to text nodes which are immediate children of a para
element node. The processor also tries to be smart when selecting
templates to apply: you can have both a match="para/text()" and a
match="text()", the first template is then applied to text in paras
only, the second to all other text nodes (the rules are actually
somewhat complicated, be alert if you get warnings about ambiguous
matches).
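
As an illustration of the two competing text templates, here is a
minimal sketch; the bracketed "[para text: ...]" marker is made up for
the example. In XSLT 1.0 a pattern like "para/text()" gets a default
priority of 0.5 and a bare "text()" gets -0.5, so the more specific
template wins for text inside a para without any ambiguity warning.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- More specific pattern (default priority 0.5):
       text nodes that are immediate children of a para -->
  <xsl:template match="para/text()">
    <xsl:text>[para text: </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>]&#10;</xsl:text>
  </xsl:template>

  <!-- Less specific pattern (default priority -0.5):
       every other text node is silently dropped -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Elements are still handled by the built-in templates, which simply walk
down to the children, so for the example document the only output
should be the line "[para text: some text]".
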
In order to account for more sophisticated use cases, you can use
predicates in path expressions. These are the expressions in brackets
"[..]" after element names in the path expression, like
"para[normalize-space()]". Predicates restrict the nodes which are
selected or matched. You may also apply predicates to the special *,
text() and node() selectors. The expression inside the brackets must
ultimately result in a true-or-false answer. For convenience, node
sets resulting from a path expression are considered false if empty
and true if non-empty. You can and usually should use relative path
expressions in predicates; they are evaluated relative to each node
in the set selected by the step to which the predicate is attached.
For example, take "/mydoc[section]". The first step is "/mydoc"
which selects the only mydoc element. The predicate prunes from this
set all members for which it is false. The expression is
a relative path expression, which will try to select section elements
which are child nodes of the mydoc element. Since the only node in
the set has a section element as child, the expression result is coerced
to "true" and the node survives the pruning process. The expression
"/mydoc[para]" would result in an empty node set, because mydoc has
no para element as a child. The expression
"/mydoc[parent::*]" will also result in an empty node set. The predicate
tries to select a parent node of mydoc which is an element, but
the parent of mydoc is the root node which is not an element node.
Predicates can be attached to every step in a path, like for example
"/mydoc[section]/section[para]". Path expressions in predicates
may have their own predicates, like "/mydoc[section[para]]", so you
may construct rather complex expressions.
Predicates may also be used in match expressions, as you already know;
there are however some restrictions on the expressions therein.
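
The predicate examples from the paragraph above can be checked with
count(), which returns the number of nodes that survive the pruning.
This is just a sketch for experimenting; the expected values in the
comments assume the example document from the beginning of this mail.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- mydoc has a section child: 1 node survives -->
    <xsl:value-of select="count(/mydoc[section])"/>
    <xsl:text>&#10;</xsl:text>
    <!-- mydoc has no para child: 0 -->
    <xsl:value-of select="count(/mydoc[para])"/>
    <xsl:text>&#10;</xsl:text>
    <!-- the parent of mydoc is the root node, not an element: 0 -->
    <xsl:value-of select="count(/mydoc[parent::*])"/>
    <xsl:text>&#10;</xsl:text>
    <!-- predicates on several steps: 1 section with a para child -->
    <xsl:value-of select="count(/mydoc[section]/section[para])"/>
    <xsl:text>&#10;</xsl:text>
    <!-- nested predicate: mydoc has a section which has a para: 1 -->
    <xsl:value-of select="count(/mydoc[section[para]])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```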

If you want to have a set of templates which outputs all text of
title elements, even if it is contained in subelements, and discards
everything else, there are multiple solutions. Actually, the problem
is somewhat underspecified.
You can for example override the default template for text nodes
which don't have a title element node among their ancestors:

  <xsl:template match="text()[not(ancestor::title)]"/>

This will output the ") " in your xref element but also text in other
elements nested in title elements and in any elements nested therein
and so on, which may or may not be what you want. It will also discard
any text discovered anywhere else unless it is output by other means,
which will most probably haunt you later even if it achieves the effect
you want now.
It may be possible to avoid this by reestablishing the default behaviour
for text nodes which have a title element as ancestor and disabling
everything else:
  <xsl:template match="text()[ancestor::title]">
   <xsl:value-of select="."/>
  </xsl:template>
  <xsl:template match="text()"/>

Both solutions may have performance problems, especially if you have
a somewhat deeper nested tree, because for every text node the ancestor
axis has to be searched for a title element. It may be more prudent,
though perhaps more bothersome, to define special processing for all
the elements for which no text should be generated.
As a start, use something like
  <xsl:template match="*[ancestor-or-self::title]">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="*">
    <xsl:apply-templates select="*"/>
  </xsl:template>
The second template walks down the tree but only on element nodes;
text is not processed. The first template reestablishes the default
behaviour for title elements and descendants of title elements.
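
For trying this out, the two templates can be wrapped into a complete
stylesheet. The following is a minimal sketch; the text output method is
an assumption, use whatever fits your real output. No explicit priority
attributes should be needed, because in XSLT 1.0 a pattern with a
predicate gets a default priority of 0.5 while the bare "*" gets -0.5.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- title elements and everything inside them: default processing,
       so the built-in template for text nodes copies the text -->
  <xsl:template match="*[ancestor-or-self::title]">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- all other elements: descend into child elements only,
       text node children are never visited -->
  <xsl:template match="*">
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
```

For the example document this should emit the title text together with
the ")" from the xref element, including the linefeeds and spaces
around them, and nothing from the para element.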

HTH
J.Pietschmann

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


