
RE: [xsl] Tricky XSLT 2.0 grouping problem


Subject: RE: [xsl] Tricky XSLT 2.0 grouping problem
From: "James Sulak" <jsulak@xxxxxxxxxxxxxxxx>
Date: Wed, 15 Oct 2008 11:27:44 -0500

Thanks, Ken and Michael.

Michael:  Our input data structure is even worse than I showed.  It's a
conversion project, and we initially tried to nest the sections based on
paragraph styles applied in FrameMaker.  Given that the paragraph styles
were only correct about 80% of the time, I decided to ignore them and
try the regex approach instead.

Ken:  Thanks for the code.  It works great with the test data I
supplied, and your string-to-codepoints() technique is an interesting
approach I hadn't seen.  Unfortunately, a few of the assumptions you
made (which were justified given the data) don't hold for the real
documents, which complicates the problem further.  For example, roman
numerals are not always the deepest section level.  Worse, a sequence
can sometimes go (a) (a-1) (a-2) (b) (c) within a single level, so you
can't use string-to-codepoints() to check whether the sequence is
correct.

I've ended up using a variation of my original approach.  Since I have
the original section level information available, I preserve that in an
attribute.  Instead of grouping directly with a regex, I process each
<body> element in two passes.  The first pass determines the level of
each element, using the same regex logic plus some hard-coded exceptions
(such as the (h) and (i) problem).  In a completely ambiguous case, it
falls back to the original level and inserts a flag element so the
section can be manually inspected later.  The second pass then groups
the sections based on the results of the first pass.
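
The two-pass shape described above might be sketched roughly as follows.
This is an illustration under assumptions, not the stylesheet actually
used: the level patterns, the orig-level attribute, the needs-review
flag element, and the mode and template names are all hypothetical, and
the hard-coded exceptions are left out.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

<xsl:output indent="yes"/>

<!--assumed patterns, one per level; (i) matches both the first and the
    fourth pattern, which is the ambiguous case-->
<xsl:variable name="levels"
   select="('^\([a-z]\)$','^\([0-9]+\)$','^\([A-Z]\)$','^\([ivx]+\)$')"/>

<xsl:template match="body">
   <!--pass 1: build a temporary tree of sections annotated with a level-->
   <xsl:variable name="annotated">
     <xsl:apply-templates select="section" mode="annotate"/>
   </xsl:variable>
   <!--pass 2: nest the annotated sections by level-->
   <xsl:call-template name="nest">
     <xsl:with-param name="population" select="$annotated/section"/>
     <xsl:with-param name="level" select="1"/>
   </xsl:call-template>
</xsl:template>

<xsl:template match="section" mode="annotate">
   <xsl:variable name="num" select="normalize-space(pnum)"/>
   <!--the index of every level pattern that this number matches-->
   <xsl:variable name="hits" as="xs:integer*"
      select="for $i in 1 to count($levels)
              return if (matches($num,$levels[$i])) then $i else ()"/>
   <section>
     <xsl:choose>
       <xsl:when test="count($hits) = 1">
         <xsl:attribute name="level" select="$hits"/>
       </xsl:when>
       <xsl:otherwise>
         <!--ambiguous or unrecognized: fall back to the original level
             (hypothetical orig-level attribute, else 1) and flag it-->
         <xsl:attribute name="level" select="(@orig-level,1)[1]"/>
         <needs-review/>
       </xsl:otherwise>
     </xsl:choose>
     <xsl:copy-of select="pnum,p"/>
   </section>
</xsl:template>

<xsl:template name="nest">
   <xsl:param name="population" as="element(section)*"/>
   <xsl:param name="level" as="xs:integer"/>
   <!--a new group starts at each section at this level or shallower;
       deeper sections stay in the group and are nested recursively-->
   <xsl:for-each-group select="$population"
      group-starting-with="section[xs:integer(@level) le $level]">
     <section>
       <xsl:copy-of select="*"/>
       <xsl:call-template name="nest">
         <xsl:with-param name="population"
                         select="current-group()[position() > 1]"/>
         <xsl:with-param name="level" select="$level + 1"/>
       </xsl:call-template>
     </section>
   </xsl:for-each-group>
</xsl:template>

</xsl:stylesheet>
```

Keeping the level decision in its own pass means the grouping pass only
compares integers, so exceptions and manual corrections can be applied
to the level values without touching the nesting logic.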

It's not perfect - as Michael pointed out, it's an impossible problem -
but it passes most of my tests.  No conversion without manual
intervention, I guess.

-James


-----Original Message-----
From: G. Ken Holman [mailto:gkholman@xxxxxxxxxxxxxxxxxxxx]
Sent: Friday, October 10, 2008 10:28 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Tricky XSLT 2.0 grouping problem

At 2008-10-10 09:03 -0500, James Sulak wrote:
>I have a tricky grouping problem that I'm running into a wall with.  I
>thought it might be a fun challenge to throw out there.  I'm attempting
>to group a flat list of <section> elements into a hierarchy based on
>matching its number against different regular expressions.

Then you have to do one level at a time because of the
ambiguity.  You have to group each subgroup rather than trying to
find all subgroups.

>The list is assumed to be in the correct order.  I have it working
>(the code is below) with one exception:  roman numerals.

You have to work recursively, determining only those that are in
sequence.  I used a predicate and codepoints assuming that the three
levels that are grouped have members that are monotonically increasing.

How I try to help my students understand <xsl:for-each-group> is to
read the name of the instruction as if it were "for the first member
of each group", reminding them that in the loop the current node is
at the first member of the group that was created.  This helps them
(and me!) remember what I can do wherever I am in my grouping code
and helped me below.
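
A small illustration of that reading, as a sketch only: the <list>,
<head> and <item> element names are hypothetical and not part of the
thread's data.  Inside the loop, "." is the first member of the group,
and current-group() returns all of its members.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output indent="yes"/>

<!--hypothetical input:
    <list><head>A</head><item>1</item><item>2</item>
          <head>B</head><item>3</item></list>-->
<xsl:template match="list">
   <groups>
     <xsl:for-each-group select="*" group-starting-with="head">
       <!--the context item is the first member of the group, so "."
           is the <head> that started it-->
       <group title="{.}" size="{count(current-group())}">
         <!--everything after the first member is the rest of the group-->
         <xsl:copy-of select="current-group()[position() > 1]"/>
       </group>
     </xsl:for-each-group>
   </groups>
</xsl:template>

</xsl:stylesheet>
```

With the hypothetical input shown in the comment, this should produce
two <group> elements: one titled A holding two items (size 3), and one
titled B holding one item (size 2).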

I hope this helps you as well.

. . . . . . . . . Ken

t:\ftemp>type sulak.xml
<body>
<section><pnum>(a)</pnum><p>First-level section</p></section>
<section><pnum>(1)</pnum><p>Second-level section</p></section>
<section><pnum>(A)</pnum><p>Third-level section</p></section>
<section><pnum>(i)</pnum><p>Fourth-level section</p></section>
<section><pnum>(ii)</pnum><p>Fourth-level section</p></section>
<section><pnum>(B)</pnum><p>Third-level section</p></section>
<section><pnum>(2)</pnum><p>Second-level section</p></section>
<section><pnum>(A)</pnum><p>Third-level section</p></section>
</body>

t:\ftemp>call xslt2 sulak.xml sulak.xsl sulak.out
<?xml version="1.0" encoding="UTF-8"?>
<section>
    <pnum>(a)</pnum>
    <p>First-level section</p>
    <section>
       <pnum>(1)</pnum>
       <p>Second-level section</p>
       <section>
          <pnum>(A)</pnum>
          <p>Third-level section</p>
          <section>
             <pnum>(i)</pnum>
             <p>Fourth-level section</p>
          </section>
          <section>
             <pnum>(ii)</pnum>
             <p>Fourth-level section</p>
          </section>
       </section>
       <section>
          <pnum>(B)</pnum>
          <p>Third-level section</p>
       </section>
    </section>
    <section>
       <pnum>(2)</pnum>
       <p>Second-level section</p>
       <section>
          <pnum>(A)</pnum>
          <p>Third-level section</p>
       </section>
    </section>
</section>

<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                 version="2.0">

<xsl:output indent="yes"/>

<xsl:variable name="levels"
select="('\([a-z]\)','\([1-9]\)','\([A-Z]\)')"/>

<xsl:template match="body">
   <xsl:call-template name="do-next-group">
     <xsl:with-param name="level" select="1"/>
     <xsl:with-param name="population" select="section"/>
   </xsl:call-template>
</xsl:template>

<xsl:template name="do-next-group">
   <xsl:param name="level"/>
   <xsl:param name="population"/>

   <!--find all the first elements in the population at the current level-->
   <!--the first is each one that is in sequence with the one immediately
       before that matches the same level-->
   <xsl:for-each-group select="$population"
      group-starting-with="*[pnum[matches(.,$levels[$level])]]
             [not(preceding-sibling::*[pnum[matches(.,$levels[$level])]])
              or   string-to-codepoints(substring(pnum,2,1)) =
                   string-to-codepoints(substring(preceding-sibling::*
                        [pnum[matches(.,$levels[$level])]][1],2,1))
                   +1]">
     <!--determine what to do based on how deep-->
     <xsl:choose>
       <xsl:when test="$level = count($levels)">
        <!--at the last level, so everything after first is in the next level-->
        <section>
          <xsl:copy-of select="pnum,p"/>
          <xsl:copy-of select="current-group()[position()>1]"/>
        </section>
       </xsl:when>
       <xsl:otherwise>
         <!--at other level, so need more grouping-->
         <section>
           <xsl:copy-of select="pnum,p"/>
           <xsl:call-template name="do-next-group">
             <xsl:with-param name="level" select="$level+1"/>
             <xsl:with-param name="population"
                             select="current-group()[position()>1]"/>
           </xsl:call-template>
         </section>
       </xsl:otherwise>
     </xsl:choose>
   </xsl:for-each-group>
</xsl:template>

</xsl:stylesheet>


--
Upcoming XSLT/XSL-FO hands-on courses:      Wellington, NZ 2009-01
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video sample lesson:    http://www.youtube.com/watch?v=PrNjJCh7Ppg
Video course overview:  http://www.youtube.com/watch?v=VTiodiij6gE
G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal

