[oXygen-user] an xslt challenge

James Cummings james at blushingbunny.net
Mon Nov 5 06:54:05 CST 2018


Hi Lou,

Yes, it does imply that for each paragraph you'd be tokenising (or grouping)
it into sentences, which might have a slight efficiency cost (though I doubt
much of one), but it would make it easier to choose how many sentences fit
under $maxWords. I was assuming that you wanted the output to have the
sentences marked as <s>; my mistake.
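A minimal sketch of the tokenise-and-count idea done in a single pass over each paragraph. It assumes an XSLT 3.0 processor (for `xsl:iterate` and `xsl:break`), that the `t:` prefix is bound to the TEI namespace, and a hypothetical default of 50 for `$maxWords` (the thread never shows how it is declared). Sentences are split with the same `'\.\s'` regex as in Lou's attempt, so the full stops between sentences are dropped, and words are counted as whitespace-delimited tokens.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:t="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="xs t">

  <!-- Assumption: a global limit; 50 is an arbitrary placeholder default. -->
  <xsl:param name="maxWords" as="xs:integer" select="50"/>

  <xsl:template match="t:p">
    <!-- Split the paragraph text on a full stop followed by whitespace. -->
    <xsl:variable name="sentences" as="xs:string*"
        select="tokenize(normalize-space(.), '\.\s')"/>
    <xsl:iterate select="$sentences">
      <!-- Running word total and sentence number, carried between iterations. -->
      <xsl:param name="wordsSoFar" as="xs:integer" select="0"/>
      <xsl:param name="seq" as="xs:integer" select="1"/>
      <xsl:variable name="total" as="xs:integer"
          select="$wordsSoFar + count(tokenize(., '\s+'))"/>
      <xsl:choose>
        <xsl:when test="$total lt $maxWords">
          <s n="{$seq}"><xsl:value-of select="."/></s>
          <xsl:next-iteration>
            <xsl:with-param name="wordsSoFar" select="$total"/>
            <xsl:with-param name="seq" select="$seq + 1"/>
          </xsl:next-iteration>
        </xsl:when>
        <!-- This sentence would reach the limit: stop; later ones are never read. -->
        <xsl:otherwise>
          <xsl:break/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:iterate>
  </xsl:template>
</xsl:stylesheet>
```

Because the running total travels as an `xsl:iterate` parameter, each sentence is counted once, and `xsl:break` abandons the rest of the paragraph as soon as the limit would be reached. The `<s>` elements are written in no namespace here; add the TEI namespace if the output should be TEI.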

-James

On Mon, 5 Nov 2018 at 12:31, Lou Burnard <lou.burnard at retired.ox.ac.uk>
wrote:

> Thanks for the very quick reply, James, but doesn't your approach imply that
> the tokenisation into sentences has already been done? I'm trying to avoid a
> two-pass solution, as I expect to be doing this hundreds of times.
>
> ------------------------------
> *From:* James Cummings <james at blushingbunny.net>
> *Sent:* Monday, November 5, 2018 1:10:02 PM
> *To:* Lou Burnard
> *Cc:* oxygen-user at oxygenxml.com
> *Subject:* Re: [oXygen-user] an xslt challenge
>
> Hi Lou,
>
> Would it make sense to use xsl:for-each-group to group the sentences into
> <s> units to make this easier? Then I'd probably recursively call a
> template or function, passing the current collection of <s> units as an
> item()* parameter and testing whether its total word count (once
> tokenised) is above or below $maxWords.
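A minimal XSLT 2.0 sketch of the recursive approach, working on the sentence strings directly rather than on pre-built `<s>` elements. The function name `f:leading-sentences` and its namespace URI are hypothetical, as are the TEI binding for `t:` and the default of 50 for `$maxWords`; sentences are split on `'\.\s'` as in Lou's attempt.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:t="http://www.tei-c.org/ns/1.0"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs t f">

  <!-- Assumption: a global limit; 50 is an arbitrary placeholder default. -->
  <xsl:param name="maxWords" as="xs:integer" select="50"/>

  <!-- Hypothetical helper: $remaining = sentences not yet examined,
       $kept = sentences accepted so far, $wordsSoFar = their word total.
       Returns the leading sentences whose cumulative count stays below $maxWords. -->
  <xsl:function name="f:leading-sentences" as="xs:string*">
    <xsl:param name="remaining" as="xs:string*"/>
    <xsl:param name="kept" as="xs:string*"/>
    <xsl:param name="wordsSoFar" as="xs:integer"/>
    <xsl:variable name="next" select="$remaining[1]"/>
    <xsl:variable name="total" as="xs:integer"
        select="$wordsSoFar + count(tokenize($next, '\s+'))"/>
    <!-- Stop when the input is used up or the next sentence would reach the limit. -->
    <xsl:sequence select="
        if (empty($remaining) or $total ge $maxWords)
        then $kept
        else f:leading-sentences(subsequence($remaining, 2), ($kept, $next), $total)"/>
  </xsl:function>

  <xsl:template match="t:p">
    <xsl:variable name="sentences" as="xs:string*"
        select="tokenize(normalize-space(.), '\.\s')"/>
    <!-- Wrap each kept sentence in <s>, numbered from 1. -->
    <xsl:for-each select="f:leading-sentences($sentences, (), 0)">
      <s n="{position()}"><xsl:value-of select="."/></s>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each call either returns what it has kept or recurses with the next sentence appended, so the recursion ends at the first sentence that would bring the total to `$maxWords` or beyond.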
>
> Not got time to write that out as a solution atm, and I'm sure it can be
> done without the recursivity as well, but that is the approach that would
> have occurred to me at least.
>
> -James
>
>
> On Mon, 5 Nov 2018 at 12:03, Lou Burnard <lou.burnard at retired.ox.ac.uk>
> wrote:
>
>> I hope I am not abusing this list in asking occasionally for advice on
>> the best way to hack something in xslt.
>>
>> Today's problem is to output only the first x sentences (strings
>> terminated by a full stop) of a paragraph such that the total number of
>> words (space-delimited strings) is less than some limit (call it
>> $maxWords). Since the sentences are of variable length, obviously I don't
>> know what x is.
>>
>> Here's where I got to so far:
>>
>> <xsl:template match="t:p">
>>     <xsl:variable name="pString">
>>         <xsl:value-of select="."/>
>>     </xsl:variable>
>>     <xsl:for-each select="tokenize($pString, '\.\s')">
>>         <xsl:variable name="seq">
>>             <xsl:value-of select="string(position())"/>
>>         </xsl:variable>
>>         <xsl:variable name="wordsSoFar">
>>             <xsl:value-of select="string-length(translate(normalize-space(
>>                 preceding-sibling::text()), ' ', '')) + 1"/>
>>         </xsl:variable>
>>         <xsl:if test="$wordsSoFar &lt; $maxWords">
>>             <s n="{$seq}">
>>                 <xsl:value-of select="."/>
>>             </s>
>>         </xsl:if>
>>     </xsl:for-each>
>> </xsl:template>
>>
>> But this is not valid because preceding-sibling:: wants a node(), not a
>> string (even though "text()" *is* a node, imho).
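The error arises because, inside an `xsl:for-each` over the result of `tokenize()`, the context item is a string rather than a node, so an axis step such as `preceding-sibling::` has nothing to navigate from. A minimal XSLT 2.0 sketch of one way round it, without recursion: keep the sentences in a variable and count the words in `subsequence()` up to the current position. It assumes the TEI binding for `t:` and a hypothetical default of 50 for `$maxWords`.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:t="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="xs t">

  <!-- Assumption: a global limit; 50 is an arbitrary placeholder default. -->
  <xsl:param name="maxWords" as="xs:integer" select="50"/>

  <xsl:template match="t:p">
    <!-- The sentences live in a variable, so no node navigation is needed later. -->
    <xsl:variable name="sentences" as="xs:string*"
        select="tokenize(normalize-space(.), '\.\s')"/>
    <xsl:for-each select="$sentences">
      <xsl:variable name="seq" as="xs:integer" select="position()"/>
      <!-- Words in this sentence plus all earlier ones (whitespace-delimited tokens). -->
      <xsl:variable name="wordsSoFar" as="xs:integer"
          select="count(for $x in subsequence($sentences, 1, $seq)
                        return tokenize($x, '\s+'))"/>
      <xsl:if test="$wordsSoFar lt $maxWords">
        <s n="{$seq}"><xsl:value-of select="."/></s>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This stays a single stylesheet pass, but it recounts earlier sentences at every step, so the work grows quadratically with the number of sentences. Since the running totals only increase, the `<s>` elements written are always a leading run of the paragraph.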
>>
>> Am I going about this entirely the wrong way?
>>
>>
>>
>>
>> _______________________________________________
>> oXygen-user mailing list
>> oXygen-user at oxygenxml.com
>> https://www.oxygenxml.com/mailman/listinfo/oxygen-user
>>
>