Display a reading duration

Posted: Mon Jun 27, 2022 9:08 pm
by gbv34
Hello!
I'm looking for a way to display the reading duration for a topic.
Currently, my best lead is an XSLT shared in this forum that displays the number of words contained in a map.

Code: Select all

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>

    <xsl:template match="/">
        <counts>
            <xsl:apply-templates/>
        </counts>
    </xsl:template>

    <xsl:template match="text()"/>

    <xsl:template match="*[contains(@class, 'map/map')]">
        <xsl:variable name="text">
            <xsl:apply-templates mode="getText" select="node()"/>
        </xsl:variable>
        <count>
            <xsl:value-of
                select="count(tokenize(lower-case($text), '(\s|[,.!:;]|[n][b][s][p][;])+')[string(.)])"/>
        </count>
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="*[contains(@class, 'map/map')]" mode="getText"/>
</xsl:stylesheet>
However, what I don't get is how I could insert the result into the first topic of the map, as a kind of infoblock.
Also, the number of words is a good basis, but some calculation is still needed to turn it into a duration. I found this nice little JS function, which provides a rough estimate based on a words-per-minute rate.

Code: Select all

function readingTime() {
  const text = document.getElementById("article").innerText;
  const wpm = 225;
  const words = text.trim().split(/\s+/).length;
  const time = Math.ceil(words / wpm);
  document.getElementById("time").innerText = time;
}
readingTime();
Is there any way to apply such a function with XSLT?
Has anyone on the forum already had this need?

Any feedback is welcome. I admit I was surprised not to see any plugin developed for that.
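For reference, the calculation in the snippet above can be separated from the page lookup so that it is easier to port to another language. This is only a sketch: the function names `countWords` and `estimateReadingMinutes` are made up for illustration, and 225 words per minute is simply the rate used in the snippet. It also returns 0 for empty text, where the original counts one word.

```javascript
// Hypothetical helper names; 225 wpm is the rate from the snippet above.
function countWords(text) {
  const trimmed = text.trim();
  // An empty string would otherwise split into [""] and count as one word.
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function estimateReadingMinutes(text, wordsPerMinute = 225) {
  // Round up so that any non-empty text shows at least one minute.
  return Math.ceil(countWords(text) / wordsPerMinute);
}

console.log(estimateReadingMinutes("word ".repeat(450))); // 2
```

The same two steps (count whitespace-separated tokens, divide by a rate and round up) are what an XSLT version would need to reproduce.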

Re: Display a reading duration

Posted: Tue Jun 28, 2022 1:26 pm
by chrispitude
Hi gbv34,

Do you want this for PDF or HTML5 output, or both? This might influence how the implementation would work.

Re: Display a reading duration

Posted: Tue Jun 28, 2022 4:08 pm
by gbv34
Hi Chris,
I intend to achieve this feature for HTML files.

Re: Display a reading duration

Posted: Tue Jun 28, 2022 6:10 pm
by chrispitude
Hi gbv34,

How would you want it annotated to the HTML output? Is there a specific HTML structure you would want?

Maybe just as a proof of concept, I could put the information in an HTML <abstract> element?

- Chris

Re: Display a reading duration

Posted: Tue Jun 28, 2022 7:18 pm
by gbv34
Hi again :)
Actually, I don't have a specific requirement for that but I thought it would be more relevant to place it close to the main heading in a topic, right after the opening of the <body> element.

Re: Display a reading duration

Posted: Mon Aug 29, 2022 3:12 pm
by chrispitude
Hi gbv34,

For HTML-based transformations (html5 and webhelp-responsive), I think I could add a reading duration to each individual topic's page. However, I don't know how to compute the duration for the entire map. There is no place in these transformations where all of the map content is easily available from a single XSLT template. It could perhaps be achieved by accessing all the topic files as external documents from a map template, but that is not something I readily know how to do.

For the PDF Chemistry transformation (pdf-css-html5), I might be able to make it work for the entire map.

Let me know what sounds interesting to you!

Re: Display a reading duration

Posted: Sun Sep 18, 2022 9:20 pm
by gbv34
Hi Chris,
In my case, I am actually looking to display the reading time at the topic level. If you already have a snippet available, that would be awesome ;)

Re: Display a reading duration

Posted: Mon Sep 19, 2022 9:40 am
by Radu
Hi,
I added an extra <div> to the stylesheet I used to customize the prolog section for the Oxygen XML Blog:
https://github.com/oxygenxml/blog/blob/ ... prolog.xsl
It looks like this:

Code: Select all

        <div style="color: gray;">
            <xsl:variable name="fileContent" select="/"/>
            <xsl:variable name="text" select="normalize-space($fileContent)"/>
            <!-- Strip spaces from the normalized text so that both lengths are measured on the same string -->
            <xsl:variable name="textWithoutSpaces" select="translate($text, ' ', '')"/>
            <xsl:variable name="fileCountWords" select="string-length($text) - string-length($textWithoutSpaces) + 1"/>
            <xsl:variable name="readMin" select="format-number($fileCountWords div 50, '0')"/>
            Read time: <xsl:value-of select="$readMin"/> minute(s)
        </div>
Since you are matching the topic/body element, you can define the file content variable like this:

Code: Select all

 <xsl:variable name="fileContent" select="."/>
to obtain the entire content for the current body.
I'm somewhat skeptical that showing the read time is useful for technical documentation. In the example above I divided the number of words by 50, whereas advice found online usually suggests dividing by 225. But technical documentation is not a romance novel: people need to think about what they are reading and apply it to their situation.

Regards,
Radu
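The counting trick in this post (words = number of single spaces in the whitespace-normalized text, plus one) and the effect of the divisor can be tried outside XSLT. The following is a rough JavaScript sketch, not the blog's code: `normalizeSpace`, `countWordsBySpaces` and `readMinutes` are hypothetical names, and `Math.round` only approximates the rounding done by `format-number(..., '0')`.

```javascript
// Mirrors XPath normalize-space(): collapse runs of space, tab, CR and LF
// into one space, then drop a leading or trailing space.
function normalizeSpace(text) {
  return text.replace(/[ \t\r\n]+/g, " ").replace(/^ | $/g, "");
}

// Word count = number of single spaces + 1.
// Note that an empty string therefore still yields 1.
function countWordsBySpaces(text) {
  const normalized = normalizeSpace(text);
  const withoutSpaces = normalized.replace(/ /g, "");
  return normalized.length - withoutSpaces.length + 1;
}

// Round to the nearest minute for a given words-per-minute rate.
function readMinutes(words, wordsPerMinute) {
  return Math.round(words / wordsPerMinute);
}

console.log(countWordsBySpaces("Install the\n  plugin,\tthen restart.")); // 5
console.log(readMinutes(900, 50));  // 18
console.log(readMinutes(900, 225)); // 4
```

For a 900-word topic, the two rates give 18 minutes versus 4 minutes, so the choice of divisor matters far more than small differences in how the words are counted.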

Re: Display a reading duration

Posted: Tue Sep 20, 2022 3:55 am
by chrispitude
Hi Radu,

I took a different approach of computing the word count using a moded template, so that I could force certain elements to be skipped. And although I don't do this, other element-specific heuristics could be applied, such as applying different treatment to code blocks (for example, counting lines, or dividing the "words" by some ratio).

Here is my approach, adapted into the blog XSLT file:

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="3.0">

    <xsl:template match="*[contains(@class, ' topic/prolog ')]">
        <!-- Display the author name -->
        <xsl:variable name="avatar-author" select="replace(*[contains(@class, ' topic/author ')],' ','_')"/>
        <div class="author inPage {$avatar-author}">
            <a href="/topics/contributors.html">
                <xsl:value-of select="*[contains(@class, ' topic/author ')]"/>
            </a>
        </div>
        <!-- Display the creation date -->
        <xsl:if test="exists(.//*[contains(@class, ' topic/created ')]/@date)">
            <div class="date inPage">
                <xsl:variable name="cd" select=".//*[contains(@class, ' topic/created ')]/@date"/>
                <xsl:value-of select="format-date(xs:date($cd),
                    '[D] [MNn,3-3] [Y0001]')"/>
            </div>
        </xsl:if>
        <!-- Display the number of minutes it takes to read the article -->
        <div style="color: gray;">
            <xsl:variable name="word-count" as="xs:integer">
                <xsl:apply-templates select=".." mode="word-count"/>  <!-- apply to enclosing topic -->
            </xsl:variable>
            <xsl:variable name="minutes" select="xs:integer(ceiling($word-count div $words-per-minute))" as="xs:integer"/>
            Read time: <xsl:value-of select="' %d %s'
                => replace('%d', string($minutes))
                => replace('%s', if ($minutes = 1) then 'minute' else 'minutes')"/>
        </div>
        <xsl:next-match/>
    </xsl:template>

    <!-- reading speed used to convert the word count into minutes -->
    <xsl:variable name="words-per-minute" as="xs:integer" select="50"/>

    <!-- moded templates used to compute the word count -->
    <!-- for text() nodes, simply count the words -->
    <xsl:template match="text()" mode="word-count" as="xs:integer">
        <xsl:sequence select="count(tokenize(normalize-space(.), '\s+'))"/>
    </xsl:template>

    <!-- for elements, sum the child-node word counts -->
    <xsl:template match="*" mode="word-count" as="xs:integer">
        <xsl:variable name="counts" as="xs:integer*">
            <xsl:apply-templates select="node()" mode="#current"/>
        </xsl:variable>
        <xsl:sequence select="sum($counts)"/>
    </xsl:template>

    <!-- do not let these elements contribute to word count -->
    <xsl:template match="@*" mode="word-count"/>
    <xsl:template match="*[contains-token(@class, 'topic/titlealts')]" mode="word-count"/>
    <xsl:template match="*[contains-token(@class, 'topic/prolog')]" mode="word-count"/>
    <xsl:template match="*[contains-token(@class, 'topic/indexterm')]" mode="word-count"/>
    <xsl:template match="*[contains-token(@class, 'topic/draft-comment')]" mode="word-count"/>

</xsl:stylesheet>
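For readers less familiar with moded templates, the same recursion can be sketched in JavaScript: text nodes return their own word count, elements return the sum over their children, and skipped elements return zero. The node shape used here (plain strings for text, `{ name, children }` objects for elements) and the sample topic are invented for illustration only.

```javascript
// Hypothetical node shape: strings are text nodes,
// { name, children } objects are elements.
const SKIPPED = new Set(["titlealts", "prolog", "indexterm", "draft-comment"]);

function wordCount(node) {
  if (typeof node === "string") {
    const trimmed = node.trim();
    return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  }
  // Skipped elements contribute nothing, including their descendants.
  if (SKIPPED.has(node.name)) return 0;
  return node.children.reduce((sum, child) => sum + wordCount(child), 0);
}

const topic = {
  name: "topic",
  children: [
    { name: "title", children: ["Reading time"] },
    { name: "prolog", children: [{ name: "author", children: ["Jane Doe"] }] },
    { name: "body", children: [
      { name: "p", children: [
        "Shown near the ", { name: "b", children: ["main heading"] }, " of the topic.",
      ] },
      { name: "draft-comment", children: ["check wording"] },
    ] },
  ],
};

const minutes = Math.ceil(wordCount(topic) / 50);
console.log(wordCount(topic), minutes); // 10 1
```

One side effect of counting per text node, in this sketch and probably in the stylesheet as well: a text node that holds only punctuation (for example a lone "." after an inline element) counts as one word, and a word split across an inline element counts as two. For a rough estimate this is unlikely to matter.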

Re: Display a reading duration

Posted: Tue Sep 20, 2022 6:41 am
by Radu
Hi Chris,

Thanks for contributing to this.

Regards,
Radu

Re: Display a reading duration

Posted: Tue Sep 20, 2022 12:49 pm
by chrispitude
Hi Radu,

Should I submit a blog PR with this approach? Or would you prefer to keep the current approach?

Re: Display a reading duration

Posted: Tue Sep 20, 2022 12:56 pm
by Radu
Hi Chris,
The XSLT in the current blog post is simpler. It's true that it does not skip certain elements, but a prolog usually does not have much text content, so it should not influence the read time much, and codeblocks that contain words should probably be counted as well. Draft comments do make sense to skip, although they might no longer be present in the temporary processed DITA topics if the args.draft parameter is disabled. And if it is not disabled, then the draft comment makes it into the HTML file and needs to be read as well.

Regards,
Radu