Display a reading duration

gbv34
Posts: 105
Joined: Thu Jan 20, 2022 12:36 pm

Post by gbv34 »

Hello!
I'm looking for a way to display the reading duration for a topic.
Currently, my best lead is an XSLT shared in this forum that displays the number of words contained in a map.

Code:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>

    <xsl:template match="/">
        <counts>
            <xsl:apply-templates/>
        </counts>
    </xsl:template>

    <xsl:template match="text()"/>

    <xsl:template match="*[contains(@class, 'map/map')]">
        <xsl:variable name="text">
            <xsl:apply-templates mode="getText" select="node()"/>
        </xsl:variable>
        <count>
            <xsl:value-of
                select="count(tokenize(lower-case($text), '(\s|[,.!:;]|[n][b][s][p][;])+')[string(.)])"/>
        </count>
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="*[contains(@class, 'map/map')]" mode="getText"/>
</xsl:stylesheet>
However, what I don't get is how I could insert the results in the first topic of the map, as a kind of infoblock.
Also, having the number of words is a good basis, but some calculation is still needed. I found this nice little JS function, which provides a rough estimate based on a words-per-minute rate.

Code:

function readingTime() {
  const text = document.getElementById("article").innerText;
  const wpm = 225;
  const words = text.trim().split(/\s+/).length;
  const time = Math.ceil(words / wpm);
  document.getElementById("time").innerText = time;
}
readingTime();
Is there any way to apply such a function with xslt?
Has someone in the forum already met this need?

Any feedback is welcome. I admit I was surprised not to see any plugin developed for that.
------
Gaspard
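
The JavaScript calculation above translates fairly directly to XPath 2.0. The following is a minimal sketch rather than a tested plugin: it assumes an XSLT 2.0 processor such as Saxon, the `wpm` parameter name is made up for illustration, and it counts every text node in the input document without skipping any element.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

    <xsl:output method="text"/>

    <!-- Assumed reading speed (hypothetical parameter name), same value as the JavaScript function -->
    <xsl:param name="wpm" as="xs:integer" select="225"/>

    <xsl:template match="/">
        <!-- Split the document's text on whitespace; the [.] predicate drops empty tokens -->
        <xsl:variable name="words" as="xs:integer"
            select="count(tokenize(string(.), '\s+')[.])"/>
        <!-- Round up to whole minutes, like Math.ceil(words / wpm) -->
        <xsl:value-of select="xs:integer(ceiling($words div $wpm))"/>
    </xsl:template>
</xsl:stylesheet>
```

Run against a single topic file, this writes the estimated number of minutes as plain text. Unlike the JavaScript version, an input without any words yields 0 instead of 1, because empty tokens are filtered out before counting.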
chrispitude
Posts: 907
Joined: Thu May 02, 2019 2:32 pm

Re: Display a reading duration

Post by chrispitude »

Hi gbv34,

Do you want this for PDF or HTML5 output, or both? This might influence how the implementation would work.

Post by gbv34 »

Hi Chris,
I intend to achieve this feature for HTML files.
------
Gaspard

Post by chrispitude »

Hi gbv34,

How would you want it annotated to the HTML output? Is there a specific HTML structure you would want?

Maybe just as a proof of concept, I could put the information in an HTML <abstract> element?

- Chris

Post by gbv34 »

Hi again :)
Actually, I don't have a specific requirement for that but I thought it would be more relevant to place it close to the main heading in a topic, right after the opening of the <body> element.
------
Gaspard

Post by chrispitude »

Hi gbv34,

For HTML-based transformations (html5 and webhelp-responsive), I think I could add a reading duration to each individual topic's page. But, I don't know how to compute the duration for the entire map. There is no place in these transformations where all of the map content is easily available from a single XSLT template. It could perhaps be achieved by accessing all the topic files as external documents from a map template, but that is not something I readily know how to do.

For the PDF Chemistry transformation (pdf-css-html5), I might be able to make it work for the entire map.

Let me know what sounds interesting to you!
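
The idea of accessing the topic files as external documents from a map template could look roughly like the sketch below. It is an illustration under several assumptions, not something taken from the HTML5 or WebHelp transformations: the map is the input document, its elements carry `@class` attributes (as they do in processed DITA-OT files or when DTD defaults are applied), the `@href` values resolve relative to the map, and the variable name `topic-uris` is made up for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

    <xsl:output method="text"/>

    <!-- Input document: the DITA map -->
    <xsl:template match="/">
        <!-- Absolute URIs of the local DITA topics referenced by the map,
             with any #fragment removed and duplicates dropped -->
        <xsl:variable name="topic-uris" as="xs:anyURI*"
            select="distinct-values(
                for $ref in //*[contains(@class, ' map/topicref ')]
                               [@href]
                               [not(@scope = ('external', 'peer'))]
                               [not(@format) or @format = 'dita']
                return resolve-uri(tokenize($ref/@href, '#')[1], base-uri($ref)))"/>
        <!-- Load each topic file and count its whitespace-separated words -->
        <xsl:value-of select="sum(
                for $uri in $topic-uris
                return count(tokenize(string(doc($uri)), '\s+')[.]))"/>
    </xsl:template>
</xsl:stylesheet>
```

A file that is referenced several times is counted once, and all text in each file is counted, including prolog metadata. Submaps referenced with `format="ditamap"` are not followed.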

Post by gbv34 »

Hi Chris,
In my case, I am actually looking to display the reading time at the topic level. If you already have a snippet available, that would be awesome ;)
------
Gaspard
Radu
Posts: 9045
Joined: Fri Jul 09, 2004 5:18 pm

Re: Display a reading duration

Post by Radu »

Hi,
I added an extra <div> to the stylesheet I used to customize the prolog section for the Oxygen XML Blog:
https://github.com/oxygenxml/blog/blob/ ... prolog.xsl
It looks like this:

Code:

        <div style="color: gray;">
            <xsl:variable name="fileContent" select="/"/>
            <!-- Collapse every run of whitespace (spaces, tabs, newlines) to a single space -->
            <xsl:variable name="text" select="normalize-space($fileContent)"/>
            <!-- Strip the spaces from the normalized text, so the length difference is the number of word separators -->
            <xsl:variable name="textWithoutSpaces" select="translate($text, ' ', '')"/>
            <xsl:variable name="fileCountWords" select="string-length($text) - string-length($textWithoutSpaces) + 1"/>
            <xsl:variable name="readMin" select="format-number($fileCountWords div 50, '0')"/>
            Read time: <xsl:value-of select="$readMin"/> minute(s)
        </div>
As you are matching the topic/body element you can define the file content variable like:

Code:

 <xsl:variable name="fileContent" select="."/>
to obtain the entire content for the current body.
I'm somewhat skeptical that showing the read time is that useful for technical documentation. In the example above I divided the number of words by 50, whereas the figure commonly advised online is 225. But technical documentation is not a romance novel: people need to think about what they are reading and apply it to their situation.

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com

Post by chrispitude »

Hi Radu,

I took a different approach of computing the word count using a moded template, so that I could force certain elements to be skipped. And although I don't do this, other element-specific heuristics could be applied, such as applying different treatment to code blocks (for example counting lines, or dividing the "words" by some ratio).

Here is my approach, adapted into the blog XSLT file:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="3.0">
    <!-- version 3.0: contains-token() and the => operator used below are XPath 3.1 features -->

    <xsl:template match="*[contains(@class, ' topic/prolog ')]">
        <!-- Display the author name -->
        <xsl:variable name="avatar-author" select="replace(*[contains(@class, ' topic/author ')],' ','_')"/>
        <div class="author inPage {$avatar-author}">
            <a href="/topics/contributors.html">
                <xsl:value-of select="*[contains(@class, ' topic/author ')]"/>
            </a>
        </div>
        <!-- Display the creation date -->
        <xsl:if test="exists(.//*[contains(@class, ' topic/created ')]/@date)">
            <div class="date inPage">
                <xsl:variable name="cd" select=".//*[contains(@class, ' topic/created ')]/@date"/>
                <xsl:value-of select="format-date(xs:date($cd),
                    '[D] [MNn,3-3] [Y0001]')"/>
            </div>
        </xsl:if>
        <!-- Display the number of minutes it takes to read the article -->
        <div style="color: gray;">
            <xsl:variable name="word-count" as="xs:integer">
                <xsl:apply-templates select=".." mode="word-count"/>  <!-- apply to enclosing topic -->
            </xsl:variable>
            <xsl:variable name="minutes" select="xs:integer(ceiling($word-count div $words-per-minute))" as="xs:integer"/>
            Read time: <xsl:value-of select="' %d %s'
                => replace('%d', string($minutes))
                => replace('%s', if ($minutes = 1) then 'minute' else 'minutes')"/>
        </div>
        <xsl:next-match/>
    </xsl:template>

    <!-- reading speed used to convert the word count into minutes -->
    <xsl:variable name="words-per-minute" as="xs:integer" select="50"/>

    <!-- moded templates used to compute the word count -->

    <!-- for text() nodes, simply count the words -->
    <xsl:template match="text()" mode="word-count" as="xs:integer">
        <xsl:sequence select="count(tokenize(normalize-space(.), '\s+'))"/>
    </xsl:template>

    <!-- for elements, sum the child-node word counts -->
    <xsl:template match="*" mode="word-count" as="xs:integer">
        <xsl:variable name="counts" as="xs:integer*">
            <xsl:apply-templates select="node()" mode="#current"/>
        </xsl:variable>
        <xsl:sequence select="sum($counts)"/>
    </xsl:template>

    <!-- do not let these elements contribute to word count -->
    <xsl:template match="@*" mode="word-count"/>
    <xsl:template match="*[contains-token(@class, 'topic/titlealts')]" mode="word-count"/>
    <xsl:template match="*[contains-token(@class, 'topic/prolog')]" mode="word-count"/>
    <xsl:template match="*[contains-token(@class, 'topic/indexterm')]" mode="word-count"/>
    <xsl:template match="*[contains-token(@class, 'topic/draft-comment')]" mode="word-count"/>

</xsl:stylesheet>
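
The code-block heuristic mentioned before the stylesheet (dividing the "words" by some ratio) could be sketched as one more template in the same `word-count` mode. This is only an illustration: the variable `code-word-ratio` and its value of 2 are made up, and the sketch assumes code blocks carry the usual `pr-d/codeblock` class token and that the processor supports `contains-token()` (XPath 3.1).

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

    <!-- Hypothetical ratio: every 2 code "words" count as 1 prose word -->
    <xsl:variable name="code-word-ratio" as="xs:integer" select="2"/>

    <!-- Code blocks contribute a reduced word count instead of the plain sum -->
    <xsl:template match="*[contains-token(@class, 'pr-d/codeblock')]"
        mode="word-count" as="xs:integer">
        <!-- Raw number of whitespace-separated tokens in the whole code block -->
        <xsl:variable name="raw" as="xs:integer"
            select="count(tokenize(normalize-space(.), '\s+'))"/>
        <!-- Integer division keeps the result an xs:integer -->
        <xsl:sequence select="$raw idiv $code-word-ratio"/>
    </xsl:template>
</xsl:stylesheet>
```

Because the match pattern has a predicate, its default priority is higher than that of the generic `match="*"` template, so the variable and the template can be added to the stylesheet above without an explicit `priority` attribute.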

Post by Radu »

Hi Chris,

Thanks for contributing to this.

Regards,
Radu

Post by chrispitude »

Hi Radu,

Should I submit a blog PR with this approach? Or would you prefer to keep the current approach?

Post by Radu »

Hi Chris,
The XSLT in the current blog post is simpler. It does not skip over certain elements, but a prolog usually does not have much text content, so it should not influence the read time much, and codeblocks that contain words should probably be counted as well. Skipping draft comments does make sense, although they might no longer be present in the temporary processed DITA topics if the args.draft parameter is disabled. And if it is not disabled, then the draft comment makes it into the HTML file and needs to be read as well.

Regards,
Radu