Webhelp search doesn't find term that is present on html-page

mbur1
Posts: 1
Joined: Tue Oct 25, 2022 5:22 pm

Post by mbur1 »

We're transforming our DITA files and maps to the WebHelp format, and the result is just as intended. However, we noticed something peculiar when using the search function:
When searching for farbe achsen, a page that actually contains the term achsen appears in the results as shown below (the red box says that the term is missing on that page).
[Screenshot: 25-10-_2022_16-29-01.png]
However, it is there.
[Screenshot: 25-10-_2022_16-29-58.png]
It's as if the term can't be found as a substring on this page. On other pages, however, the two search terms are found both as whole strings and as substrings.
Has anybody experienced the same behavior and knows how to solve it?
Costin
Posts: 833
Joined: Mon Dec 05, 2011 6:04 pm

Re: Webhelp search doesn't find term that is present on html-page

Post by Costin »

Hi mbur1,

The issue seems to be that the WebHelp indexer is indexing the content of your DITA Map in the wrong language.
For the indexer to treat the terms as German, you should explicitly set the language of your DITA Map to German.
You can do that manually, by adding the "xml:lang" attribute to your DITA Map's root element and setting its value to "de" or "de-DE". Alternatively, if you run the transformation from a GUI-based application (like Oxygen XML Editor or Author), set the dedicated transformation scenario parameter "default.language" (to find it, edit the WebHelp transformation scenario you are using and look in the "Parameters" tab).
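As a rough illustration of the manual route, here is a minimal Python sketch that reads and sets xml:lang on a map's root element. The inline sample map and the helper names (map_language, with_language) are made up for this example. Note that ElementTree drops any DOCTYPE declaration when it serializes, so for real DITA files it is safer to add the attribute in the editor or with a tool that preserves the DOCTYPE.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace, which ElementTree
# spells out in Clark notation.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def map_language(map_xml):
    """Return the xml:lang value of the root element, or None if it is absent."""
    return ET.fromstring(map_xml).get(XML_LANG)


def with_language(map_xml, lang):
    """Return the map serialized with xml:lang set on its root element."""
    root = ET.fromstring(map_xml)
    root.set(XML_LANG, lang)
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal map, used only for illustration.
SAMPLE_MAP = '<map><title>Demo</title><topicref href="farben.dita"/></map>'

updated_map = with_language(SAMPLE_MAP, "de-DE")
print(map_language(SAMPLE_MAP))   # no language set on the original
print(map_language(updated_map))  # language now present on the root
```

After setting the attribute, the WebHelp output has to be regenerated so that the content is indexed again.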

If you still encounter indexing issues after explicitly setting the German language on your DITA Map, please send a complete but minimal DITA Map file hierarchy (the DITA Map with a few topics) to support@oxygenxml.com and we will investigate.

Kind Regards,
Costin
Costin Sandoi
oXygen XML Editor and Author Support
galanohan
Posts: 115
Joined: Mon Jul 10, 2023 11:49 am

Re: Webhelp search doesn't find term that is present on html-page

Post by galanohan »

Hi Costin,

I was experiencing the same, or maybe a similar, issue, so I searched the forum for "search result", which is how I got here.
[Screenshot: 企业微信截图_17047798507386.png]
For example, take a WebHelp where xml:lang is set to "zh" at the map and bookmap level. About 90% of the characters are Chinese, but the remaining 10% are English or Latin characters: API, plugin names, function names, parameter names, configuration items, code blocks, etc.

In the screenshot above, I was searching for Python API. We do have a document with exactly that title, but it was ranked 6th in the search results instead of 1st. From the results, it seems the search was carried out with the "or" operator (which is what I specified in the opt file), and the results were produced by joining the results for "Python" and "API", that is:

searchResult1 = search "Python"
searchResult2 = search "API"
searchResult = searchResult1 + searchResult2

I'm not sure whether I'm correct about this.

So, as a workaround, should I change the search operator from "or" to "and" to force the built-in search to join two or more space-separated words into one string? That is, search for A B C = search for "ABC" or "A32B32C" (where 32 is the decimal ASCII value of the space character)?
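To make the difference between the two operators concrete, here is a toy Python sketch over a made-up inverted index. It is not the WebHelp search implementation; the index contents and the search function are assumptions for illustration only. In this model, "or" returns the union of the pages found for each term, while "and" returns only the pages that contain every term. Neither one joins the words into a single phrase.

```python
# Hypothetical inverted index: term -> set of pages containing it.
INDEX = {
    "python": {"page1", "page2", "page3"},
    "api": {"page2", "page4"},
}


def search(query, operator="or"):
    """Return the set of pages matching a space-separated query."""
    postings = [INDEX.get(term.lower(), set()) for term in query.split()]
    if not postings:
        return set()
    if operator == "and":
        # Only pages that contain every term survive.
        return set.intersection(*postings)
    # "or": any page that contains at least one term.
    return set.union(*postings)


print(sorted(search("Python API", "or")))
print(sorted(search("Python API", "and")))
```

In this model, switching to "and" narrows the result set but does not by itself change how the remaining pages are ordered.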
alin
Site Admin
Posts: 268
Joined: Thu Dec 24, 2009 11:21 am

Re: Webhelp search doesn't find term that is present on html-page

Post by alin »

Hello,

Search results are ranked based on a complex algorithm used by the search engine to determine the relevance of each page (topic) to the user's search query.

The search engine computes scores for every topic that matches the search criteria and uses this score to sort the search results.
The search rank of a page depends on the location and the number of occurrences of the searched terms in the content.
The locations below determine the ranking; they are listed from most to least relevant:
  • Page Title, Keywords & index terms
  • Short description and section headings (H1 to H6)
  • Bold text
  • Italic & underlined text
  • Plain text
Note that even if a page contains the searched term in its title, it can still be outranked in the search results by another page that contains many occurrences of the same term in plain text. In the end, what matters is the total score calculated for each page, which is determined by adding up the scores of each occurrence of the searched term.
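The additive scoring described above can be sketched in a few lines of Python. The weights and location names here are hypothetical placeholders, not the values the indexer actually uses; the point is only to show how a page with many plain-text hits can end up with a higher total than a page with a single title hit.

```python
# Hypothetical weights per location; the real values come from the
# indexer's own configuration.
WEIGHTS = {"title": 50, "heading": 20, "bold": 10, "italic": 5, "plain": 1}


def page_score(occurrences):
    """Sum weight * count over every location where the term occurs."""
    return sum(WEIGHTS[location] * count for location, count in occurrences.items())


title_hit = {"title": 1, "plain": 2}  # term in the title plus two plain-text hits
many_plain = {"plain": 60}            # term only in plain text, but very often

print(page_score(title_hit), page_score(many_plain))
```

With these made-up numbers the plain-text page wins, even though the other page has the term in its title.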

The <indexterm> and <keywords> DITA elements are an effective way to increase the ranking of a page.
The terms found in these elements add a lot of weight to the current page in the list of search results.

Regards,
Alin
Alin Balasa
Software Developer
<oXygen/> XML Editor
http://www.oxygenxml.com
galanohan
Posts: 115
Joined: Mon Jul 10, 2023 11:49 am

Re: Webhelp search doesn't find term that is present on html-page

Post by galanohan »

Hi Alin

Thanks for the reply. I backed up the default scoring.properties file under \Oxygen XML Editor 26\frameworks\dita\DITA-OT\plugins\com.oxygenxml.webhelp.responsive\indexer, then modified the weight of each item as follows:

h1 = 50
h2 = 40
h3 = 30
h4 = 20
h5 = 10
h6 = 10
b = 10
strong = 5
em = 5
i = 5
u = 5
div.toc = 10
title = 50
div.ignore = ignored
meta_keywords = 1
meta_indexterms = 1
meta_description = 1
shortdesc = 10

As you can see, my intention was to increase the weight of search terms found in headings (h1 to h6) and in the page title (or the root topic title if the file contains nested topics). I then generated a WebHelp with this scoring file, but the ranking for the specific search terms I tested didn't go up as I expected.
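For a quick sanity check of the edited values, a listing like the one above can be loaded as plain key = value pairs and sorted by weight. This is only a sketch: the parse_weights helper and the shortened sample are made up for illustration, and it assumes nothing about how the indexer itself reads the file.

```python
def parse_weights(text):
    """Parse 'key = value' lines; numeric values become int, others stay text."""
    weights = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        weights[key] = int(value) if value.isdigit() else value
    return weights


# Shortened excerpt of the values listed above.
SAMPLE = """
h1 = 50
title = 50
b = 10
div.ignore = ignored
meta_keywords = 1
"""

weights = parse_weights(SAMPLE)
# Keep only numeric entries and list them from heaviest to lightest.
numeric = {key: value for key, value in weights.items() if isinstance(value, int)}
ranked = sorted(numeric, key=numeric.get, reverse=True)
print(ranked)
```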

Also, I set the weights for keywords and index terms to a very small value because my map and topics contain very little topicmeta information. Is there a way to add meta keywords or index terms in one batch, based on topic file names or page titles?