
Weighting Index Terms for WebHelp Search

Posted: Sun Mar 27, 2011 7:03 pm
by shanet
When we create WebHelp from our DITA source files, the transform uses some Java code to create the search index. It's a nice feature, and the JavaScript in the WebHelp gives us quick search results.

Unfortunately, customers and key internal stakeholders have complained about the ranking of search results. We can affect this somewhat by adding indexterm metadata to the topics, but the weight given to indexterms is sometimes overridden by mere word frequency in other topics.

In order to improve the ranking of search results, we'd like to be able to give author-supplied index terms and heading levels a good deal more weight than mere term frequency in a topic.

Are there any parameters we can pass to the indexer task to accomplish this? Or is there a workaround that involves modifying the Java code?

TIA,

Re: Weighting Index Terms for WebHelp Search

Posted: Mon Mar 28, 2011 10:33 am
by Radu
Dear Shane,

Unfortunately, the DITA Open Toolkit has no special handling for index terms in HTML output. They are usually not present at all in the generated HTML.
The only exception is when the index term is defined in the first paragraph and therefore appears in the generated <meta name="description" ...> content.

The Java indexer we use to generate the JavaScript search and scoring is applied directly to the generated HTML output, so it usually has no information that a certain word was marked as an index term in the DITA content.

I added an improvement request for a future version to somehow pass all index terms to the final HTML output in a special <meta...> tag and give words which appear in that tag higher scoring.

In the meantime, Oxygen assigns a higher score to words that appear in titles, so you could try having such words appear more often in the titles.
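
To illustrate why a word in a title can outrank plain word frequency, here is a minimal sketch in Python of title-weighted scoring. It is not the actual Oxygen indexer: the weights, the tokenizer, and the function names (`score_topic`, `rank`) are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical weights, chosen only for illustration.
# These are NOT the values used by the Oxygen indexer.
TITLE_WEIGHT = 10
BODY_WEIGHT = 1


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_topic(title, body):
    """Return a per-word score where title words count more than body words."""
    scores = Counter()
    for word in tokenize(title):
        scores[word] += TITLE_WEIGHT
    for word in tokenize(body):
        scores[word] += BODY_WEIGHT
    return scores


def rank(topics, term):
    """Return the titles of topics containing `term`, best score first."""
    term = term.lower()
    scored = [(score_topic(t["title"], t["body"])[term], t["title"]) for t in topics]
    scored.sort(key=lambda pair: -pair[0])
    return [title for score, title in scored if score > 0]


topics = [
    {"title": "Garden shrubs", "body": "Lilac here, lilac there, lilac everywhere."},
    {"title": "Lilac", "body": "A flowering shrub."},
]
print(rank(topics, "lilac"))
```

With these made-up weights, the topic titled "Lilac" is listed first even though the other topic mentions the word three times in its body.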

Regards,
Radu

Re: Weighting Index Terms for WebHelp Search

Posted: Tue Mar 29, 2011 10:34 am
by Radu
Hi,

The usual way to specify keywords in DITA is:

.................
<topic id="topic-1">
<title>Lilac</title>
<prolog>
<metadata>
<keywords>
<keyword>Lilac</keyword>
</keywords>
</metadata>
</prolog>
<body>
.........................
In the generated HTML, the keywords appear as:

<meta name="keywords" content="Lilac"/>

The Oxygen indexer takes them into consideration, but it should probably treat them as more important than it does in the current implementation.
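
If you want to check that the keywords really reach the generated HTML before looking at the scoring, a small script along these lines may help. It is only a sketch using Python's standard `html.parser`; the class and function names are made up for this example, and it assumes that multiple keywords are comma-separated in the `content` attribute.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collect the values of <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> are routed here as well.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "keywords":
            return
        content = attr.get("content") or ""
        # Assumption: several keywords are separated by commas.
        self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


def extract_keywords(html_text):
    """Return the list of keywords found in the meta tags of an HTML page."""
    parser = KeywordMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.keywords


page = '<html><head><meta name="keywords" content="Lilac"/></head><body/></html>'
print(extract_keywords(page))
```

An empty list for a topic that has a <keyword> element in its prolog would suggest the keyword was lost in the transformation rather than in the scoring.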

Regards,
Radu

Re: Weighting Index Terms for WebHelp Search

Posted: Wed Jun 08, 2011 8:15 pm
by daveg
I'm also having trouble with the WebHelp search results.
I've followed the advice you gave here, but it does not seem to have changed the order or score of the results.

It's currently scoring a topic with a single reference to the search term ahead of a topic with the search term in both the title and the first paragraph.

It does not seem to order by frequency or by any other logic I can think of.

I'd like the search results to return the topics that match the search term in the title first.

Any help with this would be great.

Thanks,
Dave

Re: Weighting Index Terms for WebHelp Search

Posted: Thu Jun 09, 2011 12:57 pm
by Radu
Hi Dave,

The words in the title should have a higher weight when searched for.
Can you post two small DITA topic XML samples on which the problem can be reproduced?
You can also send us the samples as a zip archive to our usual email address, support at oxygenxml dot com.

Regards,
Radu