Weighting Index Terms for WebHelp Search

shanet
Posts: 6
Joined: Thu Mar 17, 2011 5:05 pm


Post by shanet » Sun Mar 27, 2011 7:03 pm

When we create WebHelp from our DITA source files, the transform uses some Java code to create the search index. It's a nice feature, and the JavaScript in the WebHelp gives us quick search results.

Unfortunately, customers and key internal stakeholders have complained about the ranking of search results. We can affect this somewhat by adding indexterm metadata to the topics, but the weight given to index terms is sometimes overridden by mere word frequency in other topics.

To improve the ranking of search results, we'd like to be able to give author-supplied index terms and heading levels a good deal more weight than mere term frequency in a topic.

Are there any parameters we can pass to the indexer task to accomplish this? Or is there a workaround that involves modifying the Java code?

TIA,
Shane

Radu
Posts: 6435
Joined: Fri Jul 09, 2004 5:18 pm

Re: Weighting Index Terms for WebHelp Search

Post by Radu » Mon Mar 28, 2011 10:33 am

Dear Shane,

Unfortunately, the DITA Open Toolkit has no special handling for index terms in HTML output; they are usually not present at all in the generated HTML.
The only exception is when an index term is defined in the first paragraph and appears in the content of the generated <meta name="description" ... element.

The Java indexer we use to generate the JavaScript search and scoring runs directly on the generated HTML output, so it usually has no way of knowing that a certain word was marked as an index term in the DITA content.

I added an improvement request for a future version: pass all index terms to the final HTML output in a special <meta...> tag and give words that appear in that tag a higher score.
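To illustrate the kind of change described above, here is a minimal Python sketch (not Oxygen or DITA-OT code) that collects the <indexterm> text from a DITA topic and emits it in a meta tag that an HTML-based indexer could later boost. The meta name "indexterms" and both function names are hypothetical; the request above does not say what the tag would be called. The sketch only reads each <indexterm> element's leading text, so terms containing nested markup such as <keyword> would need extra handling.

```python
import html
import xml.etree.ElementTree as ET


def collect_index_terms(dita_xml: str) -> list[str]:
    """Return the distinct <indexterm> texts of a DITA topic, in document order.

    iter() also visits nested <indexterm> elements (secondary/tertiary levels).
    Only each element's leading text is read; nested markup inside a term is
    ignored in this sketch.
    """
    root = ET.fromstring(dita_xml)
    terms: list[str] = []
    for element in root.iter("indexterm"):
        text = (element.text or "").strip()
        if text and text not in terms:
            terms.append(text)
    return terms


def index_terms_meta(terms: list[str]) -> str:
    """Build a meta tag carrying the terms; the name "indexterms" is hypothetical."""
    content = html.escape(", ".join(terms), quote=True)
    return f'<meta name="indexterms" content="{content}"/>'


if __name__ == "__main__":
    topic = (
        '<topic id="topic-1"><title>Lilac</title><body><p>'
        "<indexterm>lilac<indexterm>pruning</indexterm></indexterm>"
        "Lilacs bloom in spring.</p></body></topic>"
    )
    print(index_terms_meta(collect_index_terms(topic)))
    # <meta name="indexterms" content="lilac, pruning"/>
```

The terms are HTML-escaped because they end up inside an attribute value.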

In the meantime, Oxygen assigns a higher score to words that appear in titles, so you could try having such words appear more often in titles.
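The idea of scoring title words higher can be sketched as field-weighted term frequency: count the term in each part of the page and multiply by a per-field weight. This is a minimal Python sketch, not the Oxygen indexer's code; the values in FIELD_WEIGHTS are made up for illustration, since the thread does not say which weights the indexer actually uses.

```python
import re
from collections import Counter

# Hypothetical weights for illustration only; they are not the values
# used by the Oxygen WebHelp indexer.
FIELD_WEIGHTS = {"title": 10.0, "keywords": 8.0, "description": 3.0, "body": 1.0}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def score(term: str, fields: dict[str, str], weights: dict[str, float] = FIELD_WEIGHTS) -> float:
    """Sum, over all fields, the term's frequency in that field times the field weight.

    Fields missing from `weights` count with weight 1.0.
    """
    term = term.lower()
    total = 0.0
    for field, text in fields.items():
        hits = Counter(tokenize(text))[term]
        total += weights.get(field, 1.0) * hits
    return total


if __name__ == "__main__":
    in_title = {"title": "Lilac", "body": "Lilac shrubs need sun."}
    body_only = {"title": "Garden shrubs", "body": "lilac " * 5}
    print(score("lilac", in_title))   # 11.0
    print(score("lilac", body_only))  # 5.0
```

With a weighted sum, a title match wins only as long as the body count stays below the title weight; here a topic mentioning "lilac" eleven times in the body would tie with the title match.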

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com


Re: Weighting Index Terms for WebHelp Search

Post by Radu » Tue Mar 29, 2011 10:34 am

Hi,

The usual way to specify keywords in DITA is like this:

...
<topic id="topic-1">
  <title>Lilac</title>
  <prolog>
    <metadata>
      <keywords>
        <keyword>Lilac</keyword>
      </keywords>
    </metadata>
  </prolog>
  <body>
    ...
  </body>
</topic>

In the generated HTML, the keywords appear as:

<meta name="keywords" content="Lilac"/>

The Oxygen indexer takes them into consideration, but it should probably give them more weight than it does in the current implementation.
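To make concrete what an HTML-based indexer can see, here is a minimal sketch using Python's standard html.parser that splits a page into separate fields (title, headings, keywords and description meta tags, body text) so each could be weighted on its own. It is not the Oxygen indexer's code, and the field names are our own. It shows why DITA keywords can only get extra weight once they survive into the HTML, for example in the keywords meta tag.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class FieldExtractor(HTMLParser):
    """Collect page text into separate fields so each can be weighted on its own."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, list[str]] = {
            "title": [], "keywords": [], "description": [], "heading": [], "body": [],
        }
        self._in_title = False
        self._in_body = False
        self._heading_depth = 0
        self._skip_depth = 0  # inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; self-closing
        # <meta .../> tags also arrive here via handle_startendtag().
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in HEADINGS:
            self._heading_depth += 1
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            content = attributes.get("content")
            if name in ("keywords", "description") and content:
                self.fields[name].append(content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in HEADINGS and self._heading_depth:
            self._heading_depth -= 1
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # never index script or style content
        if self._in_title:
            self.fields["title"].append(data)
        elif self._in_body and self._heading_depth:
            self.fields["heading"].append(data)
        elif self._in_body:
            self.fields["body"].append(data)


def extract_fields(page: str) -> dict[str, str]:
    """Return each field's text with runs of whitespace collapsed to single spaces."""
    parser = FieldExtractor()
    parser.feed(page)
    parser.close()
    return {name: " ".join(" ".join(parts).split()) for name, parts in parser.fields.items()}
```

For a page containing <meta name="keywords" content="Lilac"/>, extract_fields() returns "Lilac" under "keywords", separately from the title and body text, which is what a scorer needs in order to weight keywords more heavily.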

Regards,
Radu

daveg
Posts: 11
Joined: Fri Oct 15, 2010 11:48 am

Re: Weighting Index Terms for WebHelp Search

Post by daveg » Wed Jun 08, 2011 8:15 pm

I'm also having trouble with the WebHelp search results.
I've followed the advice you gave here, but it doesn't seem to have changed the order or score of the results.

It's currently scoring a topic with a single reference to the search term ahead of a topic with the search term in both the title and the first paragraph.

It does not seem to order by frequency or by any other logic I can think of.

I'd like the search results to return the topics that match the search term in the title first.
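The ordering described here can be expressed as a strict two-level sort rather than a weighted sum: every topic whose title contains the term comes first, and body frequency only decides the order within each group. This is a minimal Python sketch, not how the Oxygen indexer ranks results; it assumes each topic is a dict with hypothetical "title" and "body" keys holding plain text.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def rank_title_first(term: str, topics: list[dict[str, str]]) -> list[dict[str, str]]:
    """Order topics: title matches first, then by body hits (descending), then by title."""
    term = term.lower()

    def sort_key(topic: dict[str, str]) -> tuple[bool, int, str]:
        title = topic.get("title", "")
        in_title = term in words(title)
        body_hits = words(topic.get("body", "")).count(term)
        # sorted() is ascending and False sorts before True, so both criteria are negated.
        return (not in_title, -body_hits, title)

    return sorted(topics, key=sort_key)


if __name__ == "__main__":
    topics = [
        {"title": "Garden shrubs", "body": "lilac lilac lilac"},
        {"title": "Lilac", "body": "Lilac shrubs need sun."},
        {"title": "Pruning", "body": "Prune lilac once."},
    ]
    print([t["title"] for t in rank_title_first("lilac", topics)])
    # ['Lilac', 'Garden shrubs', 'Pruning']
```

Unlike a weighted sum, the tuple key guarantees that no amount of body text can push a body-only match above a title match.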

Any help with this would be great.

Thanks,
Dave


Re: Weighting Index Terms for WebHelp Search

Post by Radu » Thu Jun 09, 2011 12:57 pm

Hi Dave,

The words in the title should have a higher weight when searched for.
Can you post two small DITA topic XML samples on which the problem can be reproduced?
You can also send the samples as a zip archive to our usual email address, support at oxygenxml dot com.

Regards,
Radu
