Weighting Index Terms for WebHelp Search
When we create WebHelp from our DITA source files, the transform uses some Java code to create the search index. It's a nice feature, and the JavaScript in the WebHelp gives us quick search results.
Unfortunately, customers and key internal stakeholders have complained about the ranking of search results. We can influence it somewhat by adding indexterm metadata to the topics, but the weight given to index terms is sometimes overridden by mere word frequency in other topics.
To improve the ranking of search results, we'd like to give author-supplied index terms and heading levels much more weight than mere term frequency in a topic.
Are there any parameters we can pass to the indexer task to accomplish this? Or is there a workaround by modifying the Java code?
TIA,
Shane
Re: Weighting Index Terms for WebHelp Search
Dear Shane,
Unfortunately, the DITA Open Toolkit does not have special handling for index terms in HTML output. They are usually not present at all in the generated HTML.
The only exception is when the index term is defined in the first paragraph and appears in the content of the generated <meta name="description" ... tag.
The Java indexer we use to generate the JavaScript search and scoring runs directly on the generated HTML output, so it usually has no information that a certain word was marked as an index term in the DITA content.
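To illustrate why an indexer that works on the generated HTML cannot tell which words were index terms, here is a minimal Python sketch that collects what such an indexer can see: the <title> text, the <meta> name/content pairs, and the body text. It is not the Oxygen indexer (which is Java); the class name `PageFields` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class PageFields(HTMLParser):
    """Collects the fields an HTML-based search indexer could see.

    Hypothetical illustration only, not the Oxygen WebHelp indexer.
    """

    def __init__(self):
        super().__init__()
        self.title = []   # text inside <title>
        self.meta = {}    # <meta name="..."> -> content attribute
        self.body = []    # text chunks inside <body>
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        if self._in_title:
            self.title.append(text)
        elif self._in_body:
            self.body.append(text)


# Hypothetical generated page: the DITA index terms left no trace in it.
page = """<html><head><title>Lilac</title>
<meta name="description" content="Lilac is a flowering shrub."/>
</head><body><h1>Lilac</h1><p>Lilac is a flowering shrub.</p></body></html>"""

fields = PageFields()
fields.feed(page)
print(fields.title)  # ['Lilac']
print(fields.meta)   # {'description': 'Lilac is a flowering shrub.'}
print(fields.body)   # ['Lilac', 'Lilac is a flowering shrub.']
```

Nothing in the parsed result distinguishes an index term from any other body word, which is the limitation described above.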
I have added an improvement request for a future version to pass all index terms to the final HTML output in a special <meta...> tag and to give words that appear in that tag a higher score.
In the meantime, Oxygen assigns a higher score to words that appear in titles, so you could try to have such words appear more often in the titles.
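Giving title words and words from a dedicated index-term <meta> tag more weight than plain body frequency can be sketched as a field-weighted term count. This is a minimal Python sketch under stated assumptions: the weights `TITLE_WEIGHT`, `KEYWORD_WEIGHT`, `BODY_WEIGHT` and the `score` function are hypothetical and are not the values or code used by the Oxygen indexer.

```python
import re

# Hypothetical weights, not the values used by the Oxygen WebHelp indexer.
TITLE_WEIGHT = 10
KEYWORD_WEIGHT = 20
BODY_WEIGHT = 1


def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def score(term, title, keywords, body):
    """Field-weighted term frequency of `term` in one topic."""
    term = term.lower()
    return (TITLE_WEIGHT * tokens(title).count(term)
            + KEYWORD_WEIGHT * tokens(keywords).count(term)
            + BODY_WEIGHT * tokens(body).count(term))


# topic file -> (title, keywords, body); hypothetical sample data
topics = {
    # Mentions "lilac" many times, but only in the body.
    "garden.html": ("Garden plants", "", "lilac " * 8),
    # Has "lilac" in the title and as a keyword, and once in the body.
    "lilac.html": ("Lilac", "lilac", "The lilac is a shrub."),
}

ranked = sorted(topics, key=lambda t: score("lilac", *topics[t]), reverse=True)
print(ranked)  # ['lilac.html', 'garden.html']
```

Here lilac.html scores 31 and garden.html scores 8. With all three weights set to 1, garden.html (8) would outrank lilac.html (3), which is the frequency-driven ordering described in the original question.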
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
Re: Weighting Index Terms for WebHelp Search
Hi,
Usually the correct way in DITA to specify keywords is like this:
    .................
    <topic id="topic-1">
      <title>Lilac</title>
      <prolog>
        <metadata>
          <keywords>
            <keyword>Lilac</keyword>
          </keywords>
        </metadata>
      </prolog>
      <body>
        .........................
      </body>
    </topic>
In the generated HTML, the keywords are output like this:
    <meta name="keywords" content="Lilac"/>
The Oxygen indexer takes them into consideration, but it should probably treat them as more important than it does in the current implementation.
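The mapping from the DITA <keywords> in the topic prolog to the generated <meta name="keywords"> tag can be sketched as follows. The DITA Open Toolkit performs this step in XSLT; this Python sketch using `xml.etree.ElementTree` only illustrates the mapping. The function name `keywords_meta` is hypothetical, and joining several keywords with ", " is an assumption about the output format.

```python
import xml.etree.ElementTree as ET
from html import escape


def keywords_meta(topic_xml):
    """Build a <meta name="keywords"> tag from a DITA topic's prolog keywords.

    Hypothetical illustration; the DITA-OT performs this step in XSLT.
    Returns None when the topic declares no keywords.
    """
    topic = ET.fromstring(topic_xml)
    words = [(kw.text or "").strip()
             for kw in topic.findall("./prolog/metadata/keywords/keyword")]
    words = [w for w in words if w]
    if not words:
        return None
    # Assumption: multiple keywords are joined with ", ".
    return '<meta name="keywords" content="%s"/>' % escape(", ".join(words))


topic = """<topic id="topic-1">
  <title>Lilac</title>
  <prolog><metadata><keywords>
    <keyword>Lilac</keyword>
  </keywords></metadata></prolog>
  <body><p>...</p></body>
</topic>"""

print(keywords_meta(topic))  # <meta name="keywords" content="Lilac"/>
```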
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
Re: Weighting Index Terms for WebHelp Search
I'm also having trouble with the WebHelp search results.
I've followed the advice you gave here, but it doesn't seem to have changed the order or score of the results.
It currently scores a topic with a single reference to the search term ahead of a topic with the search term in both the title and the first paragraph.
It does not seem to order by frequency or by any other logic I can think of.
I'd like the search results to return the topics that match the search term in the title first.
Any help with this would be great.
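The ordering requested here, with topics whose title contains the search term listed first and the relevance score used only to order within each group, amounts to sorting on a compound key. A minimal Python sketch; the `Hit` record, its fields, and the sample scores are hypothetical and do not reflect the actual WebHelp search data.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search result (hypothetical structure)."""
    topic: str
    title: str
    score: float  # relevance score produced by the search engine


def title_first(hits, term):
    """Order hits: title matches first, then by descending score.

    Uses a simple case-insensitive substring test on the title.
    """
    term = term.lower()
    # False sorts before True, so title matches come first;
    # negating the score gives descending order within each group.
    return sorted(hits, key=lambda h: (term not in h.title.lower(), -h.score))


hits = [
    Hit("other.html", "Planting shrubs", 5.0),
    Hit("lilac.html", "Lilac", 3.0),
    Hit("misc.html", "Miscellaneous", 1.0),
]
print([h.topic for h in title_first(hits, "Lilac")])
# ['lilac.html', 'other.html', 'misc.html']
```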
Thanks,
Dave
Re: Weighting Index Terms for WebHelp Search
Hi Dave,
Words in the title should receive a higher weight when searched for.
Can you post two small DITA topic XML samples on which the problem can be reproduced?
You can also send us the samples as a zip archive to our usual email address: support at oxygenxml dot com.
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com