Modify webhelp-responsive index

qualler
Posts: 15
Joined: Fri Jan 26, 2018 8:05 pm

Post by qualler »

Hi, I'm looking at https://www.oxygenxml.com/doc/versions/ ... rview.html

Which explains that:

whr-search-index
Processes the generated HTML (for all DITA topics) to generate an index file. This index is used to implement the WebHelp search function.

is the last step in the publishing process. I was searching the help and couldn't find how to customize the index to include elements we've added on the page, such as the root element id and the CCMS guid for traceability purposes.

How can this indexing be modified? :D

Justin
radu_pisoi
Posts: 403
Joined: Thu Aug 21, 2003 11:36 am
Location: Craiova

Re: Modify webhelp-responsive index

Post by radu_pisoi »

Hi,

Unfortunately, there is no extension point that allows you to modify the search index.

I have added an improvement request to our internal issue tracker for a parameter that allows you to specify additional elements and attributes to be indexed.
Radu Pisoi
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
steinbacherGE
Posts: 55
Joined: Tue Mar 13, 2018 6:07 pm

Re: Modify webhelp-responsive index

Post by steinbacherGE »

I'm working with Justin on this. Here's a screenshot of a sample HTML output file that contains the ID attribute values that we would like to be indexed so they show up as search results.

HTML output: [attachment: header-IDs.png]
HTML source: [attachment: Topic&CCMS-ID.png]
Is there currently any way to pull in these values so they show up in search results?

Thanks,

Leroy Steinbacher
beniamin_savu
Posts: 30
Joined: Fri Jan 22, 2021 11:05 am

Re: Modify webhelp-responsive index

Post by beniamin_savu »

Hi,

The id value is displayed in the content area, so it should be indexed by the search engine. From these images we don't see any problem.

For further investigation, it would be very useful if you could create a minimal valid sample project (DITA Map + topic) and send it to us for analysis. You can send the files to support@oxygenxml.com. Also, please specify the WebHelp version.

Best regards,
Beniamin Savu
Oxygen WebHelp Team
http://www.oxygenxml.com

Re: Modify webhelp-responsive index

Post by steinbacherGE »

We have not been able to attach samples by email, so we set up a shared folder for support@oxygenxml.com.

I've just sent an email with the details. Let us know if you are unable to access the files.

Thanks,

Leroy Steinbacher

Re: Modify webhelp-responsive index

Post by steinbacherGE »

Thanks for the email with the fix. I thought I would share it here in case anyone else has this issue.

We noticed that your customization adds the section which contains the topic id in a <header> HTML element. This element is excluded from the search index. We have a file named search.index.elements.to.exclude.txt located in the DITA-OT-DIR/plugins/com.oxygenxml.webhelp.responsive/oxygen-webhelp/ folder which specifies the list of HTML elements that will not be indexed by the search engine:
div.ignore,nav.wh_tools,footer,header,div.wh_publication_toc,...

To fix the problem, edit the search.index.elements.to.exclude.txt file and remove header from the list. The file should then look like this:
div.ignore,nav.wh_tools,footer,div.wh_publication_toc,...
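
For anyone who would rather script this edit than make it by hand, here is a minimal Python sketch. It is not part of the fix that support sent. It assumes the file holds a single comma-separated list of selectors, as in the lines above, and the function names remove_selector and update_exclude_file are made up for illustration. Only exact matches are removed, so an entry such as header.wh_header would be left alone.

```python
from pathlib import Path


def remove_selector(selector_list: str, selector: str) -> str:
    """Return the comma-separated list without exact matches of `selector`."""
    entries = [entry.strip() for entry in selector_list.split(",")]
    kept = [entry for entry in entries if entry and entry != selector]
    return ",".join(kept)


def update_exclude_file(path: Path, selector: str = "header") -> None:
    """Rewrite the exclusion file in place, keeping a .bak copy of the original.

    Assumption: the file is one comma-separated list of selectors.
    """
    original = path.read_text(encoding="utf-8")
    backup = path.with_name(path.name + ".bak")
    backup.write_text(original, encoding="utf-8")
    path.write_text(remove_selector(original, selector), encoding="utf-8")
```

To use it, call update_exclude_file with the path to search.index.elements.to.exclude.txt inside your own DITA-OT installation, then rerun the transformation so the index is rebuilt.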

Re: Modify webhelp-responsive index

Post by beniamin_savu »

Hi,

We wanted to let you know that we released a new maintenance build for WebHelp 23.1 and it contains the fix for the problem you had with the search engine not indexing the text inside a <header> HTML element.

Best regards,
Beniamin Savu
Oxygen WebHelp Team
http://www.oxygenxml.com

Re: Modify webhelp-responsive index

Post by steinbacherGE »

I think in your latest build, this change was reverted. I'm still seeing header in the search.index.elements.to.exclude.txt.

com.oxygenxml.webhelp.responsive\oxygen-webhelp\search.index.elements.to.exclude.txt


div.ignore,nav.wh_tools,footer,header.wh_header,div.wh_publication_toc,div.wh_topic_toc,div.wh_child_links,div.related_link,div.wh_copyright_information,a.sr-only-focusable,span.search_input_text
Is there something we can do in our custom plugin to override the values in this file?

Thanks,

Leroy

Re: Modify webhelp-responsive index

Post by steinbacherGE »

steinbacherGE wrote: Wed Jul 21, 2021 9:12 pm I think in your latest build, this change was reverted. I'm still seeing header in the search.index.elements.to.exclude.txt.
Never mind. I think this file is OK. I found a different reason why our search for doc IDs was not working.

Thanks,

Leroy

Re: Modify webhelp-responsive index

Post by steinbacherGE »

The search for document IDs in the header is not working for translated Japanese output. We were getting the error message described in this topic.

Note that the parameter mentioned in this topic is missing ".js" at the end. I added feedback to report this.

https://www.oxygenxml.com/doc/versions/ ... bhelp.html

I followed the process to turn off the Kuromoji analyzer and I'm still getting the same result when searching for IDs in the topic header.

Any suggestions?

I can update the sample that we previously provided with Japanese DITA if needed, or just change the xml:lang values on the given files to test.

Thanks,

Leroy Steinbacher

Re: Modify webhelp-responsive index

Post by beniamin_savu »

Hi,

At transformation time, we index the content of all the HTML files generated from the DITA files using the Lucene Kuromoji morphological analyzer. When you use the WebHelp search page, we use kuromoji.js (https://github.com/takuyaa/kuromoji.js/) to break the search query into tokens and query the index (the one created at transformation time) to return the results for the search query.

The webhelp.enable.search.kuromoji.js parameter only enables or disables kuromoji.js on the WebHelp search page. If the parameter is set to 'no', the content will still be indexed by the Lucene Kuromoji morphological analyzer, but the search query will not be processed by kuromoji.js. Without kuromoji.js, the WebHelp search engine may give incorrect results.

Unfortunately, kuromoji.js does not work if your WebHelp output is accessed locally. We suggest publishing your WebHelp output on a web server. We tested the sample that you gave us on a web server with webhelp.enable.search.kuromoji.js set to 'yes', and the search appears to find the document IDs in the header.
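
For a quick check before deploying to a real server, one option is to serve the output folder over HTTP from your own machine. The following is only a sketch using Python's standard http.server module. It is not taken from the Oxygen documentation, and it has not been verified here that loading the pages over http://127.0.0.1 is enough for kuromoji.js to work. The make_server name, the out/webhelp-responsive folder, and port 8000 are placeholders.

```python
import functools
import http.server


def make_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Create an HTTP server bound to localhost that serves files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Example (blocks until interrupted with Ctrl+C); the folder name is a placeholder:
# make_server("out/webhelp-responsive").serve_forever()
```

With the server running, open http://127.0.0.1:8000/ in a browser and repeat the search for the document ID.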

Best regards,
Beniamin Savu
Oxygen WebHelp Team
http://www.oxygenxml.com