Extend stop-words-list (for Webhelp search)

Silke
Posts: 10
Joined: Wed Mar 30, 2016 5:02 pm

Post by Silke »

Hi,

Is there a way to extend the stop word list for the WebHelp output (index-1.js)?

I searched for one example from the stop list: "anderr".
- My content does not contain that string.
- The Oxygen folder contains "anderr" in de-DE.dic, inside words like "Wanderrucksack".

In the webhelp-plugin folder there is a file, de_words.properties, which looks like the place the stop words come from, but "anderr" is not in there.

Thank you in advance
Silke
bogdan_cercelaru
Posts: 222
Joined: Tue Jul 01, 2014 11:48 am

Post by bogdan_cercelaru »

Hello,

Unfortunately, the current WebHelp implementation does not let you change the stop word list.
I have registered your request in our issue tracking system to be analyzed.

Regards,
Bogdan
Bogdan Cercelaru
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
ckabstein
Posts: 142
Joined: Fri Apr 24, 2015 12:28 pm

Post by ckabstein »

Hi,

Given that the new WebHelp search now does the following:

"Always search for words containing three or more characters (shorter words, such as to or of are ignored). This rule does not apply to CJK (Chinese, Japanese, Korean) languages."

and

"To improve performance, the Search feature excludes certain stop words. For example, the English version of such stop words include: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with."

I would be interested to know if it's now possible to
a) extend or reduce the list of stop words and where we could do this.
b) add more lists for other languages.

And finally, which rules apply to CJK languages exactly? And where are these set?
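
To make the two quoted rules concrete, here is a minimal sketch of how I read them. This is not the WebHelp implementation: `filter_query`, `MIN_WORD_LENGTH` and `ENGLISH_STOP_WORDS` are names I made up for illustration, the stop words are just the English list quoted above, and the CJK exception is left out because I don't know how it works.

```python
import re

# English stop words as quoted from the documentation above.
ENGLISH_STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)

# "Three or more characters", per the quoted rule.
MIN_WORD_LENGTH = 3


def filter_query(query: str, stop_words=ENGLISH_STOP_WORDS) -> list[str]:
    """Return the query terms that survive both quoted rules.

    Hypothetical helper for illustration only; CJK handling is omitted.
    """
    # Lowercase the query and split it into word tokens.
    words = re.findall(r"\w+", query.lower())
    # Drop words that are too short or that appear in the stop word list.
    return [w for w in words if len(w) >= MIN_WORD_LENGTH and w not in stop_words]


print(filter_query("How to extend the stop words list"))
```

In this sketch, passing a different `stop_words` set is what "extending or reducing the list" or "adding a list for another language" would amount to, which is the kind of hook I am asking about.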

Thanks,
Christina
oXygen XML Editor 25.0 build 2023013006
DITA OT 3.7.3
ionela
Posts: 402
Joined: Mon Dec 05, 2011 6:08 pm

Post by ionela »

Hi Christina,

Unfortunately, the stop-words-list is not configurable.
The stop words are computed dynamically depending on the language you have chosen when you publish your documentation. They are computed by the search indexer and written to the out/webhelp-responsive/oxygen-webhelp/search/index-1.js file.

We have also discussed this in the following topic on our forum:
post42687.html#p42687

Regards,
Ionela
Ionela Istodor
oXygen XML Editor and Author Support
ckabstein
Posts: 142
Joined: Fri Apr 24, 2015 12:28 pm

Post by ckabstein »

Hi Ionela,

Sorry, I didn't see that post. Thanks for pointing me towards that one.

Best,
Christina
oXygen XML Editor 25.0 build 2023013006
DITA OT 3.7.3