WebHelp Responsive search: How do "Stop Words" work?

B-E-N
Posts: 139
Joined: Thu Mar 24, 2016 5:54 pm

Post by B-E-N » Wed May 03, 2017 5:34 pm

Hello,

First of all, thank you for the new search capabilities in Oxygen 19.

We are currently translating the new search strings into our various languages. Two strings mention the "stop words", such as "of", "the", and "by".

How does this work in other languages? I can see that you have translated them into Spanish, for example. How should we proceed if we would like to add Russian, for example? Is there a way to add or remove stop words?

Thanks,

Benjamin

B-E-N
Posts: 139
Joined: Thu Mar 24, 2016 5:54 pm

Re: WebHelp Responsive search: How do "Stop Words" work?

Post by B-E-N » Thu May 04, 2017 12:35 pm

Correction: I've just realized that the Spanish translation was done by a colleague of mine and not by you. So, the more general question is: how should we deal with translating the strings in the WebHelp search?

radu_pisoi
Posts: 389
Joined: Thu Aug 21, 2003 11:36 am
Location: Craiova

Re: WebHelp Responsive search: How do "Stop Words" work?

Post by radu_pisoi » Thu May 04, 2017 2:37 pm

Hi,

The procedure for localizing the WebHelp output is described in our user manual, in the topic Localizing the Interface of WebHelp Output (for DITA Map Transformations).

B-E-N wrote:
We are currently translating the new search strings into our various languages. Two strings mention the "stop words", such as "of", "the", and "by".

Do you need the context in which these strings are used? If so, could you tell us which strings you need additional information about?

Radu Pisoi
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

B-E-N
Posts: 139
Joined: Thu Mar 24, 2016 5:54 pm

Re: WebHelp Responsive search: How do "Stop Words" work?

Post by B-E-N » Thu May 04, 2017 4:08 pm

Thanks for your answer.

We know how to localize the WebHelp output; I mean something different here.

The search treats some words as so-called "stop words". This means they are ignored when searching for terms. There are two strings that mention stop words:

No results were found because the search query only contains <span>stop words</span> that are excluded by the search engine.

Stop words are very common words or adjectives that hinder search efforts. Words such as: &apos;of&apos;, &apos;the&apos;, &apos;by&apos;, etc.

We must translate those strings into our target languages (Spanish, French, Japanese, Russian, etc.).

The question now is: what do we do with the stop words (of, the, by, ...)? Just because we translate them in the message does not mean that the search actually ignores them in other languages.

How does the search know which words are stop words? And can we add stop words for other languages as well?

radu_pisoi
Posts: 389
Joined: Thu Aug 21, 2003 11:36 am
Location: Craiova

Re: WebHelp Responsive search: How do "Stop Words" work?

Post by radu_pisoi » Thu May 04, 2017 10:05 pm

Hi,

The stop words are computed dynamically, depending on the language you have chosen when you publish your documentation. They are computed by the search indexer and written to the out/webhelp-responsive/oxygen-webhelp/search/index-1.js file:

stopWords = new Array();
stopWords[0]= "but";
stopWords[1]= "be";
stopWords[2]= "with";
stopWords[3]= "such";
....

So, if you want to be sure which stop words are used for a certain language, you need to inspect the index-1.js file.

There is no parameter to control the stop words.
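
To illustrate how a client-side search could use such a list, here is a minimal sketch. It is not the actual WebHelp implementation: the helper names removeStopWords and isOnlyStopWords are hypothetical, and the list contains only the four entries visible in the excerpt above.

```javascript
// Sample list: only the four entries visible in the index-1.js excerpt.
const stopWords = ["but", "be", "with", "such"];

// Hypothetical helper: split a query on whitespace and drop stop words.
function removeStopWords(query, stopWordList) {
  const stopSet = new Set(stopWordList);
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0 && !stopSet.has(term));
}

// Hypothetical helper: true when the query has terms, but all are stop words.
// This is the situation the "No results were found because..." string describes.
function isOnlyStopWords(query, stopWordList) {
  const terms = query.trim().split(/\s+/).filter((t) => t.length > 0);
  return terms.length > 0 && removeStopWords(query, stopWordList).length === 0;
}

console.log(removeStopWords("Search with filters", stopWords)); // ["search", "filters"]
console.log(isOnlyStopWords("but with such", stopWords)); // true
```

With a list generated for another language, the same check would apply to that language's words, which is why the translated message should name words that actually appear in the generated file.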

B-E-N
Posts: 139
Joined: Thu Mar 24, 2016 5:54 pm

Re: WebHelp Responsive search: How do "Stop Words" work?

Post by B-E-N » Fri May 05, 2017 11:33 am

Thank you, that helps a lot.

Gertone
Posts: 17
Joined: Mon Sep 17, 2007 10:02 am
Location: Flanders

Re: WebHelp Responsive search: How do "Stop Words" work?

Post by Gertone » Sun Jan 10, 2021 4:14 pm

Hi,

I do realise I am reviving a pretty old thread...

In v23 I looked for index-1.js, but realized that the array construction has moved to
...\oxygen-webhelp\app\search\index\stopwords.js

define(function() {
// Auto generated list of analyzer stop words that must be ignored by search.
return ["but","be","with","such","then","for","no","will","not","are","and","their","if","this","on","into","a","or","there","in","that","they","was","is","it","an","the","as","at","these","by","to","of"];
});

Does that imply that we can now influence the stop words?

I guess I could swap that file with a project/language dependent function, either manually or through a plugin change,
but doing it from the configuration of the customization would be my preferred path.
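
For inspecting the generated list outside the browser, one option is to evaluate the file with a stand-in define function. This is only a sketch: readAmdArray is a hypothetical helper, the inline source is a shortened copy of the file above, and it assumes the module is an anonymous define(function() {...}) call without dependencies.

```javascript
// Hypothetical helper: run AMD-style source with a stand-in define()
// and return whatever the module factory returns.
function readAmdArray(source) {
  let result;
  const define = (factory) => {
    result = factory();
  };
  // This executes the source, so only use it on files you generated yourself.
  new Function("define", source)(define);
  return result;
}

// Shortened copy of the generated stopwords.js (first four entries only).
const source = `define(function() {
// Auto generated list of analyzer stop words that must be ignored by search.
return ["but","be","with","such"];
});`;

const words = readAmdArray(source);
console.log(words.length); // 4
console.log(words.includes("with")); // true
```

In a real check, the source string would be read from the stopwords.js file in the published output instead of being written inline.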

Thanks for any other suggestions.

Geert Bormans

radu_pisoi
Posts: 389
Joined: Thu Aug 21, 2003 11:36 am
Location: Craiova

Re: WebHelp Responsive search: How do "Stop Words" work?

Post by radu_pisoi » Mon Jan 11, 2021 11:18 am

Hi,

Starting with version 23, you can customize the stop words list by using the following two parameters: webhelp.search.stop.words.exclude and webhelp.search.stop.words.include. They allow you to exclude/include custom stop words.

Please see the WebHelp Responsive Transformation Parameters topic in WebHelp documentation for more details.
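
As a rough mental model of the two parameters, the sketch below removes the excluded words from a default list and then adds the included ones. The exact semantics and value format are described in the parameters topic; the comma-separated format, the order of operations, and the names parseList and applyStopWordParams are assumptions made for illustration.

```javascript
// Hypothetical parser for a comma-separated parameter value (assumed format).
function parseList(value) {
  return (value || "")
    .split(",")
    .map((w) => w.trim().toLowerCase())
    .filter((w) => w.length > 0);
}

// Assumed model: start from the defaults, drop excluded words, add included ones.
function applyStopWordParams(defaults, excludeValue, includeValue) {
  const result = new Set(defaults);
  for (const w of parseList(excludeValue)) result.delete(w);
  for (const w of parseList(includeValue)) result.add(w);
  return [...result];
}

const defaults = ["but", "be", "with", "such"];
console.log(applyStopWordParams(defaults, "with", "etc, via"));
// ["but", "be", "such", "etc", "via"]
```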

Gertone
Posts: 17
Joined: Mon Sep 17, 2007 10:02 am
Location: Flanders

Re: WebHelp Responsive search: How do "Stop Words" work?

Post by Gertone » Mon Jan 11, 2021 4:27 pm

Hi Radu,

Thanks for pointing me to the right place in the manual
(and thank you, Oxygen, for adding that functionality).

I assume this cannot be made language-dependent other than by adding all languages to one parameter?
Anyhow, the functionality is already extremely useful as it is.

Thanks,

Geert

radu_pisoi
Posts: 389
Joined: Thu Aug 21, 2003 11:36 am
Location: Craiova

Re: WebHelp Responsive search: How do "Stop Words" work?

Post by radu_pisoi » Wed Jan 13, 2021 11:43 am

Hi,

Gertone wrote:
Mon Jan 11, 2021 4:27 pm
I assume this cannot be made language-dependent other than by adding all languages to one parameter?

No, you should update the exclude/include stop words parameters depending on the current language.
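
One way to keep the parameters language-dependent is to hold a small per-language table in the build script and emit only the entry for the language being published. In this sketch the table contents, the stopWordArgs name, and the -Dname=value argument form are illustrative assumptions; only the two parameter names come from this thread.

```javascript
// Hypothetical per-language settings; the word lists are examples only.
const stopWordSettings = {
  en: { include: ["etc"], exclude: [] },
  ru: { include: ["и", "в", "не", "на"], exclude: [] },
};

// Build property arguments for the language being published.
function stopWordArgs(lang) {
  const settings = stopWordSettings[lang];
  if (!settings) return []; // unknown language: keep the generated defaults
  const args = [];
  if (settings.include.length > 0) {
    args.push(`-Dwebhelp.search.stop.words.include=${settings.include.join(",")}`);
  }
  if (settings.exclude.length > 0) {
    args.push(`-Dwebhelp.search.stop.words.exclude=${settings.exclude.join(",")}`);
  }
  return args;
}

console.log(stopWordArgs("ru"));
// ["-Dwebhelp.search.stop.words.include=и,в,не,на"]
```

The returned arguments would then be appended to whatever command or transformation scenario runs the publication for that language.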
