WebHelp Responsive search: How do "Stop Words" work?

B-E-N
Posts: 94

WebHelp Responsive search: How do "Stop Words" work?

Wed May 03, 2017 5:34 pm

Hello,

First of all, thank you for the new search capabilities in Oxygen 19.

We are currently translating the new search strings into our various languages. Two strings mention the "stop words", such as "of", "the", and "by".

How does this work in other languages? I can see that you have translated them into Spanish. How should we proceed if we would like to add Russian, for instance? Is there a way to add or remove stop words?

Thanks,

Benjamin
B-E-N
Posts: 94

Re: WebHelp Responsive search: How do "Stop Words" work?

Thu May 04, 2017 12:35 pm

Correction: I've just realized that the Spanish translation was done by a colleague of mine, not by you. So the more general question is: how should we handle translating the strings in the WebHelp search?
radu_pisoi
Posts: 310
Location: Craiova

Re: WebHelp Responsive search: How do "Stop Words" work?

Thu May 04, 2017 2:37 pm

Hi,

The procedure for localizing the WebHelp output is described in the "Localizing the Interface of WebHelp Output (for DITA Map Transformations)" topic of our user manual.

B-E-N wrote:
We are currently translating the new search strings in our various languages. Two strings mention the "stop words", such as "of", "the", and "by".

Do you need the context in which these strings are used? If so, could you tell us which strings you need additional information about?
Radu Pisoi
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
B-E-N
Posts: 94

Re: WebHelp Responsive search: How do "Stop Words" work?

Thu May 04, 2017 4:08 pm

Thanks for your answer.

We know how to localize the WebHelp output. I mean something different here.

The search treats some words as so-called "stop words", meaning they are ignored when searching for terms. Two strings mention stop words:

Code: Select all

No results were found because the search query only contains &lt;span&gt;stop words&lt;/span&gt; that are excluded by the search engine.

Code: Select all

Stop words are very common words or adjectives that hinder search efforts. Words such as: &apos;of&apos;, &apos;the&apos;, &apos;by&apos;, etc.


We must translate those strings into our target languages (Spanish, French, Japanese, Russian, etc.).

The question now is: what do we do with the stop words (of, the, by, ...)? Translating them doesn't mean that the search actually ignores them in other languages.

How does the search know which words are stop words? And can we add stop words for other languages as well?
radu_pisoi
Posts: 310
Location: Craiova

Re: WebHelp Responsive search: How do "Stop Words" work?

Thu May 04, 2017 10:05 pm

Hi,

The stop words are computed dynamically, based on the language you choose when you publish your documentation. The search indexer computes them and writes them to the out/webhelp-responsive/oxygen-webhelp/search/index-1.js file:

Code: Select all

stopWords = new Array();
stopWords[0]= "but";
stopWords[1]= "be";
stopWords[2]= "with";
stopWords[3]= "such";
....

So, to see exactly which words are treated as stop words for a given language, you need to inspect the index-1.js file generated for that language.

There is no parameter to control the stop words.
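
For anyone who wants to list the stop words without reading index-1.js by hand, here is a minimal sketch. It assumes the file keeps the `stopWords[n]= "word";` form shown in the excerpt above. The function names `extractStopWords` and `containsOnlyStopWords` are made up for this illustration and are not part of the WebHelp code, and the second function is only a guess at how a "query contains only stop words" check could look, not the actual search engine logic.

```javascript
// Sketch only: pull the stop words out of the text of a generated index-1.js.
// Assumes each entry looks like: stopWords[0]= "but";
function extractStopWords(source) {
  const pattern = /stopWords\[(\d+)\]\s*=\s*"([^"]*)"\s*;/g;
  const words = [];
  for (const match of source.matchAll(pattern)) {
    // Keep each word at the index the indexer assigned to it.
    words[Number(match[1])] = match[2];
  }
  return words;
}

// Hypothetical check: true when every term of the query is a stop word.
// This is an assumption about the behavior, not the real WebHelp code.
function containsOnlyStopWords(query, words) {
  const known = new Set(words);
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return terms.length > 0 && terms.every((term) => known.has(term));
}

// Sample input copied from the excerpt above (the real file is longer).
const sample = [
  'stopWords = new Array();',
  'stopWords[0]= "but";',
  'stopWords[1]= "be";',
  'stopWords[2]= "with";',
  'stopWords[3]= "such";',
].join("\n");

const stopWords = extractStopWords(sample);
console.log(stopWords.join(", ")); // prints "but, be, with, such"
console.log(containsOnlyStopWords("but with", stopWords)); // prints true
console.log(containsOnlyStopWords("search with", stopWords)); // prints false
```

To run it against a real output folder, you would read the file contents (for example with Node's `fs.readFileSync(path, "utf8")`) and pass the resulting string to `extractStopWords` instead of `sample`. Comparing the list for each published language shows which words your translated "stop words" string should mention.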
B-E-N
Posts: 94

Re: WebHelp Responsive search: How do "Stop Words" work?

Fri May 05, 2017 11:33 am

Thank you, that helps a lot.
