[oXygen-user] Feature request: Improvement of Japanese search for WebHelp
Sorin Ristache
sorin at sync.ro
Tue Nov 19 03:08:55 CST 2013
Dear Naoki-san,
The Webhelp content indexer is indeed based on the Lucene engine, just
like the Kuromoji morphological analyzer, so delegating the indexing of
Japanese content to the Kuromoji analyzer at build time (when the
Webhelp pages are created by the Oxygen Webhelp transformation) is
doable. However, the Webhelp search is performed at runtime on the
client side: JavaScript code runs in the browser on the user's machine,
not on the server where the Webhelp pages are stored. The difficulty in
integrating a language-specific morphological analyzer like Kuromoji
comes from the lack of an equivalent JavaScript analyzer that can split
the search string entered by the user into the same morphological
components that the Lucene-based analyzer wrote to the index database
at build time.
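
To illustrate the mismatch, here is a minimal sketch (not the actual
Webhelp code). It assumes a toy greedy longest-match segmenter over a
tiny hand-written dictionary as a stand-in for Kuromoji; all names in
it (DICTIONARY, segment, buildTokenIndex, searchPages,
splitOnWhitespace, samplePages) are hypothetical. The index is built
from segmented tokens, so a query that is only split on whitespace
finds nothing, while a query segmented the same way as the index does.

```typescript
// Toy stand-in for a morphological analyzer (NOT Kuromoji): greedy
// longest-match segmentation against a tiny hand-written dictionary.
const DICTIONARY: string[] = ["日本語", "検索", "機能", "改善", "する", "の", "を"];

type TokenIndex = { [token: string]: string[] };

function segment(text: string): string[] {
  const tokens: string[] = [];
  let pos = 0;
  while (pos < text.length) {
    let match = "";
    for (const word of DICTIONARY) {
      if (word.length > match.length && text.slice(pos, pos + word.length) === word) {
        match = word;
      }
    }
    // Unknown character: emit it as a single-character token.
    if (match === "") {
      match = text.charAt(pos);
    }
    tokens.push(match);
    pos += match.length;
  }
  return tokens;
}

// What a client without an analyzer can do: split on whitespace only.
function splitOnWhitespace(query: string): string[] {
  return query.split(/\s+/).filter((t) => t.length > 0);
}

// "Build time": map each token to the pages that contain it.
function buildTokenIndex(pages: { id: string; text: string }[]): TokenIndex {
  const index: TokenIndex = Object.create(null);
  for (const page of pages) {
    for (const token of segment(page.text)) {
      const ids: string[] = index[token] || (index[token] = []);
      if (ids.indexOf(page.id) === -1) {
        ids.push(page.id);
      }
    }
  }
  return index;
}

// "Run time": return the pages that contain every query token.
function searchPages(index: TokenIndex, queryTokens: string[]): string[] {
  let hits: string[] | null = null;
  for (const token of queryTokens) {
    const ids: string[] = index[token] || [];
    hits = hits === null ? ids.slice() : hits.filter((id) => ids.indexOf(id) !== -1);
  }
  return hits === null ? [] : hits;
}

const samplePages = [
  { id: "p1.html", text: "日本語の検索機能を改善する" },
  { id: "p2.html", text: "検索機能を改善する" },
];
const builtIndex = buildTokenIndex(samplePages);

// The whole query is one "word" that never appears in the index.
console.log(searchPages(builtIndex, splitOnWhitespace("日本語の検索"))); // []
// The same query, segmented the same way as the index.
console.log(searchPages(builtIndex, segment("日本語の検索"))); // ["p1.html"]
```

The search only works when the query is tokenized by the same rules
that produced the index, which is why a JavaScript counterpart of the
build-time analyzer would be needed in the browser.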
I did a Google search but could not find a client-side JavaScript
implementation of a Japanese morphological analyzer. If you can suggest
one, we would certainly consider it as a future improvement for the
Webhelp search.
Kind regards,
Sorin
Naoki Hirai wrote:
> Hi,
>
> I like Oxygen WebHelp very much and recommend it to Japanese users.
> WebHelp is a sophisticated online manual solution, but one issue
> remains for Japanese users: Japanese search. In Japanese it is
> difficult to extract words from sentences because the words are not
> separated by spaces. Therefore, a morphological analyzer is generally
> used to extract the words from the sentences. Recently, an open
> source Japanese morphological analyzer called "Kuromoji" has become
> popular. Apache Solr has introduced Kuromoji as a morphological
> analyzer.
>
> So, my feature request is that the Oxygen WebHelp plug-in incorporate
> Kuromoji as the morphological analyzer and add a parameter that
> selects the stemmer used when generating WebHelp output. I can help
> with the development and the evaluation.
>
> Please give it some thought.
>
> Best regards,
>
> Naoki