[oXygen-user] Feature request: Improvement of Japanese search for WebHelp
Support Oxygen XML Editor (Sorin Ristache)
Tue Apr 14 10:17:02 CDT 2015
On 4/12/2015 5:26 AM, T. Hatanaka wrote:
> That being said, integrating sophisticated analyzers would still help a lot, even if only at build time. Run-time analyzers are less necessary, I guess. When Japanese people search the Web, they usually perform a kind of tokenization and normalization themselves.
> i.e. They do not usually enter "BROWNFOXJUMPS" in the search text box. In most cases we can expect them to type "BROWN FOX JUMP".
> Actually "Please enter keywords separated by spaces" has been a common instruction found on the Japanese search UI. People have got used to it.
Thank you for letting us know. In a future version we will integrate the
Kuromoji analyzer into our Apache Lucene customization, which runs on the
generated WebHelp pages to build the WebHelp search index. This index
will return relevant search results only for Japanese search terms that
are entered in the browser properly separated with space characters.
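
To illustrate the split between build time and run time, here is a minimal sketch in Python (not the actual WebHelp code, which runs as JavaScript in the browser). It assumes the index has already been tokenized at build time by an analyzer such as Kuromoji. The `INDEX` contents, page names, and the `split_terms` and `search` helpers are all hypothetical. At run time the query is only normalized and split on spaces, so no dictionary is needed in the browser.

```python
import unicodedata

# Hypothetical index produced at build time by a morphological analyzer
# (for example Kuromoji): token -> set of page names. Entries are made up.
INDEX = {
    "検索": {"search.html", "index.html"},
    "日本語": {"search.html", "i18n.html"},
    "辞書": {"i18n.html"},
}


def split_terms(query):
    """Normalize the query and split it on whitespace; no tokenizer involved."""
    # NFKC folds full-width letters and the ideographic space (U+3000)
    # into their ASCII counterparts before splitting.
    normalized = unicodedata.normalize("NFKC", query)
    return [term.lower() for term in normalized.split()]


def search(query, index=INDEX):
    """Return the pages that contain every space-separated term (AND search)."""
    terms = split_terms(query)
    if not terms:
        return set()
    results = None
    for term in terms:
        pages = index.get(term, set())
        results = set(pages) if results is None else results & pages
    return results
```

With this sketch, `search("日本語　検索")` matches only the page that contains both terms, while the unseparated query `search("日本語検索")` matches nothing, because the run-time side never segments the string itself.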
> Here's another piece of news:
> I haven't tried it, but expect some difficulties. I heard it required a 17MB dictionary.
tokenization of the search string entered by the user may take forever
separated search terms entered by the user, as you suggested above.
> T. Hatanaka
<oXygen/> XML Editor