
Index Sorting for Japanese

Posted: Thu Jun 16, 2016 2:37 pm
by Anonymous1
Hello,

At the moment, we are publishing to PDF and HTML help. In the future, we'd also like to provide WebHelp output for our documentation.

For Japanese, index and glossary terms must be sorted according to Japanese rules, that is, by their Hiragana or Katakana reading. We add a <sort-as> element to each index term and glossary term, and the build process produces correctly sorted output. This already works well for PDF output via Apache FOP.
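
As a rough illustration of why a reading-based sort key is needed: ordering terms by their displayed characters puts Kanji in Unicode code-point order, which has nothing to do with how the words are read. Sorting on a kana reading (the role <sort-as> plays here) gives the expected gojūon (a-i-u-e-o) order. The sketch below is a simplification, not what FOP or any DITA-OT plugin actually does. It assumes the readings are plain kana, folds Katakana onto Hiragana, and relies on the Hiragana block being laid out roughly in gojūon order. The terms and readings are hypothetical examples.

```python
# Minimal sketch (assumption: readings are plain kana): sort Japanese index
# terms by a kana reading instead of by their displayed, possibly Kanji, text.

def to_hiragana(text: str) -> str:
    """Fold Katakana (U+30A1..U+30F6) onto the matching Hiragana code points."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )

def sort_key(entry: tuple[str, str]) -> str:
    _, reading = entry
    # Hiragana code-point order roughly follows gojuon order (a, i, u, e, o, ka, ...).
    return to_hiragana(reading)

# Hypothetical (displayed text, reading) pairs.
entries = [
    ("場合", "ばあい"),    # baai
    ("索引", "さくいん"),  # sakuin
    ("ケース", "ケース"),  # kesu, reading given in Katakana
    ("赤", "あか"),        # aka
]

# Naive order: by code point of the displayed text.
by_display = sorted(display for display, _ in entries)
# Reading-based order: by the folded kana reading.
by_reading = [display for display, _ in sorted(entries, key=sort_key)]

print(by_display)
print(by_reading)
```

Real collation (small kana, voiced marks, the long-vowel mark ー) needs more than code-point order, which is why a proper collator is normally used.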

Does the WebHelp output also sort the terms correctly? The only comment I could find on this topic was in a blog post, where the author said "no".

The only Japanese-specific WebHelp parameter I found in the documentation is webhelp.search.japanese.dictionary.

Best regards,

Benjamin

Re: Index Sorting for Japanese

Posted: Wed Jun 22, 2016 11:26 am
by radu_pisoi
B-E-N wrote:
Does the WebHelp output also sort the terms correctly? The only comment I could find on this topic was in a blog post, where the author said "no".
In the current oXygen version, the sorting algorithm for index terms does not respect the information specified in the 'index-sort-as' element.

We have already registered this issue, and a fix will be available in a future oXygen version. If you don't want to wait for the next release, you can contact us at support@oxygenxml.com and we will provide you with a patch for this issue.

Re: Index Sorting for Japanese

Posted: Wed Jun 22, 2016 1:40 pm
by Anonymous1
Glad to hear that a patch already exists. I will get in contact with the support. Thanks Radu.

Re: Index Sorting for Japanese

Posted: Wed Oct 05, 2016 12:24 pm
by Edwin
Hi,
We are using the oXygen editor 17.1 and are running some tests on Japanese index sorting for PDF output. Am I correct in assuming that the index terms need to be in Katakana or Hiragana? It seems that terms in Kanji characters are omitted completely.

If so, could we add the index term like this:

<indexterm><index-sort-as xml:lang="ja-JA"/>ケイス (場合)</indexterm>
That is, with the Katakana translation of "case" indexed and the Kanji translation (as it occurs in the text) in parentheses?

If we later decide to create WebHelp output, would we be able to search in Kanji using the add-on described on this forum?

Re: Index Sorting for Japanese

Posted: Wed Oct 05, 2016 2:44 pm
by radu_pisoi
Hi,
Edwin wrote:
We are using the oXygen editor 17.1 and are running some tests on Japanese index sorting for PDF output. Am I correct in assuming that the index terms need to be in Katakana or Hiragana? It seems that terms in Kanji characters are omitted completely.

If so, could we add the index term like this:

<indexterm><index-sort-as xml:lang="ja-JA"/>ケイス (場合)</indexterm>
That is, with the Katakana translation of "case" indexed and the Kanji translation (as it occurs in the text) in parentheses?
Yes, you can use the index-sort-as element to specify the text that will be used for grouping and sorting index terms.
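
In DITA, the sort phrase goes inside <index-sort-as>, which is a child of <indexterm>, while the remaining <indexterm> text is what gets displayed. A term can therefore be shown in Kanji and sorted by its kana reading, for example <indexterm>場合<index-sort-as>ばあい</index-sort-as></indexterm>. The following sketch (Python standard library only, not oXygen's implementation) shows how a processor could extract the display text and the sort key from such markup, falling back to the display text when <index-sort-as> is missing or empty. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def display_and_sort_key(indexterm_xml: str) -> tuple[str, str]:
    """Return (display text, sort key) for a single, non-nested <indexterm>."""
    term = ET.fromstring(indexterm_xml)
    sort_as = term.find("index-sort-as")
    # Display text: the indexterm's own text, excluding <index-sort-as> content.
    display = (term.text or "").strip()
    if sort_as is not None and sort_as.tail:
        display += sort_as.tail.strip()
    # Sort key: the <index-sort-as> content if non-empty, else the display text.
    if sort_as is not None and (sort_as.text or "").strip():
        key = sort_as.text.strip()
    else:
        key = display
    return display, key

print(display_and_sort_key("<indexterm>場合<index-sort-as>ばあい</index-sort-as></indexterm>"))
print(display_and_sort_key("<indexterm>索引</indexterm>"))
```
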
Edwin wrote:
If we later decide to create WebHelp output, would we be able to search in Kanji using the add-on described on this forum?
For Japanese, the WebHelp search function is implemented using the Kuromoji library, a word tokenizer designed specifically for Japanese. Please note that this library is not bundled with oXygen; if you want to use it, please follow this procedure:
https://www.oxygenxml.com/doc/versions/ ... bhelp.html
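
Some background on why a dedicated tokenizer matters: Japanese is written without spaces between words, so a search indexer cannot simply split text on whitespace. A generic CJK fallback typically indexes overlapping two-character sequences (bigrams), as Lucene's CJKAnalyzer does. Whether oXygen's default CJK indexer works exactly this way is an assumption here. A dictionary-based tokenizer such as Kuromoji instead tries to split the text into actual words. The sketch below only illustrates the bigram idea in plain Python; it does not use Kuromoji, and the function name is hypothetical.

```python
def cjk_bigrams(text: str) -> list[str]:
    """Split a run of CJK text into overlapping two-character tokens."""
    if len(text) < 2:
        # A single character (or nothing) cannot form a bigram.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]

sentence = "索引の並べ替え"  # hypothetical sample text ("sorting the index")
print(cjk_bigrams(sentence))
```

A bigram index matches any two-character substring, so it also indexes fragments that span word boundaries (such as 引の here); a word tokenizer aims to index whole words instead.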

Re: Index Sorting for Japanese

Posted: Thu Oct 13, 2016 9:36 am
by Eddie
Hello,

I'd just like to clarify a few points:

1. Does installing Kuromoji have anything to do with creating the index (that is, the index built from indexterm elements in DITA)? When I got the message below, I assumed it was referring to that index.

"[OXYWH001W]: Kuromoji analyzer not available for indexing Japanese pages, fallback to default CJK indexer."

2. In your instructions (https://www.oxygenxml.com/doc/versions/ ... bhelp.html), what do you mean by "For the analyzer to work properly, search terms that are entered into your WebHelp pages must be separated by spaces"?
(Specifically, what does "entered into your WebHelp pages" refer to? The source files? The Search field?)

3. Is the extra step mentioned in post40057.html?hilit=kuromoji#p40057 still necessary?

Cheers,
Eddie

Re: Index Sorting for Japanese

Posted: Thu Oct 13, 2016 11:48 am
by radu_pisoi
Hi,
Eddie wrote:
1. Does installing Kuromoji have anything to do with creating the index (that is, the index built from indexterm elements in DITA)? When I got the message below, I assumed it was referring to that index.

"[OXYWH001W]: Kuromoji analyzer not available for indexing Japanese pages, fallback to default CJK indexer."
The Kuromoji library affects only the WebHelp search function, which is used to search for words in the generated WebHelp documentation. This is the search page in our documentation:
http://oxygenxml.com/doc/versions/18.0/ ... efactoring

This library does not affect the creation of the WebHelp Index page, which is generated from the indexterm elements in the DITA content. This is the Index page in our documentation:
(http://oxygenxml.com/doc/versions/18.0/ ... Terms.html)
Eddie wrote:
2. In your instructions (https://www.oxygenxml.com/doc/versions/ ... bhelp.html), what do you mean by "For the analyzer to work properly, search terms that are entered into your WebHelp pages must be separated by spaces"?
(Specifically, what does "entered into your WebHelp pages" refer to? The source files? The Search field?)
Yes, it is a bit ambiguous. It is referring to the Search text field. I will register an issue to update our documentation.
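
To illustrate what the space requirement means for the Search field: a query is treated as several separate terms only where the user typed a space. The sketch below assumes, as a simplification, that the query is split on whitespace; it is not the actual WebHelp search code, and the function name is hypothetical. Python's str.split() also splits on the full-width ideographic space (U+3000) that Japanese input methods often produce; whether WebHelp does the same is not confirmed here.

```python
def query_terms(query: str) -> list[str]:
    """Split a search-field query into terms at whitespace (simplified assumption)."""
    return query.split()

print(query_terms("索引 並べ替え"))       # ASCII space: two terms
print(query_terms("索引\u3000並べ替え"))  # ideographic space: two terms
print(query_terms("索引並べ替え"))        # no space: one term
```
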
Eddie wrote:
3. Is the extra step mentioned in post40057.html?hilit=kuromoji#p40057 still necessary?
Yes, it is still necessary for oXygen 18.0.

Re: Index Sorting for Japanese

Posted: Fri Oct 14, 2016 2:35 am
by Eddie
Thanks very much.
All clear.
Eddie.

Re: Index Sorting for Japanese

Posted: Fri Oct 21, 2016 9:51 am
by radu_pisoi
Hi,

I am happy to announce that version 18.1 of oXygen WebHelp is now available.

In this version, we have improved the grouping and sorting of DITA index terms by taking the *index-sort-as* element into account: if this element is specified, its content is used to sort and group the index terms.
https://www.oxygenxml.com/dita/1.3/spec ... rt-as.html
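
A rough sketch of what grouping and sorting by index-sort-as means for an Index page: each term's sort key is its index-sort-as value when present, otherwise its displayed text. Terms are then grouped under the first character of that key and sorted within each group. The sample data and the first-character grouping rule are illustrative assumptions, not the 18.1 implementation.

```python
from collections import defaultdict

# Hypothetical index terms: (displayed text, index-sort-as value or None).
terms = [
    ("場合", "ばあい"),
    ("赤", "あか"),
    ("雨", "あめ"),
    ("いろは", None),  # no index-sort-as: falls back to the displayed text
]

def build_index(entries):
    """Group terms by the first character of their sort key, then sort."""
    groups = defaultdict(list)
    for display, sort_as in entries:
        key = sort_as or display
        groups[key[0]].append((key, display))
    # Sort the group headings, then the terms inside each group, by sort key.
    return {
        heading: [display for _, display in sorted(items)]
        for heading, items in sorted(groups.items())
    }

for heading, displays in build_index(terms).items():
    print(heading, displays)
```

Without the index-sort-as values, 場合, 赤 and 雨 would be grouped under their Kanji and ordered by code point instead of by reading.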

Thank you again for your feedback.