Index Sorting for Japanese

Anonymous1

Index Sorting for Japanese

Post by Anonymous1 »

Hello,

at the moment, we are publishing to PDF and HTML help. For the future, we'd also like to provide a webhelp for our documentation.

For Japanese, the index and glossary terms must be sorted according to their language rules (Hiragana, Katakana). We add a <sort-as> element to each index term and glossary term, and the build process automatically produces a correct output. So this already works fine for the PDF output via Apache FOP.

Does the webhelp output also sort the terms correctly? The only comment I could find regarding this topic was in a blog, where the author said "no".

In the documentation, the only Japanese parameter for webhelp was this one: webhelp.search.japanese.dictionary

Best regards,

Benjamin
radu_pisoi
Posts: 403
Joined: Thu Aug 21, 2003 11:36 am
Location: Craiova

Re: Index Sorting for Japanese

Post by radu_pisoi »

B-E-N wrote:Does the webhelp output also sort the terms correctly? The only comment I could find regarding this topic was in a blog, where the author said "no".
In the current oXygen version, the sorting algorithm for index terms does not respect the information specified in the 'index-sort-as' element.

We have already registered this issue and a fix will be available in a future oXygen version. If you don't want to wait until a new oXygen version is released, you can contact us at support@oxygenxml.com and we will provide you with a patch for this issue.
Radu Pisoi
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
Anonymous1

Re: Index Sorting for Japanese

Post by Anonymous1 »

Glad to hear that a patch already exists. I will get in touch with support. Thanks, Radu.
Edwin
Posts: 22
Joined: Tue May 17, 2016 4:58 pm

Re: Index Sorting for Japanese

Post by Edwin »

Hi,
We are using the oXygen editor 17.1 and are running some tests on Japanese index sorting for PDF output. Am I correct in assuming the index terms need to be in Katakana or Hiragana? It seems like terms in Kanji characters are omitted completely.

If so, could we add the indexterm like this:

<indexterm><index-sort-as xml:lang="ja-JA"/>ケイス (場合)</indexterm>
That is, with the Katakana translation of "Case" indexed and the Kanji translation (as it occurs in the text) in brackets?

If we later decide to create WebHelp output, would it be possible to search in Kanji using the add-on described on this forum?
radu_pisoi
Posts: 403
Joined: Thu Aug 21, 2003 11:36 am
Location: Craiova

Re: Index Sorting for Japanese

Post by radu_pisoi »

Hi,
Edwin wrote:We are using the oXygen editor 17.1 and are running some tests on Japanese index sorting for PDF output. Am I correct in assuming the index terms need to be in Katakana or Hiragana? It seems like terms in Kanji characters are omitted completely.

If so could we add the indexterm like this:

<indexterm><index-sort-as xml:lang="ja-JA"/>ケイス (場合)</indexterm>
That is, with the Katakana translation of "Case" indexed and the Kanji translation (as it occurs in the text) in brackets?
Yes, you can use the index-sort-as to specify a text sequence that will be used for index terms grouping and sorting.
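As a minimal sketch of that markup (the term and its reading are illustrative and not taken from this thread): per the DITA specification, the sort phrase is the content of index-sort-as, while the text of the indexterm itself is what gets displayed. The Hiragana reading ばあい and the xml:lang value "ja-JP" are assumptions made for the example.

```xml
<!-- Illustrative only: the Kanji term is displayed, the Hiragana reading is the sort key -->
<indexterm xml:lang="ja-JP">場合<index-sort-as>ばあい</index-sort-as></indexterm>
```

With markup along these lines, the entry should be grouped and sorted under its reading while 場合 remains the visible text, provided the publishing pipeline honors index-sort-as.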
Edwin wrote:If we later decide to create WebHelp output, would it be possible to search in Kanji using the add-on described on this forum?
The search function in WebHelp for Japanese is implemented using the Kuromoji library, a word tokenizer designed for Japanese. Please note that this library is not bundled with oXygen; if you want to use it, please follow this procedure:
https://www.oxygenxml.com/doc/versions/ ... bhelp.html
Radu Pisoi
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
Eddie
Posts: 106
Joined: Wed Dec 18, 2013 3:07 am

Re: Index Sorting for Japanese

Post by Eddie »

Hello,

I'd just like to clarify a few points:

1. Has installing Kuromoji got anything to do with creating the index (as in, based on indexterms in DITA)? When I got the message below, I assumed it was referring to the index.

"[OXYWH001W]: Kuromoji analyzer not available for indexing Japanese pages, fallback to default CJK indexer."

2. In your instructions (https://www.oxygenxml.com/doc/versions/ ... bhelp.html), what do you mean by
For the analyzer to work properly, search terms that are entered into your WebHelp pages must be separated by spaces
?
(Specifically, what is "entered into your WebHelp pages" referring to? In the source files? In the Search field?)

3. Is the extra step mentioned in post40057.html?hilit=kuromoji#p40057 still necessary?

Cheers,
Eddie
radu_pisoi
Posts: 403
Joined: Thu Aug 21, 2003 11:36 am
Location: Craiova

Re: Index Sorting for Japanese

Post by radu_pisoi »

Hi,
Eddie wrote:1. Has installing Kuromoji got anything to do with creating the index (as in, based on indexterms in DITA)? When I got the message below, I assumed it was referring to the index.

"[OXYWH001W]: Kuromoji analyzer not available for indexing Japanese pages, fallback to default CJK indexer."
The Kuromoji library affects only the WebHelp search function, which looks up a given word in the generated WebHelp documentation. This is the search page in our documentation:
http://oxygenxml.com/doc/versions/18.0/ ... efactoring

This library does not affect the creation of the WebHelp Index page. That page is generated from the indexterm elements in DITA. This is the Index page in our documentation:
(http://oxygenxml.com/doc/versions/18.0/ ... Terms.html)
Eddie wrote:2. In your instructions (https://www.oxygenxml.com/doc/versions/ ... bhelp.html), what do you mean by

For the analyzer to work properly, search terms that are entered into your WebHelp pages must be separated by spaces

?
(Specifically, what is "entered into your WebHelp pages" referring to? In the source files? In the Search field?)
Yes, it is a bit ambiguous. It is referring to the Search text field. I will register an issue to update our documentation.
Eddie wrote:3. Is the extra step mentioned in post40057.html?hilit=kuromoji#p40057 still necessary?
Yes, it is still necessary for oXygen 18.0.
Radu Pisoi
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
Eddie
Posts: 106
Joined: Wed Dec 18, 2013 3:07 am

Re: Index Sorting for Japanese

Post by Eddie »

Thanks very much.
All clear.
Eddie.
radu_pisoi
Posts: 403
Joined: Thu Aug 21, 2003 11:36 am
Location: Craiova

Re: Index Sorting for Japanese

Post by radu_pisoi »

Hi,

I am happy to announce that version 18.1 of oXygen WebHelp is now available.

In this version, we have improved the grouping and sorting of DITA index terms by taking the *index-sort-as* element into account. If this element is specified, its content is used to sort and group the DITA index terms.
https://www.oxygenxml.com/dita/1.3/spec ... rt-as.html

Thank you again for your feedback.
Radu Pisoi
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com