Index Sorting for Japanese
Post by Anonymous1 »
Hello,
At the moment, we publish to PDF and HTML Help. In the future, we'd also like to provide WebHelp output for our documentation.
For Japanese, the index and glossary terms must be sorted according to the language's rules (Hiragana, Katakana). We add a <sort-as> element to each index term and glossary term, and the build process automatically produces correctly sorted output. This already works fine for the PDF output via Apache FOP.
Does the WebHelp output also sort the terms correctly? The only comment I could find on this topic was in a blog, where the author said "no".
In the documentation, the only Japanese-related parameter I could find for WebHelp was this one: webhelp.search.japanese.dictionary
Best regards,
Benjamin
-
- Posts: 404
- Joined: Thu Aug 21, 2003 11:36 am
- Location: Craiova
- Contact:
Re: Index Sorting for Japanese
Post by radu_pisoi »
B-E-N wrote: Does the webhelp output also sort the terms correctly? The only comment I could find regarding this topic was in a blog, where the author said "no".
In the current oXygen version, the sorting algorithm for index terms does not respect the information specified in the 'index-sort-as' element.
We have already registered this issue, and a fix will be available in a future oXygen version. If you don't want to wait until a new oXygen version is released, you can contact us at support@oxygenxml.com and we will provide you with a patch for this issue.
Radu Pisoi
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
Re: Index Sorting for Japanese
Post by Anonymous1 »
Glad to hear that a patch already exists. I will get in touch with support. Thanks, Radu.
-
- Posts: 22
- Joined: Tue May 17, 2016 4:58 pm
Re: Index Sorting for Japanese
Hi,
We are using the oXygen editor 17.1 and are running some tests on Japanese index sorting for PDF output. Am I correct in assuming the index terms need to be in Katakana or Hiragana? It seems that terms in Kanji characters are omitted completely.
If so, could we add the indexterm like this:
Code: Select all
<indexterm><index-sort-as xml:lang="ja-JA"/>ケイス (場合)</indexterm>
Where we have the Katakana translation of "Case" indexed and the Kanji translation (as occurring in the text) between brackets?
What would happen if we later decide to create WebHelp? Would you be able to search in Kanji using the add-on described on this forum?
Re: Index Sorting for Japanese
Post by radu_pisoi »
Hi,
Edwin wrote: We are using the oXygen editor 17.1 and are running some tests on Japanese index sorting for PDF output. Am I correct in assuming the index terms need to be in Katakana or Hiragana? It seems that terms in Kanji characters are omitted completely.
If so, could we add the indexterm like this:
Code: Select all
<indexterm><index-sort-as xml:lang="ja-JA"/>ケイス (場合)</indexterm>
Where we have the Katakana translation of "Case" indexed and the Kanji translation (as occurring in the text) between brackets?
Yes, you can use the index-sort-as element to specify a text sequence that will be used for grouping and sorting index terms.
Edwin wrote: What would happen if we later decide to create WebHelp? Would you be able to search in Kanji using the add-on described on this forum?
The search function in WebHelp for Japanese is implemented using the Kuromoji library, a word tokenizer designed specifically for Japanese. Please note that this library is not bundled with oXygen. If you want to use it, please follow this procedure:
https://www.oxygenxml.com/doc/versions/ ... bhelp.html
Radu Pisoi
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
-
- Posts: 106
- Joined: Wed Dec 18, 2013 3:07 am
Re: Index Sorting for Japanese
Hello,
I'd just like to clarify a few points:
1. Has installing Kuromoji got anything to do with creating the index (as in, based on indexterms in DITA)? When I got the message below, I assumed it was referring to the index.
"[OXYWH001W]: Kuromoji analyzer not available for indexing Japanese pages, fallback to default CJK indexer."
2. In your instructions (https://www.oxygenxml.com/doc/versions/ ... bhelp.html), what do you mean by
"For the analyzer to work properly, search terms that are entered into your WebHelp pages must be separated by spaces"?
(Specifically, what is "entered into your WebHelp pages" referring to? In the source files? In the Search field?)
3. Is the extra step mentioned in post40057.html?hilit=kuromoji#p40057 still necessary?
Cheers,
Eddie
Re: Index Sorting for Japanese
Post by radu_pisoi »
Hi,
Eddie wrote: 1. Has installing Kuromoji got anything to do with creating the index (as in, based on indexterms in DITA)? When I got the message below, I assumed it was referring to the index.
"[OXYWH001W]: Kuromoji analyzer not available for indexing Japanese pages, fallback to default CJK indexer."
The Kuromoji library affects only the WebHelp search function, which is used to search for a certain word in the generated WebHelp documentation. This is the search page in our documentation:
http://oxygenxml.com/doc/versions/18.0/ ... efactoring
This library does not affect the creation of the WebHelp Index page. That page is generated based on the indexterm elements in DITA. This is the Index page in our documentation:
(http://oxygenxml.com/doc/versions/18.0/ ... Terms.html)
Eddie wrote: 2. In your instructions (https://www.oxygenxml.com/doc/versions/ ... bhelp.html), what do you mean by
"For the analyzer to work properly, search terms that are entered into your WebHelp pages must be separated by spaces"?
(Specifically, what is "entered into your WebHelp pages" referring to? In the source files? In the Search field?)
Yes, it is a bit ambiguous. It is referring to the Search text field. I will register an issue to update our documentation.
Eddie wrote: 3. Is the extra step mentioned in post40057.html?hilit=kuromoji#p40057 still necessary?
Yes, it is still necessary for oXygen 18.0.
Radu Pisoi
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
Re: Index Sorting for Japanese
Post by radu_pisoi »
Hi,
I am happy to announce that version 18.1 of oXygen WebHelp is now available.
In this version, we have improved the grouping and sorting of DITA index terms by taking the *index-sort-as* element into consideration. If this element is specified, its content is used to sort and group the DITA index terms.
https://www.oxygenxml.com/dita/1.3/spec ... rt-as.html
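For anyone following along, here is a minimal sketch of how an index term with a sort key might look in a DITA topic. It assumes the sort key is written as the Hiragana reading of the displayed term. The topic id, the title, and the two terms with their readings are made up for illustration and do not come from this thread. Since the content of index-sort-as is what gets used for sorting, the key goes inside the element rather than in an empty tag.

```xml
<!-- Illustrative sketch only: the id, title, terms, and readings are hypothetical. -->
<topic id="index-sort-example" xml:lang="ja-JP">
  <title>索引の例</title>
  <prolog>
    <metadata>
      <keywords>
        <!-- Displayed in the index as 場合; the Hiragana reading is the sort key. -->
        <indexterm>場合<index-sort-as>ばあい</index-sort-as></indexterm>
        <!-- A Katakana term, again with its reading supplied as the sort key. -->
        <indexterm>ケース<index-sort-as>けーす</index-sort-as></indexterm>
      </keywords>
    </metadata>
  </prolog>
  <body>
    <p>...</p>
  </body>
</topic>
```

How the keys are then collated (for example, whether Hiragana and Katakana keys interleave) depends on the processor and locale settings, so it is worth checking the generated Index page with a few test terms.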
Thank you again for your feedback.
Radu Pisoi
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com