
Implementing and extending autocorrect functionality

Posted: Mon Jun 01, 2015 4:04 pm
by ScandWe
I'm trying to implement an autocorrect feature in Oxygen 16.1 because an upgrade to 17 is not possible at this time. To simplify parts of it, I'm looking for ways to reuse the existing spellchecking (16.1) or autocorrect (17) mechanisms, but I didn't find any interfaces in the API that would let me hook in before or after them.

The intended functionality is as follows:
* correct words normally but enforce certain casing (e.g. change 'someword' to 'Some WORD')
* surround the word with XML if not already present (e.g. change 'someword' to '<tag>Some WORD</tag>', but change '<tag>someword</tag>' only to '<tag>Some WORD</tag>' without duplicating the XML)
* offer option to autocorrect a whole document in the same way
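
To make the first two requirements concrete, here is a minimal sketch on plain strings. It does not use the Oxygen API at all: the class name, the correction table and the element name "tag" are made up for illustration. It also only detects a tag that sits directly next to the word, whereas the real extension checks the parent node instead.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AutocorrectSketch {
    // Hypothetical correction table: lower-case input -> enforced casing.
    static final Map<String, String> CORRECTIONS = Map.of("someword", "Some WORD");
    // Hypothetical element name used for wrapping.
    static final String TAG = "tag";

    static String correct(String text) {
        String result = text;
        for (Map.Entry<String, String> e : CORRECTIONS.entrySet()) {
            // Optional opening tag, the word itself, optional closing tag.
            Pattern p = Pattern.compile(
                "(<" + TAG + ">)?\\b" + Pattern.quote(e.getKey()) + "\\b(</" + TAG + ">)?",
                Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(result);
            StringBuilder out = new StringBuilder();
            while (m.find()) {
                boolean open = m.group(1) != null;
                boolean close = m.group(2) != null;
                String replacement;
                if (open != close) {
                    // Only one adjacent tag: the word shares the element with other
                    // text, so fix the casing and leave the existing tag untouched.
                    replacement = (open ? m.group(1) : "") + e.getValue()
                        + (close ? m.group(2) : "");
                } else {
                    // Fully wrapped or not wrapped at all: emit exactly one wrapper.
                    replacement = "<" + TAG + ">" + e.getValue() + "</" + TAG + ">";
                }
                m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(out);
            result = out.toString();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(correct("a someword b"));
        System.out.println(correct("<tag>someword</tag>"));
    }
}
```

Both an unwrapped 'someword' and an already wrapped '<tag>someword</tag>' end up as a single '<tag>Some WORD</tag>'.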

Currently I'm extending Oxygen 16.1 Eclipse Plugin by using the AuthorDocumentFilter to autocorrect the words manually and then checking the parent node to determine when to add XML. The operation for the whole document does much of the same but uses a TextContentIterator to walk over everything.

My 3 questions:

1. Would it be possible to somehow use the integrated spellchecking functions of Oxygen 16.1 to load a custom dictionary and handle the words or do I need to add another library like Lucene to do that myself?

2. When an update to Oxygen 17 becomes possible some time in the future, could the extension be simplified to reuse some of the standard Oxygen autocorrect functionality?

3. Adding a surrounding XML fragment already takes about a second so the operation for the whole document is quite slow if a lot of fragments need to be added. Any suggestions on how to improve this? Maybe a batch insert?

Re: Implementing and extending autocorrect functionality

Posted: Tue Jun 02, 2015 2:25 pm
by alex_jitianu
Hello,

As far as I can tell you have started on the right path by using an AuthorDocumentFilter.

1. For spellcheck we are using Hunspell but unfortunately there is no API to allow you to use it. You could use a library like Lucene.
2. I'll add an issue to add some API on the auto-correct side. Maybe some events when the support is being triggered as well as when it has performed a replacement.
3. I suspect that the biggest part of the time is spent in computing the layout changes. If you want to process multiple fragments I suggest using these two API methods:

ro.sync.ecss.extensions.api.AuthorDocumentController.insertMultipleFragments(AuthorElement, AuthorDocumentFragment[], int[])
ro.sync.ecss.extensions.api.AuthorDocumentController.multipleDelete(AuthorElement, int[], int[])

By using these methods you will benefit from a single layout event after the operation is finished.
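
The batching idea can be sketched without the Oxygen classes. In the sketch below a StringBuilder stands in for the document and plain strings stand in for AuthorDocumentFragment; the class and method names are hypothetical. It collects all inserts first, keyed by offset, and then applies them in one pass. Going from the highest offset down is a property of this stand-in only (it keeps the earlier offsets valid); how the real insertMultipleFragments interprets its offsets is not shown here.

```java
import java.util.Map;
import java.util.TreeMap;

class BatchInsertSketch {
    // Applies every collected insert to a copy of the text in a single pass,
    // highest offset first, so pending offsets are not shifted by earlier inserts.
    static String applyAll(String text, TreeMap<Integer, String> insertsByOffset) {
        StringBuilder sb = new StringBuilder(text);
        for (Map.Entry<Integer, String> e : insertsByOffset.descendingMap().entrySet()) {
            sb.insert(e.getKey(), e.getValue());
        }
        return sb.toString();
    }

    // Flattens the sorted keys into an int[], the kind of parallel offsets array
    // that a call taking (fragments[], offsets[]) expects.
    static int[] offsets(TreeMap<Integer, String> insertsByOffset) {
        return insertsByOffset.keySet().stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> inserts = new TreeMap<>();
        inserts.put(0, "<x/>");
        inserts.put(3, "<y/>");
        System.out.println(applyAll("ab cd", inserts));
    }
}
```

Collecting the edits first and committing them together is what allows a single layout update at the end instead of one per fragment.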

We also like your idea of replacing words with XML fragments so I will add an issue to have this support built-in.

Best regards,
Alex

Re: Implementing and extending autocorrect functionality

Posted: Wed Jun 03, 2015 12:06 pm
by ScandWe
Thanks for your suggestions. It turns out the repeated calls to DocumentController.surroundWithFragment() were what made the autocorrect so slow. It is much faster with insertMultipleFragments and multipleDelete. However, these methods caused some other problems.

It is no longer possible to use undo to revert the autocorrections. I used compoundEdit to wrap all delete, insertText and surroundWithFragment calls into one undo which doesn't seem to work with the multi inserts anymore. Can I somehow restore the undo functionality?

Is there a simple way to insert text in an AuthorDocumentFragment? Since I can only insert multiple fragments instead of surrounding some text with it, I have to fill the fragment with the desired text before inserting it into the document. Is there a simple way to insert text at the right position between the marker characters? So far, I'm just putting it in the center which obviously only works on a symmetrical fragment. Oxygen itself usually inserts text into the first leaf node. How do I get that position? The getContentNodes() method of the fragment only gets the fragment root with no way to reach its children.

My current solution for symmetrical fragments (e.g. <b><tm></tm></b>):

Code:


// Build the empty fragment, then insert the word at its midpoint (only correct for symmetrical fragments).
AuthorDocumentFragment fragment = authorAccess.getDocumentController().createNewDocumentFragmentInContext(surroundingXml, currentWordStartPosition);
char[] chars = word.toCharArray();
fragment.getContent().insertChars(fragment.getLength() / 2, chars, 0, chars.length);
fragmentsToInsert.put(currentWordStartPosition, fragment);

Re: Implementing and extending autocorrect functionality

Posted: Wed Jun 03, 2015 12:40 pm
by alex_jitianu
Hello,

1. As long as you wrap all of your code like this, you should get just one UNDO. This code executes from an AuthorDocumentFilter event, right?

Code:

AuthorDocumentController ctrl = ...;
ctrl.beginCompoundEdit();
try {
    // My code
} finally {
    ctrl.endCompoundEdit();
}

2. From the snippet I see a surroundingXml variable that seems to be bound to <b><tm></tm></b>. You could have a marker like this:

Code:

String surroundingXml = "<b><tm>{marker}</tm></b>";
String toInsert = surroundingXml.replace("{marker}", word);
AuthorDocumentFragment fragment = authorAccess.getDocumentController().createNewDocumentFragmentInContext(
    toInsert, currentWordStartPosition);
fragmentsToInsert.put(currentWordStartPosition, fragment);

If for some reason you can't do that, AuthorDocumentFragment.getContentNodes() returns a List<AuthorNode>. Those AuthorNode(s) might be AuthorParentNode(s), which have a getContentNodes() method too. By iterating over the node hierarchy you can reach a leaf and use fragment.getContent().insertChars() at its offsets.
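
The walk down to a leaf could look roughly like the sketch below. It uses a hypothetical stand-in Node class rather than AuthorNode/AuthorParentNode, so it only shows the traversal; with the real API you would descend through getContentNodes() in the same way and then insert at the offsets of the leaf you reach.

```java
import java.util.ArrayList;
import java.util.List;

class FirstLeafSketch {
    // Hypothetical stand-in for a node hierarchy such as <b><tm></tm></b>.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // Follows the first child at each level until a node without children is reached.
    static Node firstLeaf(Node root) {
        Node current = root;
        while (!current.children.isEmpty()) {
            current = current.children.get(0);
        }
        return current;
    }

    public static void main(String[] args) {
        Node fragmentRoot = new Node("b", new Node("tm"));
        System.out.println(firstLeaf(fragmentRoot).name);
    }
}
```

Always taking the first child matches the behaviour described earlier in the thread, where text goes into the first leaf node, and it also works for fragments that are not symmetrical.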

Best regards,
Alex