
Content index to document index

Posted: Mon Nov 16, 2009 3:14 pm
by sijomon
Hi,

I'm pretty sure I've seen this issue covered, at least partially, in another post I read some time back, but I can't find it, so sorry if I'm duplicating stuff here.

I am building an operation which shows the user a list of all tags of a certain type within the current document, and allows them to highlight the tags by clicking in the list. The list should also contain sections of text which look like they should be marked up with the tag. These sections of text are specified by a regular expression. I am currently extracting the content of the document as follows:

Code:

authorAccess.getDocumentController().getText(0, authorAccess.getDocumentController().getTextContentLength());
I can then run the regex over this and get a number of matches, with their start and end indexes into the content string.
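A minimal sketch of the regex step described above, using only java.util.regex: Matcher.start() and Matcher.end() give the inclusive start and exclusive end of each match in the content string. The sample text, the URL pattern, and the Match record are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentMatches {

    /** Start (inclusive) and end (exclusive) of one regex match in the content string. */
    record Match(int start, int end) {}

    /** Runs the regex over the extracted text and records where each match starts and ends. */
    static List<Match> findMatches(String content, Pattern pattern) {
        List<Match> matches = new ArrayList<>();
        Matcher m = pattern.matcher(content);
        while (m.find()) {
            matches.add(new Match(m.start(), m.end()));
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by the getText(...) call in the post.
        String content = "See http://example.com and http://oxygenxml.com for details.";
        // Hypothetical, deliberately loose URL pattern, for illustration only.
        Pattern url = Pattern.compile("https?://\\S+");
        for (Match match : findMatches(content, url)) {
            System.out.println(match.start() + "-" + match.end()); // prints 4-22, then 27-47
        }
    }
}
```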

What I can't do is convert this index into an index into the Author document. As far as I recall, the index into the Author document is essentially the same as the index into the content, with +1 for each tag. I really don't want to have to construct this index by parsing the document node tree; is there an automated way to convert from a content index to an Author document index?
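The "+1 for each tag" mapping recalled above can be sketched as follows. This assumes, without confirmation from the API documentation, that every element start and end marker occupies exactly one document offset and every text character occupies one. The markerOffsets input is hypothetical; collecting it would still require walking the node tree, which is exactly what the post hopes to avoid.

```java
import java.util.Arrays;

public class ContentToDocumentOffset {

    /**
     * Maps an index into the text-only content to a document offset, ASSUMING each
     * element start/end marker takes one document offset and each character takes one.
     *
     * @param contentIndex  index into the content string
     * @param markerOffsets document offsets occupied by markers (hypothetical input)
     */
    static int toDocumentOffset(int contentIndex, int[] markerOffsets) {
        int[] sorted = markerOffsets.clone();
        Arrays.sort(sorted);
        int offset = contentIndex;
        for (int marker : sorted) {
            if (marker <= offset) {
                offset++; // a marker sits at or before the candidate position: shift right by one
            } else {
                break;    // all remaining markers lie after the character
            }
        }
        return offset;
    }

    public static void main(String[] args) {
        // Model of <p>ab</p><p>cd</p>: markers at 0, 3, 4 and 7; "ab" at 1-2, "cd" at 5-6.
        int[] markers = {0, 3, 4, 7};
        System.out.println(toDocumentOffset(2, markers)); // prints 5 (the 'c')
    }
}
```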

Thanks,

Simon.

Re: Content index to document index

Posted: Mon Nov 16, 2009 4:40 pm
by sorin_ristache
Hello,

As you can read in the javadoc of the Author API, the method AuthorDocumentController.getTextContentLength() is deprecated. Instead, you should find the list of elements that have the same tag name, for example TAG, with AuthorDocumentController.findNodesByXPath("//TAG", true, true, true), which returns an array AuthorNode[]. You get the start index and end index of every AuthorNode using AuthorNode.getStartOffset() and AuthorNode.getEndOffset().


Regards,
Sorin

Re: Content index to document index

Posted: Mon Nov 16, 2009 5:18 pm
by sijomon
I want to search across all text within the document, and I don't know which node might contain the matches I'm interested in. For example, say I am searching for URLs: I want to find all text in the document that looks like a URL, regardless of where in the document that text occurs. My knowledge of XPath is pretty sketchy; can I use XPath to identify nodes which contain text that matches a certain regex? If so, I can use the method you indicate; if not, have you any other suggestions?

Re: Content index to document index

Posted: Mon Nov 16, 2009 5:49 pm
by sijomon
Think I can use XPath.

After a bit of research, it appears that the XPath expression:

Code:

//text()[matches(., "<REGEX>")]
will identify all text nodes which match the regex. Then I can grab the start offset of the text node using AuthorNode.getStartOffset(), run the text node's content through the same regex in Java to get the offset of the start of the match within the node, and add this to the node offset to get the document offset.

I think that will work.
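The offset arithmetic described in this post (the node's start offset plus the match's offset inside the node's text) could look like the sketch below. TextNodeInfo is a hypothetical stand-in for a text node, not an oXygen class, and the sketch assumes that a text node's start offset is the document offset of its first character. The regex dialect of XPath 2.0 matches() and java.util.regex are similar but not identical, so a pattern may need small adjustments to behave the same in both places.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextNodeMatchOffsets {

    /** Hypothetical stand-in for a text node: its document start offset and its text. */
    record TextNodeInfo(int startOffset, String text) {}

    /** Document-level start (inclusive) and end (exclusive) of one match. */
    record DocMatch(int start, int end) {}

    /** Re-runs the regex on each node's text and shifts every match by the node's offset. */
    static List<DocMatch> locate(List<TextNodeInfo> nodes, Pattern pattern) {
        List<DocMatch> result = new ArrayList<>();
        for (TextNodeInfo node : nodes) {
            Matcher m = pattern.matcher(node.text());
            while (m.find()) {
                // Offset inside the node's text plus the node's own start offset.
                result.add(new DocMatch(node.startOffset() + m.start(),
                                        node.startOffset() + m.end()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical nodes, e.g. the ones selected by //text()[matches(., "<REGEX>")].
        List<TextNodeInfo> nodes = List.of(
                new TextNodeInfo(10, "Visit http://a.b now"),
                new TextNodeInfo(40, "no links here"));
        Pattern url = Pattern.compile("https?://\\S+");
        for (DocMatch match : locate(nodes, url)) {
            System.out.println(match.start() + "-" + match.end()); // prints 16-26
        }
    }
}
```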

Re: Content index to document index

Posted: Mon Nov 16, 2009 6:20 pm
by sorin_ristache
I am not sure that will work because matches() is an XSLT function, not an XPath function. I think you will have to go through all elements or all nodes with //* or //text() and check whether the content matches your regex.


Regards,
Sorin
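For the fallback suggested above (select every text node with //text() and test its content in Java), here is a sketch that uses the JDK's own DOM and javax.xml.xpath APIs rather than oXygen's Author API, so no document offsets are involved. The JDK's built-in XPath engine implements XPath 1.0, which has no matches() function, so the regex test happens in Java. Matcher.find() succeeds when any part of the text matches, which is also how an unanchored pattern behaves in XPath 2.0 matches(). The sample XML and pattern are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class FilterTextNodes {

    /** Returns the text of every text node that contains a match for the regex. */
    static List<String> matchingTexts(String xml, Pattern pattern) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // XPath 1.0 only: select all text nodes, then filter them with java.util.regex.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//text()", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getNodeValue();
                if (pattern.matcher(text).find()) {
                    result.add(text);
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate //text()", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><p>See http://example.com</p><p>No link</p></doc>";
        Pattern url = Pattern.compile("https?://\\S+");
        System.out.println(matchingTexts(xml, url)); // prints [See http://example.com]
    }
}
```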

Re: Content index to document index

Posted: Tue Nov 17, 2009 1:07 pm
by sijomon
Hi,

I believe matches() is part of XPath 2.0:

http://www.w3.org/TR/xpath-functions/#func-matches

Either way, it does work as an XPath expression in Oxygen, and I am now successfully identifying the start and end of the matches in the document.

Many Thanks,

Simon.