Getting 'raw' xml of a node

sijomon
Posts: 83
Joined: Wed May 20, 2009 1:18 pm

Post by sijomon »

Hi there,

I need to obtain, for an AuthorNode, AuthorElement or (preferably) AuthorDocumentFragment instance, the 'raw' XML of the object, i.e. the XML as it would appear in text mode. There doesn't appear to be a method for this; is there one?

If not, my fallback is to reconstitute the XML by iterating through the node and its content nodes (as obtained from getContentNodes()), using the getName(), getNamespace() and getAttribute() methods for AuthorElement nodes and the getContent() method for NODE_TYPE_TEXT nodes, to build a String representation of the XML. Obviously XML generated this way is more likely to be incorrect than XML grabbed directly from the Oxygen API, but if that is not possible, does the above approach sound sensible?

Thanks,

Simon.
Radu
Posts: 9018
Joined: Fri Jul 09, 2004 5:18 pm

Re: Getting 'raw' xml of a node

Post by Radu »

Hello Simon,

In the AuthorDocumentController class there are some useful methods:

Code:

AuthorDocumentFragment createDocumentFragment(AuthorNode node, boolean copyContent) throws BadLocationException;

Code:

String serializeFragmentToXML(AuthorDocumentFragment fragment) throws BadLocationException;

So you can create a document fragment from a node (with copyContent set to true) and then serialize the document fragment.
The first method has existed since the beginning of the Author API, and the second was added before the 10.3 release, so you should be able to use it too.
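
For illustration, here is a minimal sketch of the two calls chained together. The three interfaces are stand-ins that only mirror the signatures quoted above, declared so the snippet compiles without the Oxygen SDK; in a real plugin you would drop them and import the SDK types instead. The RawXml helper name is hypothetical.

```java
import javax.swing.text.BadLocationException;

// Stand-ins for the Oxygen SDK types (assumption: only the two methods
// quoted above are mirrored here; the real interfaces have many more).
interface AuthorNode {}

interface AuthorDocumentFragment {}

interface AuthorDocumentController {
    AuthorDocumentFragment createDocumentFragment(AuthorNode node, boolean copyContent)
            throws BadLocationException;

    String serializeFragmentToXML(AuthorDocumentFragment fragment)
            throws BadLocationException;
}

// Hypothetical helper: node -> fragment -> XML string.
final class RawXml {
    private RawXml() {}

    static String of(AuthorDocumentController controller, AuthorNode node)
            throws BadLocationException {
        // copyContent = true so the fragment carries the node's content,
        // not just an empty copy of the node itself.
        AuthorDocumentFragment fragment = controller.createDocumentFragment(node, true);
        return controller.serializeFragmentToXML(fragment);
    }
}
```

If you already hold an AuthorDocumentFragment, only the second call is needed.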

The alternative you suggested is also achievable but, of course, not very easy to implement correctly.

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
sijomon
Posts: 83
Joined: Wed May 20, 2009 1:18 pm

Re: Getting 'raw' xml of a node

Post by sijomon »

Excellent, that's made things a lot easier.

Many thanks.
sijomon
Posts: 83
Joined: Wed May 20, 2009 1:18 pm

Re: Getting 'raw' xml of a node

Post by sijomon »

Hi, I've still got a slight problem here.

The XML returned from the serializeFragmentToXML() method isn't quite the same as what appears in text mode; specifically, it doesn't have the same inter-tag whitespace, i.e. it's not tab-indented in the same way. I assume this is either because the method is only concerned with generating a logically equivalent string rather than a visually equivalent one, or because text mode generates inter-tag whitespace for display purposes that doesn't actually exist in the source XML.

Either way, I will need to adjust the XML returned from this method to match what is seen in text mode. Is there an existing mechanism for applying formatting (inserting whitespace) to reproduce what text mode does, or will I have to write my own?

Thanks,

Simon.
Radu
Posts: 9018
Joined: Fri Jul 09, 2004 5:18 pm

Re: Getting 'raw' xml of a node

Post by Radu »

Hi Simon,

The Author keeps the XML in an internal structure with all whitespace normalized. We perform an additional format-and-indent on save to generate pretty XML.
So indeed, the serialized fragment reflects the internal content more precisely and is not indented. The XML is equivalent to an indented one.
You can either write your own code to format and indent it, or maybe use a class from Xerces that does that: XMLSerializer, which can format and indent DOM nodes.
So you can probably wrap the XML fragment in a <root> element to make it well-formed, build a DOM Document from it and serialize it with the XMLSerializer.
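
A rough sketch of the wrap-parse-serialize idea using only classes that ship with the JDK. Note the substitution: it uses the DOM Load and Save LSSerializer with "format-pretty-print" instead of the Xerces XMLSerializer mentioned above. The FragmentIndenter name and the temporary <root> wrapper are illustrative assumptions, and the exact whitespace produced depends on the JDK version.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

final class FragmentIndenter {
    private FragmentIndenter() {}

    static String indent(String fragmentXml) throws Exception {
        // A fragment may have several top-level elements, so wrap it in a
        // temporary root to make it parseable as a document.
        String wrapped = "<root>" + fragmentXml + "</root>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));

        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);

        // Serialize each top-level element separately so the wrapper never
        // appears in the output. Limitation of this sketch: text, comments
        // and processing instructions directly under the wrapper are skipped.
        StringBuilder out = new StringBuilder();
        for (Node child = doc.getDocumentElement().getFirstChild();
                child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                out.append(serializer.writeToString(child).trim()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(indent("<a><b>x</b></a><c>y</c>"));
    }
}
```

The wrapper approach fails if the fragment itself starts with an XML declaration, so strip that first if it can occur.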

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
sijomon
Posts: 83
Joined: Wed May 20, 2009 1:18 pm

Re: Getting 'raw' xml of a node

Post by sijomon »

Thanks for the suggestions.
sijomon
Posts: 83
Joined: Wed May 20, 2009 1:18 pm

Re: Getting 'raw' xml of a node

Post by sijomon »

FYI, I did something similar to what you suggested, but used the javax.xml.transform.Transformer class, with the OutputKeys.INDENT property set. The code is as follows:

Code:

import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * This method reformats the supplied XML (which must be well-formed) by
 * the insertion of additional indentation whitespace to make the
 * XML more readable. The {@link Transformer} class is used to
 * perform the actual indentation; this is enabled using the
 * {@link OutputKeys#INDENT} flag.
 *
 * @param rawXML The XML that should be made human readable
 * @return rawXML with the addition of indentation whitespace
 * @throws TransformerException if the XML cannot be parsed or transformed
 */
private String prettify(String rawXML) throws TransformerException {

    // to prettify we simply get the Java XML transform framework to
    // do a null transform and set the indent flag to true
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");

    // create the input and output objects for the transform
    StreamResult result = new StreamResult(new StringWriter());
    StreamSource source = new StreamSource(new StringReader(rawXML));

    // do the null transform; the only changes will be the addition of a
    // '<?xml version="1.0" encoding="utf-8"?>' declaration and the formatting
    transformer.transform(source, result);
    String prettyXML = result.getWriter().toString();

    // we don't really want the '<?xml version="1.0" encoding="utf-8"?>' bit at the start, so we remove it;
    // this is as simple as dropping the first line (and the newline character)
    prettyXML = prettyXML.substring(prettyXML.indexOf("\n") + 1);

    return prettyXML;
}
Radu
Posts: 9018
Joined: Fri Jul 09, 2004 5:18 pm

Re: Getting 'raw' xml of a node

Post by Radu »

Hi,

Yes, indeed an identity transform is also a good solution.
Thanks for updating the post.

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
sijomon
Posts: 83
Joined: Wed May 20, 2009 1:18 pm

Re: Getting 'raw' xml of a node

Post by sijomon »

Just spotted the OMIT_XML_DECLARATION property, which is much better than my solution, so the code now looks like this:

Code:

import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * This method reformats the supplied XML (which must be well-formed) by
 * the insertion of additional indentation whitespace to make the
 * XML more readable. The {@link Transformer} class is used to
 * perform the actual indentation; this is enabled using the
 * {@link OutputKeys#INDENT} flag.
 *
 * @param rawXML The XML that should be made human readable
 * @return rawXML with the addition of indentation whitespace
 * @throws TransformerException if the XML cannot be parsed or transformed
 */
private String prettify(String rawXML) throws TransformerException {

    // to prettify we simply get the Java XML transform framework to
    // do a null transform and set the indent flag to true
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

    // create the input and output objects for the transform
    StreamResult result = new StreamResult(new StringWriter());
    StreamSource source = new StreamSource(new StringReader(rawXML));

    // do the null transform; the only change will be the addition of the formatting
    transformer.transform(source, result);
    String prettyXML = result.getWriter().toString();

    return prettyXML;
}
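
On the indentation width: a hedged sketch, assuming the JDK's built-in (Xalan-derived) transformer is the one in use. The "{http://xml.apache.org/xslt}indent-amount" output property is implementation-specific rather than part of the standard OutputKeys, so other transformer implementations may ignore or reject it. It sets a number of spaces per level; it does not produce tabs. The IndentWidth class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class IndentWidth {
    // Implementation-specific key (assumption: a Xalan-derived transformer
    // such as the JDK default is on the classpath).
    private static final String INDENT_AMOUNT =
            "{http://xml.apache.org/xslt}indent-amount";

    private IndentWidth() {}

    static String prettify(String rawXML, int spaces) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        try {
            transformer.setOutputProperty(INDENT_AMOUNT, Integer.toString(spaces));
        } catch (IllegalArgumentException unsupported) {
            // The transformer does not recognise the key; keep its default width.
        }

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(rawXML)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.print(prettify("<a><b>x</b></a>", 2));
    }
}
```

Like the method above, this needs input with a single root element; a multi-element fragment would have to be wrapped first.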