Getting 'raw' xml of a node
-
- Posts: 83
- Joined: Wed May 20, 2009 1:18 pm
Getting 'raw' xml of a node
Hi there,
I need to obtain, for either an AuthorNode, AuthorElement or (preferably) AuthorDocumentFragment instance, the 'raw' XML of the object, i.e. the XML as it would appear in text mode. There doesn't appear to be a method for this; is there one?
If not, the solution I have is to reconstitute the XML by iterating through the node and its content nodes (as obtained from getContentNodes()), building a String representation of the XML as I go: for AuthorElement nodes using the getName(), getNamespace() and getAttribute() methods, and for NODE_TYPE_TEXT nodes using the getContent() method. Obviously XML generated this way is more likely to be incorrect than XML grabbed directly from the Oxygen API, but if that is not possible, does the above method sound sensible?
Thanks,
Simon.
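For anyone who does go the hand-built route described above, the part that is easiest to get wrong is escaping. Below is a minimal sketch in plain Java, assuming a single element with attributes and text content only. ManualXmlWriter and its methods are hypothetical names, not part of the Oxygen API, and the sketch ignores namespaces, comments, processing instructions, CDATA and entities.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Oxygen API. */
final class ManualXmlWriter {

    /** Escapes character data for use between tags. */
    static String escapeText(String s) {
        // '&' must be replaced first so the other replacements are not double-escaped
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Escapes a value for use inside a double-quoted attribute. */
    static String escapeAttribute(String s) {
        return escapeText(s).replace("\"", "&quot;");
    }

    /** Builds the markup for one element that has attributes and text content only. */
    static String element(String name, Map<String, String> attributes, String text) {
        StringBuilder sb = new StringBuilder("<").append(name);
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append(' ').append(e.getKey()).append("=\"")
              .append(escapeAttribute(e.getValue())).append('"');
        }
        if (text == null || text.isEmpty()) {
            return sb.append("/>").toString();
        }
        return sb.append('>').append(escapeText(text))
                 .append("</").append(name).append('>').toString();
    }
}
```

Even this small case needs two different escaping rules, which is a fair indication of how much a full tree walk would have to get right.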
-
- Posts: 9438
- Joined: Fri Jul 09, 2004 5:18 pm
Re: Getting 'raw' xml of a node
Hello Simon,
In the AuthorDocumentController class there are some useful methods:
Code: Select all
AuthorDocumentFragment createDocumentFragment(AuthorNode node, boolean copyContent) throws BadLocationException;
Code: Select all
String serializeFragmentToXML(AuthorDocumentFragment fragment) throws BadLocationException;
So you can create a document fragment from a node (with copyContent set to true) and then serialize the document fragment.
The first method has existed since the beginning of the Author API and the second was added before the 10.3 release, so you should be able to use it too.
The alternative you suggested is also achievable but, of course, not very easy to accomplish correctly.
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
-
- Posts: 83
- Joined: Wed May 20, 2009 1:18 pm
Re: Getting 'raw' xml of a node
Hi, I've still got a slight problem here.
The XML returned from the serializeFragmentToXML() method isn't quite the same as that which appears in text mode; specifically, it doesn't have the same inter-tag whitespace, i.e. it's not tab-indented in the same way. I assume this is either because the method is only concerned with generating a logically equivalent string rather than a visually equivalent one, or because text mode generates inter-tag whitespace for display purposes which doesn't actually exist in the source XML.
Either way, I will need to adjust the XML returned from this method to match that seen in text mode. Is there an existing mechanism for applying formatting (inserting whitespace) to reproduce what text mode does, or will I have to write my own?
Thanks,
Simon.
-
- Posts: 9438
- Joined: Fri Jul 09, 2004 5:18 pm
Re: Getting 'raw' xml of a node
Hi Simon,
The Author keeps the XML in an internal structure with all whitespace normalized. We perform an additional format-and-indent on save to generate pretty XML.
So indeed the serialized fragment reflects the internal content more precisely and is not indented. The XML is equivalent to an indented one.
You can either write your own code to format and indent it, or maybe use a class from Xerces which does that: XMLSerializer, which can format and indent DOM nodes.
So you can probably wrap the XML fragment in a <root> element to make it well-formed, make a DOM Document from it and serialize it with the XMLSerializer.
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
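The wrap-in-<root> idea above can be sketched with JDK classes alone. Note the swap: this uses javax.xml.transform with OutputKeys.INDENT rather than the Xerces XMLSerializer named in the post. FragmentIndenter and indentFragment are hypothetical names, and the sketch assumes the fragment is well-formed once wrapped. Namespace prefixes declared outside the fragment and top-level text nodes would need extra care.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper for illustration only; not part of the Oxygen API. */
final class FragmentIndenter {

    static String indentFragment(String fragmentXml) throws Exception {
        // A fragment may have several top-level nodes, so wrap it to get one root
        String wrapped = "<root>" + fragmentXml + "</root>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));

        // Identity transform with indentation on and the XML declaration off
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // Serialize each child of the wrapper so <root> itself never appears
        StringWriter out = new StringWriter();
        for (Node child = doc.getDocumentElement().getFirstChild();
                child != null; child = child.getNextSibling()) {
            transformer.transform(new DOMSource(child), new StreamResult(out));
        }
        return out.toString();
    }
}
```

The exact whitespace produced depends on the transformer implementation on the classpath, so the output should be treated as equivalent to, not identical with, what text mode shows.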
-
- Posts: 83
- Joined: Wed May 20, 2009 1:18 pm
Re: Getting 'raw' xml of a node
FYI, I did something similar to what you suggested, but used the javax.xml.transform.Transformer class, with the OutputKeys.INDENT property set. The code is as follows:
Code: Select all
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * This method reformats the supplied XML (which must be well-formed) by
 * the insertion of additional indentation whitespace to make the
 * XML more readable. The {@link Transformer} class is used to
 * perform the actual indentation; this is enabled using the
 * {@link OutputKeys}.INDENT flag.
 *
 * @param rawXML The XML that should be made human readable
 * @return rawXML with the addition of indentation whitespace
 * @throws TransformerException
 */
private String prettify(String rawXML) throws TransformerException {
    // to prettify we simply get the Java XML transform framework to
    // do a null transform and set the indent flag to true
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    // create the input and output objects for the transform
    StreamResult result = new StreamResult(new StringWriter());
    StreamSource source = new StreamSource(new StringReader(rawXML));
    // do the null transform; the only changes will be the addition of a
    // '<?xml version="1.0" encoding="utf-8"?>' declaration and the formatting
    transformer.transform(source, result);
    String prettyXML = result.getWriter().toString();
    // we don't really want the XML declaration at the start, so we remove it
    // by dropping the first line (and its newline character); this assumes
    // the declaration is written on a line of its own
    prettyXML = prettyXML.substring(prettyXML.indexOf("\n") + 1);
    return prettyXML;
}
-
- Posts: 83
- Joined: Wed May 20, 2009 1:18 pm
Re: Getting 'raw' xml of a node
Just spotted the OMIT_XML_DECLARATION property, which is much better than my solution, so the code now looks like:
Code: Select all
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * This method reformats the supplied XML (which must be well-formed) by
 * the insertion of additional indentation whitespace to make the
 * XML more readable. The {@link Transformer} class is used to
 * perform the actual indentation; this is enabled using the
 * {@link OutputKeys}.INDENT flag.
 *
 * @param rawXML The XML that should be made human readable
 * @return rawXML with the addition of indentation whitespace
 * @throws TransformerException
 */
private String prettify(String rawXML) throws TransformerException {
    // to prettify we simply get the Java XML transform framework to
    // do a null transform and set the indent flag to true
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    // leave out the '<?xml ...?>' declaration, which we don't want in the result
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    // create the input and output objects for the transform
    StreamResult result = new StreamResult(new StringWriter());
    StreamSource source = new StreamSource(new StringReader(rawXML));
    // do the null transform; the only change will be the addition of the formatting
    transformer.transform(source, result);
    String prettyXML = result.getWriter().toString();
    return prettyXML;
}
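One caveat with the approach above: on some JDK versions the built-in transformer adds line breaks but indents by zero spaces when only INDENT is set. A commonly used workaround is the implementation-specific {http://xml.apache.org/xslt}indent-amount output property. The sketch below assumes the JDK's built-in Xalan-derived transformer; other implementations may ignore the property. IndentedPrettifier and the indentWidth parameter are hypothetical names. It indents with spaces, so it will not reproduce tab indentation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical variant of prettify() with an explicit indent width. */
final class IndentedPrettifier {

    static String prettify(String rawXML, int indentWidth) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Implementation-specific key (assumption: Xalan-derived transformer);
        // namespace-qualified keys that are not recognised are ignored
        transformer.setOutputProperty(
                "{http://xml.apache.org/xslt}indent-amount", String.valueOf(indentWidth));

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(rawXML)), new StreamResult(out));
        return out.toString();
    }
}
```

A call such as IndentedPrettifier.prettify(xml, 2) would then request two-space indentation where the property is honoured.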