
Problem with whitespaces when parsing with Xerces

Posted: Tue Dec 14, 2004 3:05 pm
by Guest
Hi,

I'm using the Xerces DOM parser as follows:

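import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;

DOMParser parser = new DOMParser();
// turn on XML Schema validation and general validation
parser.setFeature("http://apache.org/xml/features/validation/schema", true);
parser.setFeature("http://xml.org/sax/features/validation", true);
// errHandler: application-defined org.xml.sax.ErrorHandler implementation
parser.setErrorHandler(new errHandler());
parser.parse(xmlFile);
Document doc = parser.getDocument(); // the resulting DOM tree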

The xml file being parsed looks like this:

<CCphysical xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="MySchema.xsd">
<root name="Name">
<directory name="doc" />
<directory name="doc2" />
<directory name="XML" />
</root>
</CCphysical>

Extract from the schema:

<xsd:element name="root">
<xsd:complexType>
<xsd:sequence>
<xsd:element maxOccurs="unbounded" minOccurs="0" ref="directory" />
</xsd:sequence>
<xsd:attribute name="name" type="xsd:string" use="optional" />
</xsd:complexType>
</xsd:element>

When parsing the file, root does not have 3 child nodes but 6, because the whitespace between the tags also ends up as child nodes. Even after setting the feature
parser.setFeature("http://apache.org/xml/features/dom/incl ... ace",false);
it still does not work. What's wrong, and what do I have to do so that whitespace is ignored?
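Some background on why such a setting can appear to do nothing: a parser can only classify whitespace as "ignorable" when a grammar gives the enclosing element an element-only content model; otherwise the line breaks are ordinary character data and stay in the tree. The sketch below illustrates this with the JDK's standard JAXP API (DocumentBuilderFactory.setIgnoringElementContentWhitespace) rather than the Xerces DOMParser from the question, and it assumes an internal DTD, written here only to mirror the sample, as the grammar. The class and method names are hypothetical, and the sketch does not check how a particular Xerces version behaves under XML Schema validation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class IgnorableWhitespaceDemo {
    // Hypothetical document mirroring the question's <root> element. The internal DTD
    // gives <root> an element-only content model (directory*), so the line breaks
    // between its children count as ignorable whitespace.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE root [\n"
            + "  <!ELEMENT root (directory*)>\n"
            + "  <!ATTLIST root name CDATA #IMPLIED>\n"
            + "  <!ELEMENT directory EMPTY>\n"
            + "  <!ATTLIST directory name CDATA #REQUIRED>\n"
            + "]>\n"
            + "<root name=\"Name\">\n"
            + "<directory name=\"doc\" />\n"
            + "<directory name=\"doc2\" />\n"
            + "<directory name=\"XML\" />\n"
            + "</root>\n";

    // Fails fast on any validation problem instead of printing to stderr.
    private static final ErrorHandler STRICT = new ErrorHandler() {
        @Override public void warning(SAXParseException e) throws SAXException { throw e; }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    // Parses XML and returns how many child nodes the document element has.
    static int countRootChildren(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Whitespace in element content is only recognized with a grammar,
            // so validation must be on for the next setting to take effect.
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(STRICT);
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("keeping whitespace:  " + countRootChildren(false));
        System.out.println("ignoring whitespace: " + countRootChildren(true));
    }
}
```

With false, the count includes the whitespace text nodes between the tags; with true, only the three directory elements remain. Without the DOCTYPE (or with validation off), the parser has no content model to consult, and the whitespace text nodes would be kept either way.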

Thanks for your help,
Fabian

Workaround

Posted: Tue Dec 14, 2004 6:55 pm
by Radu
Yes, besides the directory elements, the root node's children also include the text nodes that hold the line breaks between them.
The safe approach, which works for any XML file no matter how much whitespace sits between the tags, is to add this condition for each node while iterating through root's children:


n.getNodeType() == Node.ELEMENT_NODE
and only process the nodes for which it holds, i.e. the element nodes.
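Put together, the element-node check amounts to a small helper that can be reused wherever child elements are needed. This is a sketch only: the names elementChildren and directoryNames are hypothetical, and the sample document uses <root> directly as the document element instead of the CCphysical wrapper from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ElementChildren {
    // Returns only the element children of parent, skipping text nodes
    // (including whitespace), comments and processing instructions.
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Parses xml (no validation, no grammar needed) and collects the
    // "name" attribute of each element child of the document element.
    static List<String> directoryNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            for (Element e : elementChildren(doc.getDocumentElement())) {
                names.add(e.getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root name=\"Name\">\n"
            + "  <directory name=\"doc\" />\n"
            + "  <directory name=\"doc2\" />\n"
            + "  <directory name=\"XML\" />\n"
            + "</root>";
        System.out.println(directoryNames(xml)); // prints [doc, doc2, XML]
    }
}
```

Because the filter looks at node types rather than at the grammar, it gives the same result whether or not the parser dropped the ignorable whitespace, which is what makes it the safer choice.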