Problem with whitespaces when parsing with Xerces

Post by **Guest** » Tue Dec 14, 2004 3:05 pm

Hi,

I'm using Xerces DOM Parser as below:

import org.apache.xerces.parsers.DOMParser;

DOMParser parser = new DOMParser();
parser.setFeature("http://apache.org/xml/features/validation/schema",true);
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.setErrorHandler(new errHandler());
parser.parse(xmlFile);

The xml file being parsed looks like this:

<CCphysical xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="MySchema.xsd">
<root name="Name">
<directory name="doc" />
<directory name="doc2" />
<directory name="XML" />
</root>
</CCphysical>

Extract from the schema:

<xsd:element name="root">
<xsd:complexType>
<xsd:sequence>
<xsd:element maxOccurs="unbounded" minOccurs="0" ref="directory" />
</xsd:sequence>
<xsd:attribute name="name" type="xsd:string" use="optional" />
</xsd:complexType>
</xsd:element>

When parsing the file, I get 6 child nodes for root instead of 3, because the parser also counts the whitespace between the elements. Even after setting the feature
parser.setFeature("http://apache.org/xml/features/dom/incl ... ace",false);
it does not work. What's wrong, and what do I have to do so that whitespace is ignored?

Thanks for your help,
Fabian

Post by **Radu** » Tue Dec 14, 2004 6:55 pm

Yes, indeed: besides the directory nodes, the root node also has the line-break text nodes as children.
The safe approach, which works for any XML file no matter how much whitespace sits between the tags, is to add this condition for each node when iterating through the root's children:

n.getNodeType() == Node.ELEMENT_NODE

and only take into consideration the element nodes.
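
For illustration, the check above could be wrapped in a small helper like the sketch below. The class name `ElementChildren` and the methods `parse` and `elementChildren` are made up for this example, and it uses the JDK's built-in JAXP `DocumentBuilder` instead of `org.apache.xerces.parsers.DOMParser` so that it runs without the Xerces jar. The filtering loop itself only relies on the standard `org.w3c.dom` interfaces, so it should apply equally to the `Document` obtained from `parser.getDocument()`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementChildren {

    // Hypothetical helper: parses an XML string into a DOM Document
    // (no validation), wrapping checked exceptions for brevity.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Hypothetical helper: returns only the element children of a node,
    // skipping whitespace text nodes, comments, and other node types.
    public static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<root name=\"Name\">\n"
                + "<directory name=\"doc\" />\n"
                + "<directory name=\"doc2\" />\n"
                + "<directory name=\"XML\" />\n"
                + "</root>";
        Element root = parse(xml).getDocumentElement();

        // Prints the name attribute of each directory element only.
        for (Element dir : elementChildren(root)) {
            System.out.println(dir.getAttribute("name"));
        }
    }
}
```

Filtering by node type does not depend on any parser feature or on the schema being available, which is why it keeps working however the file is indented.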
