Problem with whitespaces when parsing with Xerces

Post by Guest » Tue Dec 14, 2004 3:05 pm


I'm using Xerces DOM Parser as below:

import org.apache.xerces.parsers.DOMParser;

DOMParser parser = new DOMParser();
parser.setFeature("", true);
parser.setErrorHandler(new errHandler());

The xml file being parsed looks like this:

<CCphysical xmlns:xsi="" xsi:noNamespaceSchemaLocation="MySchema.xsd">
<root name="Name">
<directory name="doc" />
<directory name="doc2" />
<directory name="XML" />
</root>
</CCphysical>

Extract from the schema:

<xsd:element name="root">
<xsd:element maxOccurs="unbounded" minOccurs="0" ref="directory" />
<xsd:attribute name="name" type="xsd:string" use="optional" />

When parsing the file, root does not have 3 child nodes but 6, because the parser also counts the whitespace between the elements as text nodes. Setting the feature
parser.setFeature(" ... ace",false);
does not help. What's wrong, and what do I have to do so that whitespace is ignored?

Thanks for your help,


Post by Radu » Tue Dec 14, 2004 6:55 pm

Yes, indeed: besides the directory elements, the root node's children also include the text nodes that hold the line breaks.
The safe approach, which works for any XML file no matter how much whitespace there is between tags, is to add this condition for each node when iterating through the root's children:

n.getNodeType() == Node.ELEMENT_NODE
and only take into consideration the element nodes.
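
For illustration, the check above can be wrapped in a small helper. This is only a sketch: it uses the JDK's built-in JAXP parser instead of the Xerces DOMParser from the question so that it runs without extra jars, and the names ElementChildren, elementChildren and parse are made up for the example. The DOM calls (getChildNodes, getNodeType) are the same ones you would use on a document produced by Xerces.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ElementChildren {

    // Returns only the element children of parent; text nodes (including
    // whitespace-only ones), comments and other node types are skipped.
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Hypothetical helper for the example: parses an XML string with the
    // JDK's default (non-validating) DOM parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the file in the question, without the schema reference.
        String xml = "<CCphysical>\n"
                + "<root name=\"Name\">\n"
                + "<directory name=\"doc\" />\n"
                + "<directory name=\"doc2\" />\n"
                + "<directory name=\"XML\" />\n"
                + "</root>\n"
                + "</CCphysical>";
        Element root =
                (Element) parse(xml).getElementsByTagName("root").item(0);
        for (Element e : elementChildren(root)) {
            System.out.println(e.getTagName() + " name=" + e.getAttribute("name"));
        }
    }
}
```

Filtering by node type does not depend on any parser feature or on validation being switched on, which is why it keeps working however the file is indented.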
