Problem with whitespaces when parsing with Xerces

Post by Guest »

Hi,

I'm using Xerces DOM Parser as below:

import org.apache.xerces.parsers.DOMParser;

DOMParser parser = new DOMParser();
parser.setFeature("http://apache.org/xml/features/validation/schema",true);
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.setErrorHandler(new errHandler());
parser.parse(xmlFile);

The xml file being parsed looks like this:

<CCphysical xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="MySchema.xsd">
<root name="Name">
<directory name="doc" />
<directory name="doc2" />
<directory name="XML" />
</root>
</CCphysical>

Extract from the schema:

<xsd:element name="root">
<xsd:complexType>
<xsd:sequence>
<xsd:element maxOccurs="unbounded" minOccurs="0" ref="directory" />
</xsd:sequence>
<xsd:attribute name="name" type="xsd:string" use="optional" />
</xsd:complexType>
</xsd:element>

When parsing the file I get 6 child nodes for root instead of 3, because the parser also counts the whitespace between the elements. Even after setting the feature
parser.setFeature("http://apache.org/xml/features/dom/incl ... ace",false);
it does not work. What's wrong, and what do I have to do so that whitespace is ignored?

Thanks for your help,
Fabian
Radu

Workaround

Post by Radu »

Yes, indeed: besides the directory elements, the root node also has the line-break text nodes as children.
The safe thing to do, which works for any XML file no matter how much whitespace there is between tags, is to add this condition for each node when iterating through the root's children:

n.getNodeType() == Node.ELEMENT_NODE
and only take into consideration the element nodes.
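
To make the check above concrete, here is a minimal sketch of a loop that collects only the element children. It is an illustration, not the original poster's code: it uses the standard JAXP DocumentBuilder instead of org.apache.xerces.parsers.DOMParser so that it runs without extra jars (the resulting org.w3c.dom tree is handled the same way), it skips schema validation, and the class name ElementChildren and the helper parse are hypothetical names made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementChildren {

    // Returns only the element children of parent, skipping text nodes
    // (including the whitespace-only ones), comments, and other node types.
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Hypothetical helper for this sketch: parses an XML string with JAXP,
    // without validation.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Same shape as the root element in the question.
        String xml =
            "<root name=\"Name\">\n"
            + "<directory name=\"doc\" />\n"
            + "<directory name=\"doc2\" />\n"
            + "<directory name=\"XML\" />\n"
            + "</root>";
        Element root = parse(xml).getDocumentElement();
        System.out.println("all child nodes: " + root.getChildNodes().getLength());
        for (Element dir : elementChildren(root)) {
            System.out.println("directory: " + dir.getAttribute("name"));
        }
    }
}
```

With the Xerces DOMParser from the question, the same elementChildren method should be usable on the element obtained from parser.getDocument(), since it relies only on the org.w3c.dom interfaces.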