[oXygen-user] Xerces command line parsing?

George Cristian Bina
Tue Oct 17 10:19:07 CDT 2006


Dear Andrew,

You guessed correctly: specifying the parser class when you perform the 
transformation makes the XSLT engine use that parser, but it does not 
turn on validation, so you only get a well-formedness check.
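
To make the difference between the two checks concrete, here is a minimal 
Java sketch (not one of the Xerces samples) that turns on DTD validation 
through the standard JAXP/SAX API and collects the validity errors. The 
class name ValidateSketch and the method check are made up for 
illustration. The JAXP factory returns whichever parser is configured, 
which is typically Xerces when xercesImpl.jar is on the classpath, but 
that is an assumption about your setup.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Xerces or oXygen.
public class ValidateSketch {

    /** Parses one input and returns the validity errors the parser reported. */
    public static List<String> check(InputSource input, boolean validate) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validate); // DTD validation is off by default
        XMLReader reader = factory.newSAXParser().getXMLReader();

        List<String> errors = new ArrayList<>();
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity errors are recoverable, so they only show up here.
                errors.add(e.getSystemId() + ":" + e.getLineNumber() + ": " + e.getMessage());
            }
        });

        // Well-formedness errors are fatal and propagate as an exception.
        reader.parse(input);
        return errors;
    }

    public static void main(String[] args) {
        for (String path : args) {
            try {
                // Pass a system ID so relative DTD references resolve against the file.
                InputSource input = new InputSource(new File(path).toURI().toString());
                List<String> errors = check(input, true);
                System.out.println(path + ": " + errors.size() + " validity error(s)");
                for (String error : errors) {
                    System.out.println("  " + error);
                }
            } catch (Exception e) {
                System.out.println(path + ": could not be parsed: " + e.getMessage());
            }
        }
    }
}
```

Once compiled, it can be run over a batch of files, for example 
"java ValidateSketch file1.xml file2.xml". With validate set to false the 
same document passes silently as long as it is well-formed, which matches 
the behaviour you saw with the transformation.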

Xerces does not have a command line utility. However, the Xerces samples 
contain a number of example classes that can be invoked from command 
line. See
http://xerces.apache.org/xerces2-j/samples.html
For instance you can use the sax.Counter sample:
http://xerces.apache.org/xerces2-j/samples-sax.html#Counter

Note that you need to download a Xerces distribution to also get the 
samples jar, which must be on the classpath together with 
xercesImpl.jar and xml-apis.jar.

One caveat is that you cannot enable catalog support through the 
available command line options.
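
If you end up writing a small driver class instead, a catalog can be 
attached programmatically as a SAX EntityResolver. The sketch below is 
only an illustration under some assumptions: it uses the javax.xml.catalog 
API that ships with Java 9 and later (not the Xerces samples), and the 
class name CatalogSketch, the method check and the argument layout are 
made up.

```java
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Xerces or oXygen. Requires Java 9+.
public class CatalogSketch {

    /** Validates one input, resolving external identifiers through an XML catalog. */
    public static List<String> check(InputSource input, URI catalogFile) throws Exception {
        // "continue" lets identifiers missing from the catalog fall back to
        // normal resolution instead of failing.
        CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.RESOLVE, "continue")
                .build();
        CatalogResolver resolver = CatalogManager.catalogResolver(features, catalogFile);

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver); // the catalog is consulted for the DTD and entities

        List<String> errors = new ArrayList<>();
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getLineNumber() + ": " + e.getMessage());
            }
        });
        reader.parse(input);
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // Assumed layout: first argument is the catalog file, the rest are documents.
        URI catalog = new File(args[0]).toURI();
        for (int i = 1; i < args.length; i++) {
            InputSource input = new InputSource(new File(args[i]).toURI().toString());
            List<String> errors = check(input, catalog);
            System.out.println(args[i] + ": " + errors.size() + " validity error(s)");
        }
    }
}
```

The catalog file itself would be a standard OASIS XML catalog mapping the 
TEI public identifier to your local copy of the DTD.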

Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com


Andrew Rouner wrote:
> Hello,
> 
> I am looking for the right syntax and method to be able to batch-parse XML
> files from the command line using Xerces.  I need to use Xerces as I am
> attempting to replicate parsing using oXygen (which has Xerces as its
> default parser).  If anyone can send along the syntax for doing this or can
> point me to a resource that can help, I'd very much appreciate it.
> 
> I previously used xmllint/LIBXML to do command line parsing of my TEI files,
> which worked well for files calling on the TEI xlite DTD.  I am now dealing
> with files that use the full TEI and must rely on the xml catalog, i.e.:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "tei2.dtd" [
> <!ENTITY % TEI.XML 'INCLUDE'>
> <!ENTITY % TEI.mixed 'INCLUDE'>
> <!ENTITY % TEI.drama 'INCLUDE'>
> <!ENTITY % TEI.corpus 'INCLUDE'>
> <!ENTITY % TEI.prose 'INCLUDE'>
> <!ENTITY % TEI.figures 'INCLUDE'>
> <!ENTITY % TEI.linking 'INCLUDE'>
> <!ENTITY % TEI.transcr 'INCLUDE'>
> <!ENTITY % TEI.names.dates 'INCLUDE'>
> <!ENTITY % TEI.spoken 'INCLUDE'>
> <!ENTITY % TEI.header 'INCLUDE'>
> <!ENTITY % ISOlat1 SYSTEM
> 'http://www.tei-c.org/Entity_Sets/Unicode/iso-lat1.ent'> %ISOlat1;
> <!ENTITY % ISOlat2 SYSTEM
> 'http://www.tei-c.org/Entity_Sets/Unicode/iso-lat2.ent'> %ISOlat2;
> <!ENTITY % ISOnum SYSTEM
> 'http://www.tei-c.org/Entity_Sets/Unicode/iso-num.ent'> %ISOnum;
> <!ENTITY % ISOpub SYSTEM
> 'http://www.tei-c.org/Entity_Sets/Unicode/iso-pub.ent'> %ISOpub;
> ]>
> 
> I need to use Xerces, because I find that the default parser in oXygen
> (which is Xerces) can successfully parse these files (and LIBXML does not
> work for files using the full TEI due to problems with the DTD).
> 
> My best understanding (which may be completely off) is that to use Xerces as
> an XML parser on the command line, what I am essentially doing is using the
> syntax to run an XML file through an XSL stylesheet (on the assumption that
> the source file has to validate to run successfully).
> 
> I have modified a previous stylesheet that processes all TEI elements found
> in these documents, and I use this syntax:
> 
> java com.icl.saxon.StyleSheet -x org.apache.xerces.parsers.SAXParser \
>   source_file.xml stylesheet.xsl > /dev/null
> 
> I am using Xerces as it comes with oXygen (and have not downloaded it
> separately).  Since I am only really interested in parsing and not the
> output, I redirect it to /dev/null.  I have the following in my bash
> profile for the CLASSPATH:
> 
> CLASSPATH=$CLASSPATH:/Applications/oxygen/lib/saxon.jar:\
> /Applications/oxygen/frameworks/docbook/xsl/extensions/saxon653.jar.ext:\
> /Applications/oxygen/lib/xercesImpl.jar
> export CLASSPATH
> 
> The above command WORKS, and will pick up SOME errors, but is clearly
> missing others.  Does anyone have any more straightforward syntax for just
> PARSING with Xerces, or have any ideas why some errors (I have tested) are
> not being reported through this process?  (One possibility is that it's just
> checking well-formedness, not validity, which I need to test further.)
> 
> Thanks in advance for any help/suggestions.
> 
> Andrew
> 
> Andrew Rouner
> Digital Library Services
> Washington University Libraries
> St. Louis, MO
> 
> 
> 
> 
>> From: Oxygen XML Editor support <>
>> Date: Tue, 25 Jul 2006 12:47:23 +0300
>> To: Andrew Rouner <>
>> Subject: Re: Differences in validators/ dtd problems?
>>
>> Dear Andrew Rouner,
>>
>> Thank you for contacting us.
>> The default parser used by oXygen is Xerces 2.8.0 (the latest Xerces
>> version). At first glance this looks like a problem/bug in XMLLINT.
>> If you want to invoke Xerces to parse a document from the command line,
>> you can do that through one of its sample applications:
>> http://xerces.apache.org/xerces2-j/samples.html
>>
>> Best Regards,
>> George
>> ---------------------------------------------------------------------
>> George Cristian Bina
>> <oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
>> http://www.oxygenxml.com
>>
>>
>>
> 


