[oXygen-user] Help with XProc - Relax NG validation pipeline

John Madden
Sun Jul 18 16:34:52 CDT 2010


Hi guys,

(First of all, forgive this long missive...)

I'm trying to write an XProc step that:

1. takes as input (a) an XML document and (b) a RELAX NG schema (or, in an alternative design, parameters giving the file-system locations of (a) and (b))
2. runs the validation using Jing
3. outputs the Jing error messages (if any, wrapped appropriately in an XML container element) to an output port
4. passes through the input XML document unchanged (or, possibly, annotated in some way) to another output port

(Note: step 3 is exactly what Jing itself does when invoked from the command line: it writes its error messages, if any, to standard output.)
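Written the way the spec declares its built-in steps, the interface in the four items above might look like the signature below. This is only a sketch: the step type ex:validate-with-jing and its namespace URI are hypothetical, and the port names simply borrow the result/report convention of p:validate-with-schematron.

```xml
<!-- Hypothetical signature: ex:validate-with-jing is an illustrative name, not a real step. -->
<p:declare-step type="ex:validate-with-jing"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.org/ns/hypothetical">
     <p:input port="source" primary="true"/>      <!-- (a) the XML document -->
     <p:input port="schema"/>                     <!-- (b) the RELAX NG schema -->
     <p:output port="result" primary="true"/>     <!-- item 4: the input, passed through -->
     <p:output port="report" sequence="true"/>    <!-- item 3: Jing's messages, if any -->
</p:declare-step>
```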

Okay, so you say, "What's your problem? Just use <p:validate-with-relax-ng>."

Aha, but the output of <p:validate-with-relax-ng> is defined as "a copy of the input, possibly augmented by application of the [RELAX NG DTD Compatibility]." There is no output port for validation error messages. If validation fails (and @assert-valid is true), the step raises an XProc error (in Calabash, for example). But nowhere do I see any built-in way in <p:validate-with-relax-ng> to recover the specific error messages produced by the validation engine (in this case Jing).
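One partial workaround, sketched here under assumptions: wrap the validation in p:try and read the c:errors document that XProc 1.0 delivers on the "error" port of p:catch. The spec leaves the detailed content of that document largely to the implementation, so whether Calabash puts Jing's individual messages into it is something to check, not a guarantee. The step names "main" and "failed" are illustrative.

```xml
<p:declare-step version="1.0" name="main"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step">
    <p:input port="source" primary="true"/>
    <p:input port="schema"/>
    <p:output port="result" primary="true"/>

    <p:try>
        <p:group>
            <p:output port="result"/>
            <!-- Valid: the (possibly augmented) copy of the input flows out -->
            <p:validate-with-relax-ng assert-valid="true">
                <p:input port="source">
                    <p:pipe port="source" step="main"/>
                </p:input>
                <p:input port="schema">
                    <p:pipe port="schema" step="main"/>
                </p:input>
            </p:validate-with-relax-ng>
        </p:group>
        <p:catch name="failed">
            <p:output port="result"/>
            <!-- Invalid: pass on the c:errors document from the catch's error port -->
            <p:identity>
                <p:input port="source">
                    <p:pipe port="error" step="failed"/>
                </p:input>
            </p:identity>
        </p:catch>
    </p:try>
</p:declare-step>
```

A downstream step can test whether the root element is c:errors to tell the two outcomes apart; note that this still gives only one output port, not the separate report port asked for above.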

Okay, fine. So I decide to write my own declare-step and use <p:exec> to invoke Jing manually through its command-line interface. Here's what I have so far:

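<p:declare-step version="1.0" xmlns:c="http://www.w3.org/ns/xproc-step" xmlns:p="http://www.w3.org/ns/xproc">
    <!-- By default no document arrives here, so p:exec sends nothing to Jing's standard input -->
    <p:input port="source">
        <p:empty/>
    </p:input>
    <!-- Jing's standard output, one c:line per line, wrapped in c:result -->
    <p:output port="result">
        <p:pipe port="result" step="validate-with-rng"/>
    </p:output>
    <!-- Hard-coded: the Jing jar, the schema and the instance document -->
    <p:exec args="-jar /path-to-jing/jing.jar schema.rng doc.xml" command="java" cwd="/working-path" name="validate-with-rng" result-is-xml="false" wrap-result-lines="true"/>
</p:declare-step>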

Fine. Now, when my document has validation errors, I get nicely wrapped Jing error messages on the output port, which look something like this:

<c:result>
   <c:line>/working-path/example.xml:54:35: error: value of attribute "foo" is invalid; must be equal to "x", "y" or "z"</c:line>
   <c:line>/working-path/example.xml:207:16: error: element "bar" incomplete; missing required element "something"</c:line>
</c:result>

But now I've got a problem. I want to make this whole validation step generic, so instead of hard-coding the value of //p:exec/@args, I'd like to pass in the locations of (i) Jing, (ii) the schema, and (iii) the input document as parameters (or perhaps as a small XML document rather than as p:parameters), and then compute the Jing command line from them.
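A possible sketch of that, assuming the three locations arrive as p:option values: in XProc 1.0 an option written as an attribute (args="...") is always a literal string, whereas p:with-option takes an XPath select expression, so the command line can be computed there. The option names jing-jar, schema and document and the type ex:validate-with-jing are hypothetical.

```xml
<p:declare-step version="1.0" type="ex:validate-with-jing"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:ex="http://example.org/ns/hypothetical">
    <!-- Jing's output as c:result/c:line, exactly as in the hard-coded version -->
    <p:output port="result">
        <p:pipe port="result" step="run-jing"/>
    </p:output>
    <!-- Hypothetical option names for the three locations -->
    <p:option name="jing-jar" required="true"/>
    <p:option name="schema" required="true"/>
    <p:option name="document" required="true"/>

    <p:exec name="run-jing" command="java" result-is-xml="false" wrap-result-lines="true">
        <!-- Nothing is sent to Jing's standard input -->
        <p:input port="source">
            <p:empty/>
        </p:input>
        <!-- select is evaluated as XPath; in-scope options are visible as variables -->
        <p:with-option name="args"
                       select="concat('-jar ', $jing-jar, ' ', $schema, ' ', $document)">
            <p:empty/>
        </p:with-option>
    </p:exec>
</p:declare-step>
```

From a pipeline that imports this one, the call would then be something like <ex:validate-with-jing jing-jar="/path-to-jing/jing.jar" schema="schema.rng" document="doc.xml"/>. Since p:exec splits args on its arg-separator option (a single space by default), paths containing spaces would need a different separator.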

Uh-oh. I've tried all kinds of ways to get this to work, but I keep running into problems. For one thing, the value of //p:exec/@args seems to be xs:untypedAtomic: when I construct a string and use it as the value of //p:exec/@args, Calabash complains that the type is inappropriate. What about giving Jing the schema and document as inputs via a pipe? That doesn't seem to work either: Jing wants its parameters on the command line.
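If the schema and document have to arrive on ports, one hedged option is to write them to scratch files with p:store first and then hand Jing the stored locations. The sketch below assumes a writable scratch directory given as an absolute URI (the tmp-dir option is hypothetical, as are the file names doc.xml and schema.rng), and assumes Jing accepts the file: URIs that p:store reports in its c:result output; both assumptions are worth checking. Reading those c:result documents in p:variable is what makes p:exec wait for both files.

```xml
<p:declare-step version="1.0" name="main"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step">
    <p:input port="source" primary="true"/>
    <p:input port="schema"/>
    <!-- Item 4: the input document, passed through unchanged -->
    <p:output port="result" primary="true">
        <p:pipe port="source" step="main"/>
    </p:output>
    <!-- Item 3: Jing's messages as c:result/c:line -->
    <p:output port="report">
        <p:pipe port="result" step="jing"/>
    </p:output>
    <!-- Hypothetical options: the Jing jar and an absolute scratch-directory URI -->
    <p:option name="jing-jar" required="true"/>
    <p:option name="tmp-dir" required="true"/>

    <!-- Serialize both inputs to files, because Jing reads them from its command line -->
    <p:store name="store-doc">
        <p:input port="source">
            <p:pipe port="source" step="main"/>
        </p:input>
        <p:with-option name="href" select="concat($tmp-dir, '/doc.xml')">
            <p:empty/>
        </p:with-option>
    </p:store>
    <p:store name="store-schema">
        <p:input port="source">
            <p:pipe port="schema" step="main"/>
        </p:input>
        <p:with-option name="href" select="concat($tmp-dir, '/schema.rng')">
            <p:empty/>
        </p:with-option>
    </p:store>

    <p:group name="jing">
        <p:output port="result"/>
        <!-- Each p:store result is a c:result holding the URI it wrote to -->
        <p:variable name="doc-uri" select="string(/c:result)">
            <p:pipe port="result" step="store-doc"/>
        </p:variable>
        <p:variable name="schema-uri" select="string(/c:result)">
            <p:pipe port="result" step="store-schema"/>
        </p:variable>
        <p:exec command="java" result-is-xml="false" wrap-result-lines="true">
            <p:input port="source">
                <p:empty/>
            </p:input>
            <p:with-option name="args"
                           select="concat('-jar ', $jing-jar, ' ', $schema-uri, ' ', $doc-uri)">
                <p:empty/>
            </p:with-option>
        </p:exec>
    </p:group>
</p:declare-step>
```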

So, can somebody help me out of this thicket with some suggestions?

Thanks, John

P.S. The official specification of <p:validate-with-schematron> strikes me as much better than that of <p:validate-with-relax-ng>. 

The signature of <p:validate-with-schematron> is:

<p:declare-step type="p:validate-with-schematron">
     <p:input port="parameters" kind="parameter"/>
     <p:input port="source" primary="true"/>
     <p:input port="schema"/>
     <p:output port="result" primary="true"/>
     <p:output port="report" sequence="true"/>
     <p:option name="phase" select="'#ALL'"/>                      <!-- string -->
     <p:option name="assert-valid" select="'true'"/>               <!-- boolean -->
</p:declare-step>

So <p:validate-with-schematron> has a standard output port named "report" that makes the error messages from the validation engine accessible. 
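For comparison, a minimal sketch of how that report port can be exposed from a pipeline: with assert-valid="false" the step does not raise an error on failure, and the report (typically SVRL) flows out alongside the document. The step names "main" and "check" are illustrative.

```xml
<p:declare-step version="1.0" name="main"
                xmlns:p="http://www.w3.org/ns/xproc">
    <p:input port="source" primary="true"/>
    <p:input port="schema"/>
    <p:output port="result" primary="true">
        <p:pipe port="result" step="check"/>
    </p:output>
    <!-- The validation report, available whether or not the document is valid -->
    <p:output port="report" sequence="true">
        <p:pipe port="report" step="check"/>
    </p:output>

    <!-- assert-valid="false": report failures instead of raising an error -->
    <p:validate-with-schematron name="check" assert-valid="false">
        <p:input port="schema">
            <p:pipe port="schema" step="main"/>
        </p:input>
        <p:input port="parameters">
            <p:empty/>
        </p:input>
    </p:validate-with-schematron>
</p:declare-step>
```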

By contrast, <p:validate-with-relax-ng> lacks any such port in its signature:

<p:declare-step type="p:validate-with-relax-ng">
     <p:input port="source" primary="true"/>
     <p:input port="schema"/>
     <p:output port="result"/>
     <p:option name="dtd-attribute-values" select="'false'"/>      <!-- boolean -->
     <p:option name="dtd-id-idref-warnings" select="'false'"/>     <!-- boolean -->
     <p:option name="assert-valid" select="'true'"/>               <!-- boolean -->
</p:declare-step>

I don't like that! Does anyone else but me dislike it too?
