Re: [xsl] rendering marginal XML


Subject: Re: [xsl] rendering marginal XML
From: Peter Flynn <peter@xxxxxxxxxxx>
Date: Fri, 02 Nov 2001 15:24:08 +0000

Jay Kline wrote:
> I run a list server that generates its logs in XML format.
> It appears to be valid XML,

Only if you have a DTD for it. Otherwise it's just well-formed.

> but it uses a method that is rather clumbersome.

(or "evilly-formed" as a colleague of mine refers to this type
of stuff :-). "Clumbersome" is an excellent description. It
looks as if it was designed by someone who had heard XML
described, but had never seen any before.

> It uses this form:
>
> <msgSent>
> 	<time>time sent</time>
> 	<origin>me@xxxxxxxx</origin>
> 	<r>you@xxxxxxxxx</r>
> 	<recieved>time recieved</recieved>
> 	<status>Any error messages, etc</status>
> 	<r>you2@xxxxxxxxx</r>
> 	<recieved>time recieved</recieved>
> 	<status>Any error messages, etc</status>
> 	(this repeats for each recipient)
> </msgSent>
> (this repeats for each message)
>
> The problem is the <recieved> and <status> tags refer to the
> immediately preceding <r> tag. I would like to generate a list
> from these logs that contains only email addresses that had errors.

The first thing I do with defective designs like this is rationalise
the file so I can work with it. In this case it is simple to
pass it through sgmlnorm (part of James Clark's SP), pretending it
is SGML, so you can force the addition of a new element type to
enclose r, recieved [do they really spell it like that?] and status.

$ sgmlnorm sgml-spec.dec message.sgml >message.xml

where sgml-spec.dec is (in my test) the old DocBook SGML Declaration
with NAMECASE GENERAL YES changed to GENERAL NO (so that element type
names stay case-sensitive, as in XML), used with a trivial DTD:

<!ELEMENT msgSent - - (time,origin,trace+)>
<!ELEMENT trace O O (r,recieved,status?)>
<!ELEMENT (time,origin,r,recieved,status) - - (#PCDATA)>

(assuming status is optional and is only present where there has
been an error; the "O O" marks the start- and end-tags of trace as
omissible, so the parser can supply them). The result is

<msgSent>
   <time>time sent</time>
   <origin>me@xxxxxxxx</origin>
   <trace>
     <r>you@xxxxxxxxx</r>
     <recieved>time recieved</recieved>
     <status>Any error messages, etc</status>
   </trace>
   <trace>
     <r>you2@xxxxxxxxx</r>
     <recieved>time recieved</recieved>
     <status>Any error messages, etc</status>
   </trace>
</msgSent>

(indents courtesy of xxml.el). Now you can test in XPath for
the presence or absence of "trace/status".
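
As a rough sketch of that test, here it is in Python with the
standard xml.etree.ElementTree module, chosen purely for
illustration. The function name failed_recipients is made up, the
second trace has had its status dropped to show the difference, and
it keeps the assumption that status only appears where there has
been an error.

```python
import xml.etree.ElementTree as ET

# Normalised log entry; the second trace has no status (assumed to mean no error).
NORMALISED = """<msgSent>
   <time>time sent</time>
   <origin>me@xxxxxxxx</origin>
   <trace>
     <r>you@xxxxxxxxx</r>
     <recieved>time recieved</recieved>
     <status>Any error messages, etc</status>
   </trace>
   <trace>
     <r>you2@xxxxxxxxx</r>
     <recieved>time recieved</recieved>
   </trace>
</msgSent>"""


def failed_recipients(xml_text):
    """Return the text of every r whose enclosing trace has a status child."""
    root = ET.fromstring(xml_text)
    # "trace[status]" keeps only trace elements that contain a status element.
    return [r.text for r in root.findall("trace[status]/r")]


print(failed_recipients(NORMALISED))
```

In a stylesheet the equivalent selection would be something like
msgSent/trace[status]/r, with trace[not(status)] for the deliveries
that went through cleanly.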

That took about a minute to write and another minute to test.
It makes a lot of assumptions about the non-use of declared or
undeclared entities, other element types and constraints you may
have omitted for brevity, etc. It might in your case be simpler
just to run it through sed or some other editing process to do
the same job. Some people will also consider it overkill: your
call.

But it's a fine example of an XML structure designed without
forethought or foreknowledge: thanks for sharing it.
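
For the "some other editing process" route, a minimal sketch in
Python (again only an illustration, not what I actually ran) that
does the same regrouping as the sgmlnorm pass. The function name
group_traces is hypothetical, and it assumes every r starts a new
group and that nothing other than recieved and status follows an r
inside msgSent.

```python
import xml.etree.ElementTree as ET

# Flat form as the list server writes it (status assumed absent when no error).
FLAT = """<msgSent>
<time>time sent</time>
<origin>me@xxxxxxxx</origin>
<r>you@xxxxxxxxx</r>
<recieved>time recieved</recieved>
<status>Any error messages, etc</status>
<r>you2@xxxxxxxxx</r>
<recieved>time recieved</recieved>
</msgSent>"""


def group_traces(msg):
    """Wrap each r and the siblings that follow it in a new trace element."""
    children = list(msg)
    for child in children:
        msg.remove(child)
    trace = None
    for child in children:
        if child.tag == "r":
            # Each r opens a fresh trace group.
            trace = ET.SubElement(msg, "trace")
        if trace is None:
            # time and origin come before the first r and stay where they were.
            msg.append(child)
        else:
            trace.append(child)
    return msg


msg = group_traces(ET.fromstring(FLAT))
print([child.tag for child in msg])
print([[e.tag for e in t] for t in msg.findall("trace")])
```

For a whole log you would apply this to each msgSent in turn; unlike
the SGML route, nothing here checks the input against a DTD.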

///Peter



XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


