Re: [xsl] Generic stylesheet to flatten XML hierarchy

Subject: Re: [xsl] Generic stylesheet to flatten XML hierarchy
From: "C. M. Sperberg-McQueen" <cmsmcq@xxxxxxxxxxxxxxxxx>
Date: Fri, 4 Dec 2009 19:35:21 -0700

On 4 Dec 2009, at 12:37, Sara Mitchell wrote:

...
> With input like this:
>
> <rss ...some attributes>
>   ...
> </rss>
>
> I would like XML output like this:
>
> <root>
>   <row>
>     <rss-attr1>value</rss-attr1>
>     ...
>   </row>
>   <row>...again rss attributes, channel attributes, non-repeating
>     children of channel followed by fields for second item
>   </row>
>   ...more rows ...
> </root>


I'm having trouble seeing exactly what should be going on here,
because I can't see anything in your sample input (elided here
without loss of generality) that gives rise to the name
'rss-attr1'.  It's hard to correlate input with output if
all the values are spelled 'value' and some details in one
half of the input / output pair correspond to ellipses in the
other.

> This example is for a single level of repeating descendants, but
> my solution has to be able to handle any level of repeating
> descendants. Moreover, the stylesheet has no knowledge of the
> structure of the input document.


My very strong gut reaction here is to suspect that such an
absolutely generic transformation is unlikely to produce helpful
(or: meaningful) output in some unknown but possibly large
percentage of cases.

Perhaps the transformation you have in mind is intended to
work generically on all XML documents that follow certain
conventions in structuring the information they represent?
Can you say what those conventions are?

Perhaps you have a very clear understanding of the transform you
want, but so far this discussion has not elicited a clear
description from you.  The following questions are intended to
try to elicit some more clarity.

In a generic XML document, there are elements with parents,
left and right siblings, children, descendants, and attributes.

In a generic table, there are rows and columns.  Each row but
the first or last has a predecessor and a successor, and ditto
each column but the first or last.

What is the relationship between the elements, attributes,
containment and sibling relations in the input, and the
rows and columns and their sequence relations in the output?

Given your output table, should I expect to have all the
information present in the XML?  Can I recreate the XML from
your table?

Do all your rows have the same number of columns?  (I suppose
they must, or it's not much of a table, but perhaps I'd
better check?)

When does an XML document give rise to a single row in the output
table?  When does it give rise to exactly three rows?  When
does the resulting table have exactly one column?

What information do the labels of columns convey?

What tables would you want to produce for the documents

(1) <e/>

(2) <e><e n="23"/><e n="45">Pax</e></e>

(3) <table>
      <row a="1" b="2" c="34">998</row>
      <row a="2" b="22" c="34">999</row>
      <row a="3" b="2" c="3">1000</row>
      <row a="4" b="24" c="">1001</row>
      <row a="5" x="Viva Villa!" c="34">998</row>
    </table>

(4) <p>This isn't mixed content, because the schema says I'm a string.</p>

?
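
To make the column-count question concrete for document (3), here
is a minimal sketch. It is in Python rather than XSLT purely for
brevity. The helper name attribute_columns is hypothetical, and
treating each attribute name as a column is my assumption, not
something stated in the original request. It collects the union of
attribute names across the child elements and lists, for each child,
the names it lacks.

```python
import xml.etree.ElementTree as ET

# Document (3) from the list of test cases above.
DOC3 = """<table>
  <row a="1" b="2" c="34">998</row>
  <row a="2" b="22" c="34">999</row>
  <row a="3" b="2" c="3">1000</row>
  <row a="4" b="24" c="">1001</row>
  <row a="5" x="Viva Villa!" c="34">998</row>
</table>"""


def attribute_columns(xml_text):
    """Return (sorted union of attribute names, missing names per child).

    Assumption: one candidate column per distinct attribute name.
    """
    children = list(ET.fromstring(xml_text))
    columns = sorted({name for child in children for name in child.attrib})
    missing = [[c for c in columns if c not in child.attrib]
               for child in children]
    return columns, missing


columns, missing = attribute_columns(DOC3)
print(columns)
for number, names in enumerate(missing, start=1):
    print(number, names)
```

Under that assumption the candidate columns are a, b, c and x, and no
row supplies all four: the first four rows lack x and the fifth lacks
b. The fourth row also has c present but empty, so a generic rule
would need to say what goes in an empty cell and whether "absent" and
"empty" are distinguished.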

> I have a solution that works OK by traversing the input document
> in document order, but it does not handle the siblings of
> repeating nodes that are not themselves repeating.
>
> I have thought of doing this the opposite way: get a key of all
> repeating nodes and process only those at the lowest depth to
> generate rows. I haven't actually written the logic.


I gather that the tables you want to generate have something
to do with multiple occurrences of elements with the same name.
Does adjacency matter, or would

<a><b/><b/><b/><c/><c/><c/></a>

be treated differently from

<a><b/><c/><b/><c/><b/><c/></a>

?  (Assume if you like, for purposes of discussion, that the b and c
and a elements all have interesting attributes.)
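
The difference between the two readings can be sketched as follows
(again in Python for brevity; the function names name_counts and
adjacent_runs are hypothetical, and neither reading is claimed to be
the one the original poster intends). The first function ignores
order and counts children by name; the second treats each run of
adjacent same-named siblings as a unit.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# The two sample documents from the question above.
DOC_GROUPED = "<a><b/><b/><b/><c/><c/><c/></a>"
DOC_INTERLEAVED = "<a><b/><c/><b/><c/><b/><c/></a>"


def name_counts(xml_text):
    """Count child elements by name, ignoring their order."""
    counts = {}
    for child in ET.fromstring(xml_text):
        counts[child.tag] = counts.get(child.tag, 0) + 1
    return counts


def adjacent_runs(xml_text):
    """Return (name, run length) for each run of adjacent same-named children."""
    tags = (child.tag for child in ET.fromstring(xml_text))
    return [(tag, len(list(run))) for tag, run in groupby(tags)]


for doc in (DOC_GROUPED, DOC_INTERLEAVED):
    print(name_counts(doc), adjacent_runs(doc))
```

Counting by name gives three b and three c for both documents, so a
rule based only on "same name occurs more than once" cannot tell them
apart. Counting adjacent runs gives two runs of three for the first
document and six runs of one for the second, so a rule based on
adjacency would see repetition only in the first.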

> Any better ideas would be welcome.


Your example reminds me of the contortions I've seen people
go through trying to represent structured information in RFC 822
attribute-value pairs.  So the best idea I have at the moment
is:  Save yourself!  Don't do it!

But probably you know exactly what you're doing, there is a perfectly
reasonable algorithm for what you want, and I just haven't
understood.

hth

--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************

Current Thread

Re: [xsl] Generic stylesheet to flatten XML hierarchy, (continued)
- G. Ken Holman - 3 Dec 2009 19:57:25 -0000
- Michael Kay - 3 Dec 2009 23:54:50 -0000
  - Sara Mitchell - 4 Dec 2009 19:28:17 -0000
  - Sara Mitchell - 4 Dec 2009 19:37:54 -0000
    - C. M. Sperberg-McQueen - 5 Dec 2009 02:35:49 -0000 <=
    - Michael Kay - 5 Dec 2009 10:27:03 -0000
    - Sara Mitchell - 7 Dec 2009 18:45:32 -0000
    - Sara Mitchell - 7 Dec 2009 18:49:23 -0000