
Re: XML Transformation Language (was Re: removing HTML flow objects?)


Subject: Re: XML Transformation Language (was Re: removing HTML flow objects?)
From: Paul Prescod <papresco@xxxxxxxxxxxxxxxx>
Date: Wed, 27 May 1998 11:01:25 -0400

Rob McDougall wrote:
> 
> Paul,
> 
> I don't see how you can view applying styles to a document as a
> transformation, but changing an (albeit structured) flat file into a
> database is not.  

I usually reserve the word "transformation" for conversion between likes.
E.g. text string to text string is transformation. XML document to XML
document is a transformation. Tree to tree is a transformation. Converting
a flat file into a tree (i.e. "parsing") is not usually considered
transformation. Nor is converting a tree into a rendition (i.e.
"styling"). 

Hence, I don't feel that converting a tree into a database would be
considered transformation.

> I can't speak to the DSSSL flow objects, but the HTML
> flow objects in the original XSL submission definitely DO overlap all
> over the place (HTML tables have <TABLE><TD></TD></TABLE>, not to
> mention everything occurs within a <BODY>).

TDs within TABLEs do not overlap, but rather one is contained within the
other. You may think of a table as this:

<TABLE>
<TR><TD><TD><TD><TD></TR>
<TR><TD><TD><TD><TD></TR>
<TR><TD><TD><TD><TD></TR>
<TR><TD><TD><TD><TD></TR>
</TABLE>

but to the parser, it is identical to this:

<TABLE><TR><TD><TD><TD><TD></TR><TR><TD><TD><TD><TD></TR><TR><TD><TD><TD><TD></TR><TR><TD><TD><TD><TD></TR></TABLE>

It is one long string. But the tables, columns and rows that a database
manipulates are not strings, but are really two-dimensional objects
(conceptually). Perhaps "overlap" is the wrong word to describe the
relationship between rows and columns in a two-dimensional object, but I
think that you will understand why it is a very different situation. You
are talking about conversion from one data model to another.
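
The claim that the line breaks carry no structure can be checked with a
short Python sketch. It assumes the standard library's html.parser module
as the tokenizer, and the names TagCollector and tag_events are made up
for illustration. Note that html.parser only reports tag events; it does
not build a tree or infer the omitted </TD> end tags, so this shows only
that the two layouts produce the same markup events once whitespace-only
text is ignored.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record tag events, skipping text that is only whitespace."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Line breaks between rows arrive here; they are layout, not data.
        if data.strip():
            self.events.append(("data", data))


def tag_events(markup):
    """Return the list of events the tokenizer reports for `markup`."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


ROW = "<TR><TD><TD><TD><TD></TR>"
laid_out = "<TABLE>\n" + "\n".join([ROW] * 4) + "\n</TABLE>"
one_line = "<TABLE>" + ROW * 4 + "</TABLE>"

# True when both layouts yield the same sequence of tag events.
same = tag_events(laid_out) == tag_events(one_line)
```

Nothing in the event stream says "column 3"; a column exists only as the
third TD inside each TR, which is the containment relationship described
above rather than a second dimension.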

> I agree that every database will have XML import, but think for a moment
> about what this entails.  Will the structure of the data being imported
> *always* match the structure of the database?  Will it even *usually*
> match the structure of the database?  If the two structures do not match
> (which I believe will be the most common case), then how will the user
> specify rules to transform the data's structure?  

I can think of a few ways. 

#1. You can do a transformation from one textual format ("ordinary XML")
to another textual format ("XML that fits our database schema"). This
looks like XML->tree->tree->XML->database or XML->events->XML->database.
An XSL-like XML transformation language could do this.

#2. You can do a transformation from one XML tree structure ("the output
of the parser") to another XML tree structure ("a tree that fits our
database schema"). This looks like XML->tree->tree->database. An XSL-like
XML transformation language could do this.

#3. You can do a *conversion* from the XML tree structure to a
two-dimensional table/column model. XSL is *not* appropriate for this,
because XSL is designed for tree->tree transformations. You could invent a
language for doing this, and hopefully this language will be embedded in
major database systems. If it is powerful enough, then you will not need
to use it in tandem with one of the other systems. If it is not, then you
will have to do a tree->tree transform first.
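
To make the split between the tree->tree step and the tree->table step
concrete, here is a rough Python sketch of #2 followed by #3. It is only
an illustration under stated assumptions: the <people>/<person> input, the
table/row/cell element names, and the "person" table are all hypothetical,
and plain Python functions stand in for whatever transformation and
conversion languages would really be used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical "ordinary XML" that does not yet match the database schema.
SOURCE = """
<people>
  <person><name>Ada</name><city>London</city></person>
  <person><name>Alan</name><city>Wilmslow</city></person>
</people>
""".strip()


def tree_to_tree(root):
    """Tree->tree transform: reshape the parsed tree to mirror the schema."""
    table = ET.Element("table", {"name": "person"})
    for person in root.findall("person"):
        row = ET.SubElement(table, "row")
        for child in person:
            cell = ET.SubElement(row, "cell", {"column": child.tag})
            cell.text = child.text
    return table


def tree_to_rows(table):
    """Tree->table conversion: flatten each row element into a tuple."""
    return [
        tuple(cell.text for cell in row.findall("cell"))
        for row in table.findall("row")
    ]


def load(rows):
    """Insert the converted rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO person VALUES (?, ?)", rows)
    conn.commit()
    return conn


rows = tree_to_rows(tree_to_tree(ET.fromstring(SOURCE)))
conn = load(rows)
```

The first function never leaves the tree data model, which is the part an
XSL-like language could express. The second function is where the data
model changes, and it is the part that depends on one particular schema.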

> Sure they could code
> up some Java, C++ or Python.  Couldn't you say the same thing for
> transforming XML into HTML?  

Yes, but the difference is that your database schema is inherently
non-portable. We need XSL so that we can ship random XML documents to
random people. But why would you want to distribute your XML->database
mapping over the web?

> All applications that read "generic" XML will need the ability to
> transform the structure of the incoming XML to match the structure they
> need.  I think it would be highly advantageous if there was a standard
> transformation language.

XSL does only tree->tree. So it can map into tables, rows and cells
(easily), but will probably not be efficient for real 2D database tables.
To handle real 2D tables elegantly, you would want named rows and cells,
and the ability to peg content directly to rows and cells. You would also
want to be able to verify constraints on cells before inserting them into
the database. You want to check that every row has the data that it is
expecting. I'm sure there are other things specific to the tree->database
problem that I haven't thought of yet as well.
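
As a rough illustration of what "named cells" and constraint checking
might look like, here is a small Python sketch using the standard sqlite3
module. The schema (a "person" table with name and age columns) and the
helper names check_row and insert_valid are assumptions made up for this
example; they do not describe any existing tool.

```python
import sqlite3

# Hypothetical schema: each named cell and the Python type it must hold.
REQUIRED = {"name": str, "age": int}


def check_row(row):
    """Return a list of constraint violations for one row of named cells."""
    problems = []
    for column, kind in REQUIRED.items():
        if column not in row:
            problems.append(f"missing cell: {column}")
        elif not isinstance(row[column], kind):
            problems.append(f"wrong type for cell: {column}")
    for column in row:
        if column not in REQUIRED:
            problems.append(f"unexpected cell: {column}")
    return problems


def insert_valid(conn, rows):
    """Insert rows that pass every check; return the rejected ones."""
    rejected = []
    for row in rows:
        problems = check_row(row)
        if problems:
            rejected.append((row, problems))
        else:
            # Named placeholders peg each value to its column by name.
            conn.execute(
                "INSERT INTO person (name, age) VALUES (:name, :age)", row
            )
    conn.commit()
    return rejected


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT NOT NULL, age INTEGER NOT NULL)")

candidates = [
    {"name": "Ada", "age": 36},
    {"name": "Alan"},                     # missing the age cell
    {"name": "Grace", "age": "unknown"},  # age is not an integer
]
rejected = insert_valid(conn, candidates)
```

Here each value is addressed by column name rather than by its position in
a tree, and a row is checked as a whole before it reaches the database.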

 Paul Prescod  - http://itrc.uwaterloo.ca/~papresco

Three things to be wary of: A new kid in his prime
A man who knows the answers, and code that runs first time
http://www.geezjan.org/humor/computers/threes.html


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


