[oXygen-user] Commenting and Documenting Customer XML
Wendell Piez
Fri May 2 09:26:20 CDT 2008
Karl,
The good folks at syncROsoft will undoubtedly chime in with features
to recommend, but be aware that what you are asking is very
open-ended: how to use and develop tools that perform analytics on
documents and schemas. In the general case, I doubt it's possible to
build one set of tools that does this comprehensively and well for
everyone, given how much document sets vary: in their schemas, in how
and whether they are already documented, in their designs and
implementation patterns, and in the requirements of the
transformations applied to them. This is not to say that there are no
sweet spots that a toolkit like oXygen can find (it has already found
a few); only that whatever oXygen gives you, you will inevitably be on
your own to some extent. You may also find yourself envisioning, and
perhaps implementing, something very useful to you that is of no use
to anyone else (or even to you on the next project). That's just what
life on the leading edge is like.
That said, the answer to your specific questions is certainly yes.
As you suggest, XSLT is great for this sort of thing. Many things can
also be done ad hoc with XPath (especially XPath 2.0) using oXygen's
XPath query feature. For example,
"distinct-values(//*/name())" will list the names of all elements
appearing in the document, while "distinct-values(//div/@class)" will
give you all the distinct values of @class attributes on div
elements, and so on. (If I'm not mistaken, oXygen has even promised
to give us a way to export the results of these queries as XML, which
will be especially useful.) I have also found Schematron to be very
useful and fairly lightweight for diagnosing edge cases across sets
of documents.
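For anyone without an XPath 2.0 processor at hand, here is a rough equivalent of those two queries in Python, using only the standard library's xml.etree.ElementTree. It is a sketch, not an oXygen feature: the function names are hypothetical, ElementTree reports namespaced elements as "{uri}local" rather than prefix:local, and the results come back in document order (XPath's distinct-values() makes no ordering promise).

```python
import xml.etree.ElementTree as ET


def distinct_element_names(xml_text: str) -> list:
    """Rough equivalent of distinct-values(//*/name())."""
    root = ET.fromstring(xml_text)
    # root.iter() walks every element, root included, in document order;
    # dict.fromkeys keeps the first occurrence of each name.
    return list(dict.fromkeys(el.tag for el in root.iter()))


def distinct_div_classes(xml_text: str) -> list:
    """Rough equivalent of distinct-values(//div/@class)."""
    root = ET.fromstring(xml_text)
    values = (div.get("class") for div in root.iter("div"))
    # Skip div elements that have no class attribute at all.
    return list(dict.fromkeys(v for v in values if v is not None))


if __name__ == "__main__":
    doc = '<body><div class="a"><p/></div><div class="b"/><div class="a"/><div/></body>'
    print(distinct_element_names(doc))  # ['body', 'div', 'p']
    print(distinct_div_classes(doc))    # ['a', 'b']
```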
For analytics over large sets of documents, you may wish to load your
documents into an XML database for performance reasons. Certain
databases can be configured as back ends to oXygen, as documented on
the oXygen site.
The hardest part of this, and the reason it's not necessarily easy to
generalize, is defining your requirements: what do you want to find,
and how do you want the report to look?
Once you have done that, implementation is generally pretty
straightforward.
For example, here's a simple XSLT 2.0 template that lists every
element type appearing in a document, with its count and a breakdown
by parent element type:
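<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <!-- single root element so the output is a well-formed document -->
    <elements>
      <!-- one group per distinct element name -->
      <xsl:for-each-group select="//*" group-by="name()">
        <element count="{count(current-group())}">
          <name>
            <xsl:value-of select="current-grouping-key()"/>
          </name>
          <!-- distinct parent elements of this group, grouped by name;
               parent::* skips the document node above the root element -->
          <xsl:for-each-group select="current-group()/parent::*" group-by="name()">
            <parent count="{count(current-group())}">
              <xsl:value-of select="current-grouping-key()"/>
            </parent>
          </xsl:for-each-group>
        </element>
      </xsl:for-each-group>
    </elements>
  </xsl:template>
</xsl:stylesheet>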
Many useful variations on this are readily imaginable.
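One such variation is the USED / NOT USED report asked about in the quoted message below. The sketch that follows is in Python rather than XSLT, and it is a heuristic, not a built-in oXygen report: it counts each element name in the source and calls a name "used" if it appears as a whole token in any match, select, test, group-by or use attribute of an xsl:* element in the stylesheet. It assumes un-namespaced source documents, ignores attribute value templates on literal result elements, may give false positives (for example, a name that appears inside a string literal), and covers elements only; attributes could be handled the same way. The name element_usage is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XSL_NS = "{http://www.w3.org/1999/XSL/Transform}"
# Attributes on xsl:* instructions that hold XPath expressions or patterns.
XPATH_ATTRS = ("match", "select", "test", "group-by", "use")


def element_usage(source_xml: str, stylesheet_xml: str) -> dict:
    """Split the source's element names into "used" and "unused" by the stylesheet.

    Heuristic: a name is "used" if it occurs as a whole token in an
    XPath-bearing attribute of some xsl:* element. Assumes no namespaces
    in the source document.
    """
    # Count occurrences of each element name in the source document.
    counts = {}
    for el in ET.fromstring(source_xml).iter():
        counts[el.tag] = counts.get(el.tag, 0) + 1

    # Gather every XPath expression or pattern found on xsl:* elements.
    expressions = []
    for el in ET.fromstring(stylesheet_xml).iter():
        if el.tag.startswith(XSL_NS):
            expressions.extend(el.get(a) for a in XPATH_ATTRS if el.get(a))
    text = " ".join(expressions)

    used, unused = {}, {}
    for name, n in counts.items():
        # Whole-token match: not part of a longer name, not an @attribute,
        # not a $variable, and not a function call such as name(...).
        pattern = r"(?<![\w.@$-])" + re.escape(name) + r"(?![\w.(-])"
        (used if re.search(pattern, text) else unused)[name] = n
    return {"used": used, "unused": unused}


if __name__ == "__main__":
    src = "<doc><title>T</title><p>a</p><p>b</p><note/></doc>"
    xsl = (
        '<xsl:stylesheet version="2.0" '
        'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
        '<xsl:template match="/doc"><out>'
        '<xsl:apply-templates select="title|p"/></out></xsl:template>'
        '<xsl:template match="p"><para><xsl:value-of select="."/></para>'
        "</xsl:template></xsl:stylesheet>"
    )
    print(element_usage(src, xsl))
    # {'used': {'doc': 1, 'title': 1, 'p': 2}, 'unused': {'note': 1}}
```

Plain token matching keeps the sketch short. A more faithful report would parse the XPath expressions properly, which is where an XSLT 2.0 or schema-aware approach may pay off.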
Cheers,
Wendell
p.s. I do find RelaxNG easier for analytics than XSD, but that may be
personal taste.
At 09:35 PM 4/30/2008, you wrote:
>Ok, baby steps!
>Using Trang Converter, I've created an XML schema from XML source.
>The output I chose was W3C XML Schema (recommendations for other
>formats are welcome, I do not know the pros/cons here). Now I am
>going to mark up the schema with xs:documentation and xs:annotation.
>Right so far?
>
>Ok, how about this. I have my stylesheet and I have the XML source.
>Arbitrarily I have chosen to use a number of the elements in the
>stylesheet. Is it possible to create a USED and NOT USED resource of
>elements and attributes from the 2 documents? So a document outlining
>the uses, maybe the count of occurrences? I could write another
>transformation to figure this out, er, but if it is built into Oxygen
>then that would be great.
>
>Karl..
======================================================================
Wendell Piez mailto:
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================