
Re: [xsl] Convet SGML data to XML format


Subject: Re: [xsl] Convet SGML data to XML format
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Fri, 27 Mar 2009 18:22:48 -0400

Hi,

At 05:19 PM 3/27/2009, you wrote:
I have a question. While converting SGML to XML, there is an XML schema that the XML has to be validated against. I did the mapping between the SGML elements and the XML schema elements. I am not clear on how to get n2x to use the XML schema while converting the SGML.

Probably you wouldn't use it directly. You would perform the conversion and then validate the results using the XML schema as a secondary process.


This is actually a good thing: although you might have a schema that is intended to describe the results of the conversion, it might not be correct or complete. If your conversion is meant to be a straightforward syntactic conversion, any validation errors would be indications of unaddressed requirements for your XML schema, not for your conversion process. That is, if you are only changing the syntax of your documents, and may not rearrange or rename elements and attributes, then your XML schema has to be fitted to the results of your conversion, not the other way around.

And even if you assume the schema is correct and complete, you might not want the processor to decide for itself what changes to make in order to get valid output. If you have an XML schema in hand, known (or defined) to be correct and complete, and the XML you get from rewriting your SGML as XML is not already valid, part of your conversion may have to involve transformation.

If this is the case, the problem is more complicated: you need to use SP or n2x or another SGML tool to produce XML syntax, and then you need to alter this XML, making new XML that is valid against your schema.

Accordingly, it's easier to split your process up into distinct phases, dealing with syntax and tagging semantics (restructuring and renaming) separately:

1. Convert your SGML to XML syntax (a fairly straightforward syntactic conversion)
2. Optionally, derive an XML schema as a *descriptive* exercise, to help show what your new XML looks like and reveal where adjustments have to be made
3. Design and implement a transformation that maps your data from this XML to your target XML format
4. Validate against your target XML schema to check your results


Phase 1 can use SP or n2x. Phase 2 is actually optional, although very useful (it will help you do a better job with phase 3).

Phase 3 can use XSLT, which is why this post is (barely) on topic.
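To make the idea of Phase 3 concrete, here is a minimal sketch of a renaming pass. It is written in Python with the standard library's ElementTree as a stand-in for XSLT, only to show the shape of the mapping. The element names in RENAME (para, emph, chapter and their targets) are hypothetical; your own mapping table between the SGML elements and the schema elements would go there. A real transformation would usually also restructure content, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source element names (as they come out of the
# syntactic SGML-to-XML conversion) to the names the target schema expects.
RENAME = {"para": "p", "emph": "i", "chapter": "section"}

def remap(elem):
    """Return a deep copy of elem with element names mapped through RENAME.

    Names not listed in RENAME are copied unchanged; attributes, text and
    tails are preserved as they are.
    """
    new = ET.Element(RENAME.get(elem.tag, elem.tag), dict(elem.attrib))
    new.text = elem.text
    new.tail = elem.tail
    for child in elem:
        new.append(remap(child))
    return new

# A small made-up input standing in for the output of Phase 1.
source = ET.fromstring(
    "<chapter id='c1'><para>Some <emph>fine</emph> porcelain.</para></chapter>"
)
target = remap(source)
result = ET.tostring(target, encoding="unicode")
print(result)
```

Keeping the mapping in a table separate from the traversal makes it easy to compare against the element mapping you have already worked out, and to see which names are still unaccounted for.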

Phase 4 is essentially a test to see whether Phase 3 has been performed correctly. Passing validation does not guarantee the transformation is correct (a machine cannot do that without help), but it is a necessary check.

At no point do you need to use an XML schema directly with your SGML-to-XML conversion. (You do need your SGML DTD to parse your SGML though.)

How hard this all is really depends on how close your target XML schema already is to the SGML DTD. Making an XML schema to which your SGML (once it is syntactically XML) can be guaranteed valid without alteration is easier for some SGML DTDs than others. If your SGML DTD is very XML-like it might be fairly easy.

If it isn't, it may be easier to do the opposite: first make your XML, then the schema for it. Especially if your data set is bounded, you can cast your SGML into XML syntax, and then derive an XML schema to describe it. You would do this particularly if your data were more important than your schema. (Similarly, if you had a collection of fine porcelain, you might acquire the right number and size of boxes to store it in, instead of getting rid of some of it to fit the boxes you had.)
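As a rough illustration of what "deriving a schema to describe it" can start from, the sketch below walks a converted document and records, for each element name, which child elements and which attributes actually occur. This is an inventory, not an XML schema: it says nothing about order, cardinality or datatypes, and the sample document and its element names are invented for the example. It assumes the data is already well-formed XML, that is, that the syntactic conversion has been done.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def inventory(root):
    """Collect the child element names and attribute names seen per element."""
    children = defaultdict(set)
    attributes = defaultdict(set)
    for elem in root.iter():
        # Touch both maps so elements with no children or no attributes
        # still show up in the report with an empty set.
        attributes[elem.tag].update(elem.attrib)
        children[elem.tag].update(child.tag for child in elem)
    return children, attributes

# A small made-up document standing in for converted SGML.
doc = ET.fromstring(
    "<chapter id='c1'><title>Boxes</title>"
    "<para>One <emph>fine</emph> cup.</para><para>Two.</para></chapter>"
)
children, attributes = inventory(doc)
for name in sorted(children):
    print(name, sorted(children[name]), sorted(attributes[name]))
```

Run over the whole data set, a report like this shows where the data and the intended target schema disagree before any transformation is written.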

The deeper reasons for all this are rooted in modeling features of SGML that are not particularly XML-friendly and which cannot be readily expressed in XML schemas. The job you are looking at will be easier if your SGML does not use these features.

I hope that helps,
Wendell



======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

