
RE: [xsl] Number of scans required ??


Subject: RE: [xsl] Number of scans required ??
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Sun, 10 Aug 2003 18:00:23 +0100

> I guessed it will be complicated. Here is the short version 
> of my big xml.
> 
> Below is my xml.
> 
> *********************************************************************
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE AEXDATAEXTRACT SYSTEM "AeXDataExtract_2_2.dtd">
> 
> <AEXDATAEXTRACT DTD_VERSION="2.2" 
> EXTRACT_START_DATETIME="7/15/2003 11:03:25 
> AM" EXTRACT_TYPE="FULL">
> 
>   <RESOURCE_TYPE 
> GUID="{493435f7-3b17-4c4c-b07f-c23e7ab7781f}" NAME="Computer" 
> DESCRIPTION="Definition for Computer" SOURCE="IS" 
> CREATED_DATE="5/21/2003 
> 9:47:08 PM" MODIFIED_DATE="5/21/2003 9:53:57 PM" DELETED="0">
> 
>     <RESOURCE GUID="{8EDCCB48-AC8D-474C-852B-B3235563CEA7}" 
> NAME="P??W?" 
> SOURCE="" SITE_CODE="abc.com" DOMAIN="">
> 
>       <INVENTORY>
> 
>         <ASSET>
> 
>           <IDENTIFICATION>
> 
>             <ATTRIBUTE NAME="Name">P??W?</ATTRIBUTE>
>             <ATTRIBUTE NAME="Domain" NULL="FALSE" />
>             <ATTRIBUTE NAME="Altkey1" NULL="TRUE" />
>             <ATTRIBUTE NAME="Altkey2" NULL="TRUE" />
> 
>           </IDENTIFICATION>
> 
>           <CLASS NAME="Active_Directory_Details" />
>           <CLASS NAME="Network_Printer_Details" />
>           <CLASS NAME="User_Contact_Details" />
>           <CLASS NAME="User_General_Details" />
>         </ASSET>
> 
>         <CUSTOM>
>           <CLASS NAME="FID_OS_System_Info" />
>           <CLASS NAME="FID_SW_C2P2" />
>           <CLASS NAME="FID_SW_ESM" />
>           <CLASS NAME="FID_SW_IE_Patch" />
>           <CLASS NAME="FID_SW_Most_Frequent_User" />
>           <CLASS NAME="FID_SW_NAV_Management" />
>           <CLASS NAME="FID_SW_Tag_File" />
>           <CLASS NAME="FID_SW_Virus_Definitions" />
>         </CUSTOM>
>       </INVENTORY>
>     </RESOURCE>
>    </RESOURCE_TYPE>
>   </AEXDATAEXTRACT>
> 
> **********************************************************************
> 
> 
> In the above xml <AEXDATAEXTRACT> element is table name. I 
> will generate 
> Primary key for it using generate id which will be its first 
> column, second 
> column will be DTD_VERSION and so on for that table.
> In the output at the topmost line information like this should come.
> 
> Output
> ------
> AexID,DTD_Version,EXTRACT_START_DATETIME,EXTRACT_TYPE,

At first sight this is simple:

<xsl:text>AexID,</xsl:text>
<xsl:for-each select="/AEXDATAEXTRACT/@*">
  <xsl:value-of select="name()"/>
  <xsl:text>,</xsl:text>
</xsl:for-each>

But there's a bug in this: it assumes that the order of attributes will
be retained. In fact, the order of attributes in XML is undefined, so
this could output the attributes in any order. If you need the column
names in this order, you are going to have to redesign the source XML
file.
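
If the attribute names are known in advance (an assumption; they are
taken here from the sample document above), one alternative is to name
the columns explicitly in the stylesheet rather than iterating over
@*. A minimal sketch of that idea, not a definitive solution:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Column names are written out literally, so their order is fixed
       by the stylesheet, not by the order of attributes in the source. -->
  <xsl:template match="/AEXDATAEXTRACT">
    <xsl:text>AexID,DTD_VERSION,EXTRACT_START_DATETIME,EXTRACT_TYPE,</xsl:text>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The cost is that the stylesheet has to change whenever an attribute is
added to the source, so this only suits a stable set of columns.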
> 
> I want its data in second scan. I WILL EXPLAIN ABOUT THIS LATER.
> 
> Then when processor encounters new table name i.e. 
> RESOURCE_TYPE it will take 
> all the columns for this table and add parentID generated to 
> it. So the first 
> line of the output should look like this.

I don't like this "when the processor encounters". It's better to
describe the processing in terms of what output you want to be produced,
and how it is derived from the input, not in terms of a particular order
of processing.

Simplistically, it looks like:

<xsl:template match="RESOURCE_TYPE">
  <xsl:for-each select="@*">
    <xsl:value-of select="name()"/>
    <xsl:text>,</xsl:text>
  </xsl:for-each>
</xsl:template>

But again there is the problem that you are depending on the order of
attributes in XML.
> 
> Output
> ------- 
> AexID,DTD_Version,EXTRACT_START_DATETIME,EXTRACT_TYPE,Resource
> _Type_GUID (or 
> only GUID both will do even if GUID is there in the other 
> table),NAME,DESCRIPTION,SOURCE,CREATED_DATE,MODIFIED_DATE,DELETED>

Does the note in parentheses mean that you have a requirement to
eliminate duplicates here, i.e. to include a column once only if it
appears on multiple "tables"? If so, you need to understand how
elimination of duplicates is done in XSLT. This is essentially the same
problem as grouping, and is discussed at
http://www.jenitennison.com/xslt/grouping.
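
As a rough illustration of what that involves in XSLT 1.0, here is a
sketch using a key (the technique commonly called Muenchian grouping).
The key name "column-by-name" is made up for this example, and I am
assuming the "tables" are the AEXDATAEXTRACT, RESOURCE_TYPE and
RESOURCE elements from your sample:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every attribute of the "table" elements by its name.
       The key name is hypothetical. -->
  <xsl:key name="column-by-name"
           match="AEXDATAEXTRACT/@* | RESOURCE_TYPE/@* | RESOURCE/@*"
           use="name()"/>

  <xsl:template match="/">
    <xsl:text>AexID,</xsl:text>
    <xsl:for-each select="//AEXDATAEXTRACT/@* | //RESOURCE_TYPE/@* | //RESOURCE/@*">
      <!-- Output a name only when this attribute is the first one in
           the document carrying that name; later duplicates are skipped. -->
      <xsl:if test="generate-id() = generate-id(key('column-by-name', name())[1])">
        <xsl:value-of select="name()"/>
        <xsl:text>,</xsl:text>
      </xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With the sample above, GUID, NAME and SOURCE occur on both RESOURCE_TYPE
and RESOURCE and would each be listed once. The order of attributes
within a single element is still not guaranteed.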

> 
> This it should do for all the tables. I have scan this input 
> xml six times. I 
> am creating six different outputs as there are 6 different 
> items under 
> INVENTORY tag for example ASSET, CUSTOM and there are 4 
> others like this. 
> Within these there are tables (IDENTIFICATION and CLASS 
> classifies them as 
> another tables) and their names are in
> <CLASS> tag's attribute "NAME" and columns are mentioned in 
> ATTRIBUTE TAG and 
> their value is in the attribute body.
> 
> 
>  <INVENTORY>
>        <ASSET>
>          <IDENTIFICATION>
>            <ATTRIBUTE NAME="Name">P??W?</ATTRIBUTE>
>            <ATTRIBUTE NAME="Domain" NULL="FALSE" />
>            <ATTRIBUTE NAME="Altkey1" NULL="TRUE" />
>            <ATTRIBUTE NAME="Altkey2" NULL="TRUE" />
>          </IDENTIFICATION>
> 
>          <CLASS NAME="Active_Directory_Details" />
>          <CLASS NAME="Network_Printer_Details" />
>          <CLASS NAME="User_Contact_Details" />
>          <CLASS NAME="User_General_Details" />
>     </ASSET>
>    </INVENTORY>
> 
> THERE MAY BE CASES LIKE TABLE WILL NOT HAVE NE DATA.

Sorry, what is "NE data"?
> 
> So where table data is present i want to have column names on 
> the topmost 
> line.

I can't see what the relationship is between your data and the column
names. You're using all sorts of terminology like "parent tags", "inner
tables", etc - you clearly have a lot of understanding of the semantics
of this document structure which you aren't communicating very
effectively.

> 
> Then data corresponding to these columns will be obtained in 
> another scan. IF 
> IT IS POSSIBLE TO GET THE DATA IN THE SAME SCAN PLS INFORM ME 
> HOW TO DO THAT. Then since parent tags comes only once and 
> data for other inner tables is 
> present in huge numbers my output will look like this.

You really shouldn't be worrying about how many scans are done. Get the
code correct and working first, see whether it meets the performance
requirements, and if it doesn't, only then start thinking about how to
make it faster.
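
For what it is worth, nothing stops a single stylesheet from writing the
column names and the values in the same transformation. A sketch for the
top-level element only, assuming the AEXDATAEXTRACT root from your
sample:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/AEXDATAEXTRACT">
    <!-- Header line: a fixed AexID column, then one column per attribute. -->
    <xsl:text>AexID,</xsl:text>
    <xsl:for-each select="@*">
      <xsl:value-of select="name()"/>
      <xsl:text>,</xsl:text>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>

    <!-- Data line: the generated id, then the attribute values. -->
    <xsl:value-of select="generate-id()"/>
    <xsl:text>,</xsl:text>
    <xsl:for-each select="@*">
      <xsl:value-of select="."/>
      <xsl:text>,</xsl:text>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Both loops visit the attributes in document order, which should be the
same within one transformation, so the names and values ought to line
up even though the order itself is implementation-dependent. Values
that contain commas are not quoted in this sketch.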
> 
> Ouput
> ---- 
> AexID,DTD_Version,EXTRACT_START_DATETIME,EXTRACT_TYPE,Resource
> _Type_GUID (or 
> only GUID both will do even if GUID is there in the other 
> table),NAME,DESCRIPTION,SOURCE,CREATED_DATE,MODIFIED_DATE,DELETED>
> 
> First line:

Is this all one line? If this ("AexID",2.2,7/15/2003...) is the first
line, then what is the line above
(AexID,DTD_Version,EXTRACT_START_DATETIME...)? Sorry, but you are
confusing me more and more.

Michael Kay

> 
> "AexID",2.2,7/15/2003 11:03:25 
> AM,FULL,{493435f7-3b17-4c4c-b07f-c23e7ab7781f},Computer, 
> Definition for 
> Computer,IS,5/21/2003 9:47:08 PM,5/21/2003 9:53:57 
> PM,0,{8359DF92-1E29-409D-8189-79BE7C411171},{493435f7-3b17-4c4
> c-b07f-c23e7ab77
> 81f},0001026361C5,,abc.com,WORKGROUP,Win32,unknown,,0,0,,,IDAU
> JCFI,{8359DF92-1
> E29-409D-8189-79BE7C411171},IDAVJCFI,IDAUJCFI,IDENTIFICATION,I
> DAWJCFI,IDAVJCFI
> ,0001026361C5,WORKGROUP,,
> 
> ```````{5F1D1043-F808-4AB8-A35F-9DE1DE448F41}`{493435f7-3b17-4
> c4c-b07f-c23e7ab
> 7781f}`216.16.236.246``abc.fmr.com`WORKGROUP````````IDA22CFI`{
> 5F1D1043-F808-4A
> B8-A35F-9DE1DE448F41}`IDA32CFI`IDA22CFI`IDENTIFICATION`IDA42CF
> I`IDA32CFI`172.2
> 6.45.73`WORKGROUP``
> 
> 
> 
> As you can see in the above output wherever data is not there 
> i keep it as 
> blank and only separators i.e. ,,
> 
> I hope now you will get what i m trying to say.
> 
> Eagerly waiting for your reply.
> 
> Regards,
> Dipesh
> 
> 
> 
> 
> Date: Fri, 8 Aug 2003 12:03:23 +0100
> From: "Michael Kay" <mhk@xxxxxxxxx>
> Subject: RE: [xsl] Number of scans required ??
> 
> >
> > Thanks a lot for replying.
> >
> > Well my document is big enough that's why i haven't pasted it there. 
> > But i can generalize how it is and then i think it will give you 
> > proper idea.
> >
> > <RootNode>
> > <FirstChild> Some Attributes which are columns : </FirstChild> <A> 
> > More column names as attributes. <B> More column for this table 
> > (corresponding to DB) One or two more level of identations 
> > like this.
> > </B>
> > </A>
> > </FirstChild>
> 
> No, I'm afraid this doesn't give me a proper idea at all. Are 
> your columns represented by attributes or elements? You've 
> said attributes, but in that case why aren't they within the 
> start tag?
> 
> Secondly you have a structure here that is four levels deep 
> (plus one or two), yet you are using the terminology of 
> tables and columns to describe it. That doesn't fit.
> 
> Please produce a cut-down example of your actual document. 
> Or perhaps a schema/DTD. I don't understand it from this 
> description at all.
> 
> Finally, I don't know what you mean by a "scan". I suggest 
> you concentrate on writing some correct code first, and then 
> worry about how many times the XSLT processor is scanning 
> your source document. Apart from anything else, since the 
> data is in memory, the number of times the document is 
> scanned is not necessarily critical to performance.
> 
> Michael Kay
> 
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> 




