
RE: [xsl] Number of scans required ??


Subject: RE: [xsl] Number of scans required ??
From: Dipesh Khakhkhar <dkhakhkh@xxxxxxxxxxxxxxx>
Date: Wed, 13 Aug 2003 10:36:03 -0400

Hi Michael,

Well, I agree it is confusing. Sorry for not being able to put my question
clearly.

Here I will ask my question once again, using a simple example:

XML INPUT
=========
In my XML I will have the following data in one place.



<TABLE NAME="Client_Agent">
  <ROW>
    <COLUMN NAME="Agent Name">eXpress NS Client</COLUMN>
    <COLUMN NAME="Product Version">5.5.0.517</COLUMN>
    <COLUMN NAME="Build Number">517</COLUMN>
  </ROW>

  <ROW>
     <COLUMN NAME="Agent Name">eXpress Inventory Solution</COLUMN>
     <COLUMN NAME="Product Version">5.5.0.424</COLUMN>
     <COLUMN NAME="Build Number">424</COLUMN>
  </ROW>
</TABLE>

In the above data there are 2 rows, and each row has 3 columns.
---------------------------------------------------------------------

In the same xml I have the following data at some other place.



<TABLE NAME="Client_Agent">
  <ROW>
    <COLUMN NAME="Agent Name">eXpress NS Client</COLUMN>
    <COLUMN NAME="Install Path">C:\Program Files\ABC\eXpress\NS 
Client\Software Delivery\Software 
Packages\{01B54EB5-3679-4C73-9E10-E169D5EA8C59}</COLUMN>
    <COLUMN NAME="Product Version">5.5.0.519</COLUMN>
    <COLUMN NAME="Build Number">519</COLUMN>

  </ROW>

  <ROW>
     <COLUMN NAME="Agent Name">eXpress Inventory Solution</COLUMN>
     <COLUMN NAME="Install Path">C:\Program Files\ABC\eXpress\NS      
Client\Software Delivery\Software 
Packages\{01B54EB5-4579-4C73-9E10-E169D5DA9E59}</COLUMN>
     <COLUMN NAME="Product Version">5.5.0.428</COLUMN>
     <COLUMN NAME="Build Number">428</COLUMN>
  </ROW>
</TABLE>



Here as shown there are 2 rows and NOW THERE ARE 4 COLUMNS IN IT.

Now my question: I want an output in which the first line contains the
column names. For the above example I would like to have the following
output.

Expected Output
===============

Agent Name,Install Path,Product Version,Build Number
eXpress NS Client,,5.5.0.517,517
eXpress Inventory Solution,,5.5.0.424,424
eXpress NS Client,C:\Program Files\ABC\eXpress\NS Client\Software 
Delivery\Software 
Packages\{01B54EB5-3679-4C73-9E10-E169D5EA8C59},5.5.0.519,519
eXpress Inventory Solution,C:\Program Files\ABC\eXpress\NS      
Client\Software Delivery\Software 
Packages\{01B54EB5-4579-4C73-9E10-E169D5DA9E59},5.5.0.428,428



================================================================

As shown in the above output, "," is the separator between the data fields.
And I have to scan the whole file to get the first line of output, i.e. the
column names.

Now my problem is this. I will process the data which I have listed first and
get 3 column names, and write them to the output file. Then the same table
comes again with 4 columns in it, and I have to write the new column name in
between two columns which I have already written. HOW CAN I DO THAT IN XSL?
BECAUSE XSL IS NOT LIKE A NORMAL PROGRAMMING LANGUAGE WHICH WILL ALLOW ME TO
PLAY WITH THE FILE POINTER. AM I RIGHT?

So in case I do succeed in writing the first line (I DON'T KNOW HOW, BUT I
WILL TRY IF THAT IS POSSIBLE), THEN I HAVE TO WRITE THE DATA ACCORDINGLY. I
MEAN FOR THE FIRST TWO ROWS I DON'T HAVE TO WRITE ANYTHING IN COLUMN TWO; I
HAVE TO LEAVE IT BLANK WITH SEPARATORS ONLY. IS THERE ANY WAY I CAN KNOW
THIS? SIMILARLY, WHEN I GET THE NEXT TWO ROWS, WHICH HAVE THE FOURTH COLUMN,
I WILL HAVE TO WRITE DATA IN ALL FOUR COLUMNS.

Because the output is desired in this format, I was thinking of parsing it
twice or something like that. THAT'S WHY I ASKED HOW MANY TIMES I HAVE TO
SCAN THE INPUT FILE.
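
One way to approach this in XSLT 1.0, offered as a minimal sketch rather than
a definitive answer: a stylesheet works on the whole input tree, so it can
collect the distinct column names first and then visit the rows, with no file
pointer involved. The sketch assumes that every TABLE in the input belongs to
the same logical table (as in the sample above) and that no value contains a
comma. The key name col-by-name and the variable names are made up for
illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every COLUMN element by its NAME attribute. -->
  <xsl:key name="col-by-name" match="COLUMN" use="@NAME"/>

  <!-- One COLUMN per distinct NAME: the first one in document order. -->
  <xsl:variable name="columns"
      select="//TABLE/ROW/COLUMN[generate-id() =
                                 generate-id(key('col-by-name', @NAME)[1])]"/>

  <xsl:template match="/">
    <!-- Header line: the distinct column names. -->
    <xsl:for-each select="$columns">
      <xsl:if test="position() &gt; 1">,</xsl:if>
      <xsl:value-of select="@NAME"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>

    <!-- One line per ROW; a column the row lacks yields an empty field. -->
    <xsl:for-each select="//TABLE/ROW">
      <xsl:variable name="row" select="."/>
      <xsl:for-each select="$columns">
        <xsl:if test="position() &gt; 1">,</xsl:if>
        <xsl:value-of select="$row/COLUMN[@NAME = current()/@NAME]"/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With this approach the header lists columns in order of first appearance, so
for the sample data Install Path comes last rather than second. Getting the
exact order shown in the expected output would need an explicit list of
column names.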

I hope I have made my question clear now.

Thanks in anticipation that I WILL GET SOME GOOD SUGGESTIONS FROM THE GURUS
ON WHETHER WHAT I AM TRYING TO DO IS POSSIBLE OR NOT.

Eagerly waiting for reply.

Regards,
Dipesh



Date: Sun, 10 Aug 2003 18:00:23 +0100
From: "Michael Kay" <mhk@xxxxxxxxx>
Subject: RE: [xsl] Number of scans required ??

> I guessed it will be complicated. Here is the short version
> of my big xml.
>
> Below is my xml.
>
> *********************************************************************
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE AEXDATAEXTRACT SYSTEM "AeXDataExtract_2_2.dtd">
>
> <AEXDATAEXTRACT DTD_VERSION="2.2"
> EXTRACT_START_DATETIME="7/15/2003 11:03:25
> AM" EXTRACT_TYPE="FULL">
>
> <RESOURCE_TYPE
> GUID="{493435f7-3b17-4c4c-b07f-c23e7ab7781f}" NAME="Computer"
> DESCRIPTION="Definition for Computer" SOURCE="IS"
> CREATED_DATE="5/21/2003
> 9:47:08 PM" MODIFIED_DATE="5/21/2003 9:53:57 PM" DELETED="0">
>
> <RESOURCE GUID="{8EDCCB48-AC8D-474C-852B-B3235563CEA7}"
> NAME="P??W?"
> SOURCE="" SITE_CODE="abc.com" DOMAIN="">
>
> <INVENTORY>
>
> <ASSET>
>
> <IDENTIFICATION>
>
> <ATTRIBUTE NAME="Name">P??W?</ATTRIBUTE>
> <ATTRIBUTE NAME="Domain" NULL="FALSE" />
> <ATTRIBUTE NAME="Altkey1" NULL="TRUE" />
> <ATTRIBUTE NAME="Altkey2" NULL="TRUE" />
>
> </IDENTIFICATION>
>
> <CLASS NAME="Active_Directory_Details" />
> <CLASS NAME="Network_Printer_Details" />
> <CLASS NAME="User_Contact_Details" />
> <CLASS NAME="User_General_Details" />
> </ASSET>
>
> <CUSTOM>
> <CLASS NAME="FID_OS_System_Info" />
> <CLASS NAME="FID_SW_C2P2" />
> <CLASS NAME="FID_SW_ESM" />
> <CLASS NAME="FID_SW_IE_Patch" />
> <CLASS NAME="FID_SW_Most_Frequent_User" />
> <CLASS NAME="FID_SW_NAV_Management" />
> <CLASS NAME="FID_SW_Tag_File" />
> <CLASS NAME="FID_SW_Virus_Definitions" />
> </CUSTOM>
> </INVENTORY>
> </RESOURCE>
> </RESOURCE_TYPE>
> </AEXDATAEXTRACT>
>
> **********************************************************************
>
>
> In the above xml <AEXDATAEXTRACT> element is table name. I
> will generate
> Primary key for it using generate id which will be its first
> column, second
> column will be DTD_VERSION and so on for that table.
> In the output at the topmost line information like this should come.
>
> Output
> ------
> AexID,DTD_Version,EXTRACT_START_DATETIME,EXTRACT_TYPE,

At first sight this is simple:

<xsl:text>AexID,</xsl:text>
<xsl:for-each select="/AEXDATAEXTRACT/@*">
<xsl:value-of select="name()"/>
<xsl:text>,</xsl:text>
</xsl:for-each>

But there's a bug in this: it assumes that the order of attributes will
be retained. In fact, the order of attributes in XML is undefined, so
this could output the attributes in any order. If you need the column
names in this order, you are going to have to redesign the source XML
file.
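
If the attribute names are known in advance, one alternative is to name each
attribute explicitly in the stylesheet, so the output order no longer depends
on the parser. A minimal sketch, assuming only the three attributes shown in
the sample and using generate-id() for the AexID column as the question
describes (the generated value differs between processors):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/AEXDATAEXTRACT">
    <!-- Header: column names written out in the required order. -->
    <xsl:text>AexID,DTD_Version,EXTRACT_START_DATETIME,EXTRACT_TYPE&#10;</xsl:text>

    <!-- Data: each attribute is selected by name, so the order is fixed. -->
    <xsl:value-of select="generate-id()"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@DTD_VERSION"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@EXTRACT_START_DATETIME"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@EXTRACT_TYPE"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

This only helps when the set of attributes is fixed; it does not discover new
attribute names on its own.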
>
> I want its data in second scan. I WILL EXPLAIN ABOUT THIS LATER.
>
> Then when processor encounters new table name i.e.
> RESOURCE_TYPE it will take
> all the columns for this table and add parentID generated to
> it. So the first
> line of the output should look like this.

I don't like this "when the processor encounters". It's better to
describe the processing in terms of what output you want to be produced,
and how it is derived from the input, not in terms of a particular order
of processing.

Simplistically, it looks like:

<xsl:template match="RESOURCE_TYPE">
<xsl:for-each select="@*">
<xsl:value-of select="name()"/>
<xsl:text>,</xsl:text>
</xsl:for-each>
</xsl:template>

But again there is the problem that you are depending on the order of
attributes in XML.
>
> Output
> -------
> AexID,DTD_Version,EXTRACT_START_DATETIME,EXTRACT_TYPE,Resource
> _Type_GUID (or
> only GUID both will do even if GUID is there in the other
> table),NAME,DESCRIPTION,SOURCE,CREATED_DATE,MODIFIED_DATE,DELETED>

Does the note in parentheses mean that you have a requirement to
eliminate duplicates here, i.e. to include a column once only if it
appears on multiple "tables"? If so, you need to understand how
elimination of duplicates is done in XSLT. This is essentially the same
problem as grouping, and is discussed at
http://www.jenitennison.com/xslt/grouping.
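
As a rough illustration of that technique (Muenchian grouping), the sketch
below lists each attribute name once across RESOURCE_TYPE and RESOURCE, which
share GUID, NAME and SOURCE in the sample. The key name attr-by-name is
invented for this example, and the order of names within one element is still
up to the processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index the attributes of both "table" elements by attribute name. -->
  <xsl:key name="attr-by-name"
           match="RESOURCE_TYPE/@* | RESOURCE/@*"
           use="name()"/>

  <xsl:template match="/">
    <!-- Keep an attribute only if it is the first one with its name. -->
    <xsl:for-each select="(//RESOURCE_TYPE/@* | //RESOURCE/@*)
        [generate-id() = generate-id(key('attr-by-name', name())[1])]">
      <xsl:if test="position() &gt; 1">,</xsl:if>
      <xsl:value-of select="name()"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document this should produce nine names, with GUID, NAME and
SOURCE each appearing once.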

>
> It should do this for all the tables. I scan this input XML six times. I am
> creating six different outputs, as there are 6 different items under the
> INVENTORY tag, for example ASSET and CUSTOM, and there are 4 others like
> these. Within these there are tables (IDENTIFICATION and CLASS each mark a
> separate table); their names are in the <CLASS> tag's "NAME" attribute, the
> columns are given in ATTRIBUTE tags, and their values are in the ATTRIBUTE
> element body.
>
>
> <INVENTORY>
> <ASSET>
> <IDENTIFICATION>
> <ATTRIBUTE NAME="Name">P??W?</ATTRIBUTE>
> <ATTRIBUTE NAME="Domain" NULL="FALSE" />
> <ATTRIBUTE NAME="Altkey1" NULL="TRUE" />
> <ATTRIBUTE NAME="Altkey2" NULL="TRUE" />
> </IDENTIFICATION>
>
> <CLASS NAME="Active_Directory_Details" />
> <CLASS NAME="Network_Printer_Details" />
> <CLASS NAME="User_Contact_Details" />
> <CLASS NAME="User_General_Details" />
> </ASSET>
> </INVENTORY>
>
> THERE MAY BE CASES LIKE TABLE WILL NOT HAVE NE DATA.

Sorry, what is "NE data"?
>
> So where table data is present i want to have column names on
> the topmost
> line.

I can't see what the relationship is between your data and the column
names. You're using all sorts of terminology like "parent tags", "inner
tables", etc - you clearly have a lot of understanding of the semantics
of this document structure which you aren't communicating very
effectively.

>
> Then data corresponding to these columns will be obtained in another scan.
> IF IT IS POSSIBLE TO GET THE DATA IN THE SAME SCAN PLS INFORM ME HOW TO DO
> THAT. Then, since the parent tags come only once and data for the other
> inner tables is present in huge numbers, my output will look like this.

You really shouldn't be worrying about how many scans are done. Get the
code correct and working first, see whether it meets the performance
requirements, and if it doesn't, only then start thinking about how to
make it faster.
>
> Output
> ------
> AexID,DTD_Version,EXTRACT_START_DATETIME,EXTRACT_TYPE,Resource
> _Type_GUID (or
> only GUID both will do even if GUID is there in the other
> table),NAME,DESCRIPTION,SOURCE,CREATED_DATE,MODIFIED_DATE,DELETED>
>
> First line:

Is this all one line? If this ("AexID",2.2,7/15/2003...) is the first
line, then what is the line above
(AexID,DTD_Version,EXTRACT_START_DATETIME...)? Sorry, but you are
confusing me more and more.

Michael Kay

>
> "AexID",2.2,7/15/2003 11:03:25
> AM,FULL,{493435f7-3b17-4c4c-b07f-c23e7ab7781f},Computer,
> Definition for
> Computer,IS,5/21/2003 9:47:08 PM,5/21/2003 9:53:57
> PM,0,{8359DF92-1E29-409D-8189-79BE7C411171},{493435f7-3b17-4c4
> c-b07f-c23e7ab77
> 81f},0001026361C5,,abc.com,WORKGROUP,Win32,unknown,,0,0,,,IDAU
> JCFI,{8359DF92-1
> E29-409D-8189-79BE7C411171},IDAVJCFI,IDAUJCFI,IDENTIFICATION,I
> DAWJCFI,IDAVJCFI
> ,0001026361C5,WORKGROUP,,
>
> ```````{5F1D1043-F808-4AB8-A35F-9DE1DE448F41}`{493435f7-3b17-4
> c4c-b07f-c23e7ab
> 7781f}`216.16.236.246``abc.fmr.com`WORKGROUP````````IDA22CFI`{
> 5F1D1043-F808-4A
> B8-A35F-9DE1DE448F41}`IDA32CFI`IDA22CFI`IDENTIFICATION`IDA42CF
> I`IDA32CFI`172.2
> 6.45.73`WORKGROUP``
>
>
>
> As you can see in the above output, wherever data is not there I keep it
> blank, with only the separators, i.e. ,,
>
> I hope now you will get what I am trying to say.
>
> Eagerly waiting for your reply.
>
> Regards,
> Dipesh


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


