Re: [xsl] Find invalid IDREFs


Subject: Re: [xsl] Find invalid IDREFs
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 17 May 2012 17:10:21 -0400

Charles,

With 600 files, you may simply be able to pull them all together into a tree (using collection()) and then perform a key lookup into them. That's the first thing to try.

For example:

<xsl:key name="item-by-idrefs" match="*[@idrefs]"
         use="tokenize(@idrefs,'\s+')"/>

will provide for retrieving elements that have @idrefs attributes based on their whitespace-tokenized values.

Then in your stylesheet:

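<xsl:template match="IDREFS">
  <!-- key() searches only the context node's document, so call it once per document in $collection -->
  <xsl:apply-templates select="$collection/key('item-by-idrefs', current()/IDREF)"/>
</xsl:template>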

will apply templates to all the nodes whose @idrefs include the values listed as IDREF in your list. (Or you can do them individually if you need to keep track of each value as you go.)
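If you do need to handle the values one at a time, a minimal sketch might look like the following. It assumes a global variable or parameter named $collection bound to the result of collection(), and it passes each value along in a parameter named IDREF; both names are illustrative, not something your setup requires.

```xml
<!-- Sketch only: $collection is assumed to hold the documents returned by collection() -->
<xsl:template match="IDREFS">
  <!-- distinct-values() drops the repeated entries in the list -->
  <xsl:for-each select="distinct-values(IDREF)">
    <xsl:variable name="value" select="."/>
    <!-- key() runs once per document, since each step of the path makes that document the context node -->
    <xsl:apply-templates select="$collection/key('item-by-idrefs', $value)">
      <!-- hypothetical parameter name; the receiving template must declare it -->
      <xsl:with-param name="IDREF" select="$value"/>
    </xsl:apply-templates>
  </xsl:for-each>
</xsl:template>
```

Looping over the atomic values rather than the IDREF elements keeps each lookup tied to exactly one value, so the receiving template knows which IDREF led to the node it is reporting.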

Then the question becomes how you want your output to look. In your shoes I would probably use document-uri(/) to report the name of the file containing the offending node, and an XPath pointing the rest of the way. The XPath, in this case, could be as simple as

//*[tokenize(@idrefs,'\s+')=$IDREF]

where $IDREF is the offending IDREF value in the particular case and '@idrefs' is the name of the attribute on which it sits.
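As a rough sketch of such a report line, assuming text output and assuming the calling template passes the offending value in a parameter named IDREF (a hypothetical name), the matched element could be reported like this:

```xml
<!-- Sketch only: writes one line per offending element, file URI followed by an XPath -->
<xsl:template match="*[@idrefs]">
  <!-- hypothetical parameter supplied by the caller -->
  <xsl:param name="IDREF"/>
  <!-- URI of the document that contains the matched element -->
  <xsl:value-of select="document-uri(/)"/>
  <xsl:text>: //*[tokenize(@idrefs,'\s+')='</xsl:text>
  <xsl:value-of select="$IDREF"/>
  <xsl:text>']&#x0A;</xsl:text>
</xsl:template>
```

If document-uri(/) comes back empty for some documents in your environment, base-uri(.) may be a workable substitute.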

But you can easily tailor this sort of thing to your own processes.

Ask again if you need more specifics.

Cheers,
Wendell

On 5/17/2012 4:42 PM, charlieo0@xxxxxxxxxxx wrote:
I am completely stuck figuring this out. I have a list of IDREFs captured from an error report, and I created a well-formed XML instance from the list of values. This is a list of IDREFs that have no corresponding ID. I need to find these IDREF attributes in a very large collection of files and generate a report that tells writers which files the errors are located in.

The validation report was generated from a concatenated, very large complete manual. I can locate the errors in that file, but that's not how the manual is worked. I need to locate each error in the sub-files.

How do I walk through the list of IDREF elements, locate them in a collection of approx. 600 files and generate a report? Any help is appreciated.

XSLT 2.0

My list of IDREFs looks like this (abbreviated):


<?xml version="1.0" encoding="UTF-8"?>
<IDREFS>
  <IDREF>NSN-</IDREF>
  <IDREF>NSN-</IDREF>
  <IDREF>m2025792355320</IDREF>
  <IDREF>m22040592355332</IDREF>
  <IDREF>m22040592355332</IDREF>
  <IDREF>m22040592355332</IDREF>
  <IDREF>m2034792355320</IDREF>
  <IDREF>m2050492355320</IDREF>
</IDREFS>

My XSLT, which renders no results:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">

    <xsl:output method="text"  omit-xml-declaration="yes" indent="no" />
    <xsl:param name="collection" select="collection('file:/M:/?select=M2*.xml')"/>
    <xsl:variable name="IDREFS" select="document('file:/C:/IDREFS.xml')"/>
     <xsl:key name="VALUE" match="IDREFS" use="child::IDREF"/>

     <xsl:template match="/">
         <xsl:result-document method="text" href="file:/C:/bae/test.txt">
         <xsl:for-each select="$collection//@wpid | //@itemid">
             <xsl:variable name="ATTR" select="."/>
             <xsl:for-each select="$IDREFS">
                 <xsl:value-of select="base-uri(.)"/><xsl:value-of select="key('VALUE',$ATTR)"/>
             </xsl:for-each>
             <xsl:text>&#x0A;</xsl:text>
         </xsl:for-each>
     </xsl:result-document>
     </xsl:template>

</xsl:stylesheet>


--
======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

