
Re: [xsl] Xpath Syntax Issue


Subject: Re: [xsl] Xpath Syntax Issue
From: Nathan Tallman <ntallman@xxxxxxxxx>
Date: Sun, 24 Jun 2012 12:26:25 -0400

Sorry, here's my XSLT (remove.xsl):

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s="http://www.sitemaps.org/schemas/sitemap/0.9"
    exclude-result-prefixes="s"
>

    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <xsl:strip-space elements="*"/>

    <!-- Copy each element with its attributes, then process its children
         (text is copied by the built-in template rule) -->
    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

    <!-- Empty template: drop any sitemap url whose loc is 'URL' -->
    <xsl:template match="s:urlset/s:url[normalize-space(s:loc) = 'URL']"/>

</xsl:stylesheet>
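For comparison, here is a sketch of the same filter written with the more common XSLT 1.0 identity template, `match="@*|node()"`. The `match="*"` template above relies on the built-in rules for everything that is not an element, and those rules copy text but output nothing for comments and processing instructions; the version below copies those too. The `s` prefix binding and the removal pattern are unchanged, and `'URL'` is still just the placeholder value from the snippet below, not a real address.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s="http://www.sitemaps.org/schemas/sitemap/0.9"
    exclude-result-prefixes="s">

    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Identity transform: copy every attribute and node (elements,
         text, comments, processing instructions) unchanged. Attributes
         come before children in document order, so @* is copied first. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Empty template: drop any url in the sitemap namespace whose
         whitespace-normalized loc equals 'URL'. Its default priority
         (0.5) beats the identity template's node() (-0.5). -->
    <xsl:template match="s:urlset/s:url[normalize-space(s:loc) = 'URL']"/>

</xsl:stylesheet>
```

It runs with the same `xsltproc -o ... remove.xsl sitemap1.xml` invocation; xsltproc is an XSLT 1.0 processor, and nothing here needs more than 1.0.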

XML Snippet (sitemap1.xml):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="
     http://www.sitemaps.org/schemas/sitemap/0.9
     http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

     <url>
          <loc>URL</loc>
          <lastmod>2012-06-23T13:37:27+00:00</lastmod>
          <changefreq>monthly</changefreq>
          <priority>1.0</priority>
     </url>
     ....
</urlset>

Command used in Linux:
xsltproc -o sitemapb.xml remove.xsl sitemap1.xml
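One way to narrow down a "works in Oxygen, not on the command line" difference is to have the stylesheet report what the processor actually sees. The sketch below is a hypothetical `debug.xsl` (not from the thread): it produces no result document and instead uses `xsl:message`, which xsltproc writes to stderr, to print the root element's expanded name, how many `url` elements the namespaced path finds, and how many of those the removal pattern would match. If the root's namespace is empty or the first count is 0, the input is not in the sitemap namespace the `s` prefix is bound to.

```xml
<?xml version="1.0"?>
<!-- Hypothetical debug.xsl: reports counts instead of transforming -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s="http://www.sitemaps.org/schemas/sitemap/0.9">

    <!-- No result tree is built; all output goes through xsl:message -->
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- Expanded name of the document element, as {namespace-uri}local-name -->
        <xsl:message>root: {<xsl:value-of select="namespace-uri(*)"/>}<xsl:value-of select="local-name(*)"/></xsl:message>
        <!-- url elements reachable through the s: prefix -->
        <xsl:message>s:url elements: <xsl:value-of select="count(s:urlset/s:url)"/></xsl:message>
        <!-- url elements the empty template in remove.xsl would drop -->
        <xsl:message>to remove: <xsl:value-of select="count(s:urlset/s:url[normalize-space(s:loc) = 'URL'])"/></xsl:message>
    </xsl:template>

</xsl:stylesheet>
```

Running `xsltproc debug.xsl sitemap1.xml` on the exact file the command line sees, and the same stylesheet in Oxygen, should show whether the two environments are reading the same input.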

(In case anyone is wondering why I want to remove URLs from a sitemap,
there are a few pages generated by a script, purely for crawling
reasons, as the pages don't crawl well otherwise. The sitemap feeds
the indexing engine for our website and I don't want these artificial
pages cluttering up search results. So after the sitemap is generated,
I want to run this XSLT to remove the URLs before the indexer starts.)

Thanks,
Nathan


On Sun, Jun 24, 2012 at 11:31 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>
>
> On 24/06/2012 15:35, Nathan Tallman wrote:
>>
>> Is there any reason why this transformation works in Oxygen, using
>> Saxon and xsltproc, yet doesn't work from the Linux command line using
>> xsltproc? When running from the command line, all the attributes from
>> urlset are removed, but the unwanted URLs remain.
>
>
> I for one haven't followed this thread in detail, so I'm not sure what
> "this transformation" refers to.
>
> Michael Kay
> Saxonica
>
>>
>> On Sat, Jun 23, 2012 at 10:56 PM, Nathan Tallman <ntallman@xxxxxxxxx> wrote:
>>>
>>> Thanks Chris. I had just found this explanation on
>>> <http://stackoverflow.com/questions/3836121/xslt-does-not-work-when-i-include-xmlns-http-www-sitemaps-org-schemas-sitemap>
>>> when your email came in. This takes care of it.
>>>
>>> Much appreciation.
>>> Nathan
>>>
>>> On Sat, Jun 23, 2012 at 10:51 PM, Christopher R. Maden <crism@xxxxxxxxx> wrote:
>>>>
>>>> On 06/23/2012 10:38 PM, Nathan Tallman wrote:
>>>>>
>>>>> I still wasn't getting the results in my application, so I created
>>>>> pets.xml and sure enough the template worked. It only works with
>>>>> my original document if I remove attributes found in the root
>>>>> element.
>>>>>
>>>>> The original first 6 lines:
>>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
>>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>> xsi:schemaLocation=" http://www.sitemaps.org/schemas/sitemap/0.9
>>>>> http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
>>>>>
>>>>> I had to remove all attributes from <urlset> before the XSL would
>>>>> work. Do I need to reference the schema in my XSL?
>>>>
>>>> Ahh... the good ol namespace FAQ.
>>>>
>>>> Every element type name is a pair: namespace URI and local name.
>>>>
>>>> What you thought was null-namespace plus species is in fact
>>>> http://www.sitemaps.org/schemas/sitemap/0.9 plus species (often
>>>> written as {http://www.sitemaps.org/schemas/sitemap/0.9}species).  An
>>>> XPath expression matching just species matches {}species, which is a
>>>> *different name* than
>>>> {http://www.sitemaps.org/schemas/sitemap/0.9}species.
>>>>
>>>> You need, in your XSLT, to declare something like
>>>> xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9" and then
>>>> use sitemap:species in your XPath.  (A shorter prefix might be in
>>>> order, but a prefix is required for XSLT 1.0 and recommended (IMO) for
>>>> clarity for XSLT 2.0.)
>>>>
>>>> ~Chris
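Chris's explanation can be shown in stylesheet form. This is a minimal sketch with two separate stylesheet files; the `sm` prefix and the `xsl:message` bodies are illustrative only. In the XSLT 1.0 file, the unprefixed pattern `url` means {}url and never fires on a sitemap, while `sm:url` means {http://www.sitemaps.org/schemas/sitemap/0.9}url. The XSLT 2.0 file shows the alternative Chris alludes to, the `xpath-default-namespace` attribute; it needs a 2.0 processor such as Saxon, since xsltproc implements only XSLT 1.0.

```xml
<!-- File 1 (XSLT 1.0): a prefix is required, because unprefixed names
     in XPath 1.0 patterns and expressions are always in no namespace -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sm="http://www.sitemaps.org/schemas/sitemap/0.9">

    <!-- Matches {}url: never fires on a sitemap document -->
    <xsl:template match="url">
        <xsl:message>no-namespace url</xsl:message>
    </xsl:template>

    <!-- Matches {http://www.sitemaps.org/schemas/sitemap/0.9}url -->
    <xsl:template match="sm:url">
        <xsl:message>sitemap url: <xsl:value-of select="sm:loc"/></xsl:message>
    </xsl:template>

</xsl:stylesheet>

<!-- File 2 (XSLT 2.0 only): unprefixed element names in patterns and
     XPath expressions resolve to the sitemap namespace -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.sitemaps.org/schemas/sitemap/0.9">

    <xsl:template match="url">
        <xsl:message>sitemap url: <xsl:value-of select="loc"/></xsl:message>
    </xsl:template>

</xsl:stylesheet>
```

The default namespace declaration on `<urlset>` in the sitemap is what puts every element in that namespace; a plain `xmlns="..."` on the stylesheet would not help in either version, because XPath does not use it.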

