
RE: [xsl] generating numerical character entities in html output

Subject: RE: [xsl] generating numerical character entities in html output
From: Rossella Rosin <tanataviele@xxxxxxxxxxx>
Date: Tue, 18 Jan 2005 12:44:10 +0100

XSLT 2.0 allows you to control this aspect of serialization with character
maps, but it's not straightforward, because character maps apply to all
characters in the output.
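
As a rough sketch of what such a character map might look like (assuming an XSLT 2.0 processor; the map name "obfuscate" and the source element "email" are made up for illustration and do not come from the thread):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the serializer to apply the (hypothetical) map named "obfuscate" -->
  <xsl:output method="html" use-character-maps="obfuscate"/>

  <!-- Each @ in the result is written as the five characters & # 6 4 ;
       The replacement string is emitted as-is, without further escaping -->
  <xsl:character-map name="obfuscate">
    <xsl:output-character character="@" string="&amp;#64;"/>
  </xsl:character-map>

  <!-- Hypothetical source element holding one address -->
  <xsl:template match="email">
    <a href="mailto:{.}"><xsl:value-of select="."/></a>
  </xsl:template>

</xsl:stylesheet>
```

Since the map is applied during serialization, every @ in text nodes and attribute values comes out as &#64;, not only the ones that belong to an address, which is presumably the "not straightforward" part.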

I'll try to learn everything about it :-)

You don't want to disable escaping of characters that would normally be
escaped; you want to enable escaping of characters that wouldn't normally be
escaped.

yes, I thought so in fact, but there is no force-escaping attribute ;-)

If you want to write character references using d-o-e, then you can do it

<xsl:text disable-output-escaping="yes">&amp;#64;</xsl:text>

provided the processor supports d-o-e. Here you are outputting five
characters & # 6 4 ;. The first character, &, would normally be escaped as
&amp;, and d-o-e suppresses this.
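
For context, a minimal XSLT 1.0 sketch of how that instruction could sit inside a template, assuming a hypothetical source element "email" that holds a single address. The text on either side of the @ is copied normally, and only the @ itself goes through d-o-e:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Only handle addresses that actually contain an @ -->
  <xsl:template match="email[contains(., '@')]">
    <!-- Part before the @, escaped as usual -->
    <xsl:value-of select="substring-before(., '@')"/>
    <!-- The @ written as a numeric character reference via d-o-e -->
    <xsl:text disable-output-escaping="yes">&amp;#64;</xsl:text>
    <!-- Part after the @, escaped as usual -->
    <xsl:value-of select="substring-after(., '@')"/>
  </xsl:template>

</xsl:stylesheet>
```

This can only have an effect when the processor serializes the result itself. The XSLT 1.0 Recommendation allows a processor to ignore d-o-e, for instance when the result tree is passed on as a tree instead of being written out as text.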

well, I'm not even sure it _should_ do this. I'm currently dealing with MSXML (both 3 and 4), and I tried what you suggest before sending my request for help.
<xsl:text disable-output-escaping="yes">&amp;</xsl:text>
produces "&amp;", so it's no surprise (and very consistent) that
<xsl:text disable-output-escaping="yes">&amp;#64;</xsl:text>
actually produces "&amp;#64;".

I will try other XSLT processors, but I don't really expect any difference in behavior.

However: are you sure what you are doing makes sense? No one reading an HTML
document is supposed to make any distinction between @ and &#64; and I would
have thought this included spammers.

I would have thought the same. But spammers don't seem to "read" HTML files. They use their own robots, which don't need to be full-fledged, standards-compliant web browsers: they just need to know enough HTTP to fetch an HTML source and look for anything that resembles an e-mail address. I suppose they just use regular expressions, or something even cruder, for that. Handling numeric character references seems not to have been worth the cost of upgrading their code, at least at the time of this writing ;-)
Of course, it won't take long before more people start using the same trick to conceal addresses, and then spammers will upgrade their software.
I am aware that I will not keep spammers away forever. I'm just trying to make it as difficult as possible for them to get at the address.
Some people don't publish e-mail addresses at all, which is of course the safest course, but sometimes people do want their e-mail address published. Using images to display the address is another relatively safe way to deal with the problem. But I would like to make my web sites as accessible as possible, and an image with an inconsistent alt attribute would make that impossible. So here I am ;-)

Michael Kay

