Subject: Re: [xsl] Modern web site design with XML and XSLT
From: "Eric J. Bowman" <eric@xxxxxxxxxxxxxxxx>
Date: Mon, 4 Jan 2010 10:30:16 -0700

Rob Belics wrote:
> 
> It just hadn't occurred to me that I could test for the
> search engine bot and do the transformation on the server side for
> them and any browser that can't do its own transformations.
>

Don't do this!  Google says not to: they have no way of knowing whether
you're custom-tailoring spider-food for their bot that differs from
what humans see, so they just assume you're a "black hat."
Use HTTP Content Negotiation, in such a way that your XHTML 1.0 as
text/html representation gets served as the default.  That's the topic
of this post.

An important note, before continuing:  Browsers today are (mostly) XSLT
1.0 compatible, with EXSLT being hit-and-miss.  So writing cross-browser
XSLT is every bit as tedious and mind-numbing as writing cross-browser
CSS, as the implementations are all different and bizarre.  All IE
browsers from 6 up have MSXSLT, but you have to call it from Javascript
since there's no application/xhtml+xml support in IE (and not likely to
arrive before IE 13 at this rate...).

However, I've found that the cross-browser XSLT you arrive at will most
likely also work on more than one server-side transformation library.
The reverse isn't true: you can't develop in, say, Xalan or Saxon, and
expect your results to work cross-browser.

>
> I'm also reading about Google getting more involved with XML
> but I can't see them wanting to devote processor time for xslt.
> Perhaps they'll start taking in XML as something other than plain
> data.
>

I don't follow.  Chrome has had application/xhtml+xml and XSLT support
since the beginning; they've been part of WebKit since before Apple got
involved.  Since XSLT is something that takes up processor time in
Chrome, I assume Google will get around to optimizing it sooner or
later, hopefully in a way Safari will adopt.  It took a while for Gecko
and Opera to get XSLT going; document() is just an HTTP GET, after all,
so I never could fathom why it was such a stumbling block for browsers.

There are two ways to tackle the problem here -- one is to base conneg
on the User-Agent string and make a table of exactly which versions of
what browsers can handle our cross-browser XSLT code using document().
The other is to be thankful that Apple WebKit corrected its HTTP
request headers several months ago, such that Accept: now includes
application/xhtml+xml, and just switch on that.

I'm currently redeveloping my conneg algorithm's support for client-side
XSLT.  I've decided my rule is: any browser that supports
application/xhtml+xml supports XSLT 1.0 (plus a couple of EXSLT
functions).  Older versions of browsers like Opera and Firefox, or
K-Meleon, that send Accept: application/xhtml+xml will see a link to
the XHTML 1.0 variant as text/html, with an "upgrade your browser"
message (even if their browser does support XSLT, in some cases).  Some
older browsers will just break (Opera > 9.0 and < 9.5).

In a year or two, though, the reality will be that 99% of real-world
application/xhtml+xml clients are XSLT 1.0 capable, as older versions
of Apple WebKit and Opera fade into the background.  So the content
negotiation algorithm starts by checking for the presence of either
application/xhtml+xml or application/xslt+xml in the client's Accept
header.  (May as well account for XSLT 2 clients, as they'll be
backwards-compatible with the XSLT 1 code, even though an XSLT 2
browser is hypothetical at this point.)
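
To illustrate (actual values vary by browser and version), a Gecko or
recent WebKit browser sends an Accept header along these lines:

  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

which lands it in the client-side XSLT branch, while IE's Accept header
never mentions application/xhtml+xml (it's mostly "*/*" and image
types), so IE drops through to the User-Agent check described below.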

These clients receive a valid XHTML 1.1 representation, with the <head>
content generated by server-side XSLT and a <body> consisting of the
browser-upgrade message and a link (with rel='alternate').  This
representation includes an XML PI which calls the client-side XSLT code.
The client-side XSLT code transforms an Atom source document into the
proper <body> content for the request URL.  The CSS is linked in the
<head>; if it's already cached, the XSLT output comes out styled
without any flashing, rendering in a single pass if the markup is
correct.

(The XHTML 1.1 representation is sent with Vary: Accept and, of course,
a proper Content-Location header.)
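
To make that concrete, here's a sketch of such a response (URLs and
filenames are made up for illustration, and the real head and body
carry more than this):

  HTTP/1.1 200 OK
  Content-Type: application/xhtml+xml
  Content-Location: /articles/foo.xhtml
  Vary: Accept

  <?xml version="1.0" encoding="UTF-8"?>
  <?xml-stylesheet type="text/xsl" href="/xsl/atom2xhtml.xsl"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
      "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
      <title>Article title</title>
      <link rel="stylesheet" type="text/css" href="/css/site.css"/>
    </head>
    <body>
      <p>Your browser is out of date; here's the
        <a rel="alternate" href="/articles/foo.html">plain XHTML 1.0
        version</a>.</p>
    </body>
  </html>

The client-side stylesheet (atom2xhtml.xsl here) builds the real <body>
for the request URL, presumably pulling in the Atom source via
document(), per the earlier point.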

If the Accept header doesn't include application/xhtml+xml or
application/xslt+xml, then the User-Agent header is parsed to determine
if the browser is IE 6+.  If so, server-side XSLT generates a valid
XHTML 1.0 representation (served as text/html), again with the document
<head> built on the server.  This requires two separate XSLT
stylesheets, one for XHTML 1.1 and the other for XHTML 1.0, but each
can include the same XSLT file for generating most of the <head>.  The
<body> contains <noscript> content alerting the user to either enable
Javascript or follow the link with rel='alternate'.
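
A minimal sketch of that stylesheet layout (filenames and the template
name are made up; the real stylesheets obviously do much more):

  <!-- xhtml10.xsl -- the text/html variants -->
  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns="http://www.w3.org/1999/xhtml">

    <xsl:include href="head-common.xsl"/>

    <xsl:output method="xml" encoding="UTF-8"
        doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
        doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

    <xsl:template match="/">
      <html>
        <head>
          <!-- named template defined in head-common.xsl -->
          <xsl:call-template name="common-head"/>
          <!-- IE variant only: conditional comment + script link -->
        </head>
        <body>
          <noscript>
            <!-- "enable Javascript or follow this link" message -->
          </noscript>
        </body>
      </html>
    </xsl:template>

  </xsl:stylesheet>

xhtml11.xsl looks the same apart from its <xsl:output> doctype and the
XHTML 1.1 differences in the <head>; both include head-common.xsl.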

(The XHTML 1.0 representation for IE-only is sent with Vary: User-Agent
and, of course, a proper Content-Location header.)

So we require a client-side XSLT transformation that is capable of
being called either via XML PI or via Javascript.  We'll also need
a conditional comment in the <head>, so only IE calls the XSLT script.
If IE isn't detected in the User-Agent header, a full XHTML 1.0
representation (as text/html) is generated using server-side XSLT.
This is the same stylesheet used for the IE variant, except it
suppresses the script link in the output.  It calls the client-side
XSLT code (from the server, of course) to generate the <body>.
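
The conditional comment amounts to something like this (script name is
made up), emitted only by the IE variant of the stylesheet:

  <!--[if gte IE 6]>
  <script type="text/javascript" src="/js/msxml-transform.js"></script>
  <![endif]-->

Only IE parses the comment and fetches the script, which runs the same
client-side XSLT through MSXML and writes the result into the page;
every other browser sees an ordinary comment and ignores it.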

(The full XHTML 1.0 representation for all other (non-Atom) clients is
sent with Vary: Accept and, of course, a proper Content-Location header.
Googlebot will receive this representation, and the Vary header will
tell it that its User-Agent string was not part of the negotiation.)
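
Again for illustration, the headers on that default response look
something like this (Content-Location value made up):

  HTTP/1.1 200 OK
  Content-Type: text/html; charset=UTF-8
  Content-Location: /articles/foo.html
  Vary: Accept

No User-Agent in the Vary header, which is exactly what Googlebot needs
to see.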

That's three different scenarios, addressed using four different XSLT
stylesheets, with cross-browser support and without breaking the "back"
button in clients.  This modular (not to be confused with mode-ular)
approach allows HTML 5 to be developed as its own set of XSLT
stylesheets.  Non-IE browsers that support XSLT and HTML 5 goodies ought
to be compatible with HTML 5's XML serialization, long before IE
implements HTML 5, so using application/xhtml+xml as a switch should
allow early adoption of HTML 5 without having to wait for IE to catch
up.

I hope this puts the kibosh on the notion that this approach to Web
development will be some sort of mess five years out, maintenance-wise.
The client-side XSLT code is exactly the same as the server-side XSLT
code, for transforming Atom to XHTML.  The only complexities are that
this code must be called from three different contexts -- client via
XML PI, client via Javascript, and server via XML PI -- and that both
XHTML 1.0 and 1.1 are supported, but have different syntax for the
<head>.

A website built this way is very easy to re-style.  Granted, most of
the styling is done via CSS, but changing the document structure a bit
or altering @class/@id becomes an edit to the existing XSLT code.  The
site structure and operation don't need altering.  The new style is
rolled out by changing the XML PI links (and the Javascript link for
IE).
Very easy to roll back, too, since only the styling changes -- not the
links.
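
In practice the rollout is a one-line change per shell (hrefs made up):

  <?xml-stylesheet type="text/xsl" href="/xsl/2009/atom2xhtml.xsl"?>

becomes

  <?xml-stylesheet type="text/xsl" href="/xsl/2010/atom2xhtml.xsl"?>

plus the matching change to the script src in the IE variant.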

Vary: User-Agent makes caching a real nightmare, unlike Vary: Accept.
But, especially if you're compressing content, caching in IE is a mess.
So limiting Vary: User-Agent to IE, which can't cache for a damn (maybe
they'll get it right in IE 9) anyway, is a no-harm no-foul drawback.
IE handles Accept headers horribly too, so not being able to just use
Vary: Accept for IE has no real-world caching disadvantage either.
Other browsers, OTOH...

> 
> I was also getting discouraged with the lack of support in the browser
> but I get the inkling XML support may get better as the years go on. 
> 

Microsoft's failure to implement application/xhtml+xml (and SVG) has
done more to stifle innovation on the Web than anything.  Despite this,
XSLT is now in all the other major browsers.  If implementation were as
simple as switching on the presence of application/xhtml+xml in an
Accept header (instead of relying on User-Agent or restricting the
benefits of client-side XSLT to non-IE browsers), we'd see this method
of building applications gain enough widespread use that the browser
vendors would compete on improving XSLT performance in the same way
they are now obsessed with Javascript performance.

(Michael Kay has certainly laid the groundwork, with the performance
work he's done on his own project of late.)

While decent in-browser XSLT performance and interoperability are a ways
off, client-side XSLT is a here-today technology with benefits for both
users and site owners -- provided you're willing to deal with HTTP
content negotiation, and cross-browser issues that make CSS look easy
to develop by comparison (CSS quirks being well-documented).

-Eric

