
Re: [xsl] When to use text()


Subject: Re: [xsl] When to use text()
From: "Abel Braaksma (Exselt)" <abel@xxxxxxxxxx>
Date: Sat, 22 Mar 2014 15:48:27 +0100

Interesting thoughts.

When designing a language, there will always be a lot of discussion
about the choice of words for keywords, terminology and language
constructs. Take C#: it uses the word "assembly" for physically
separate packages and the word "namespace" for logical separations. To
this day, many beginning programmers have a hard time understanding
those concepts, not least because "assembly" reminds them of assembly
language and "namespace" of XML namespaces. Similarly, why choose the
keyword "fixed" when the meaning is to "pin" a variable?

Those discussions will never end, and should never end. It will always
remind language designers to think carefully about the words they choose.

In this particular case, the working group at the time had to reconcile
two competing needs. There was XML, which was already defined, and whose
character content the DOM [1] already modeled as text nodes. XPath (not
XSLT) needed a way to select those text nodes, and since the DOM already
called them text nodes, it made sense to follow that nomenclature. Note
that text nodes exist neither in the XML Infoset nor in the original XML
specification. The Infoset instead has character information items [2],
one for each individual character rather than one for a whole run of
text.

On the other hand, they needed to be able to atomize nodes, in other
words, to turn them into what is commonly known in computing as a
"string". Some languages use the keyword TEXT for strings, but many
common languages use the keyword string.

What were they to do? Were there alternatives? Text nodes needed a name,
and so did atomized text nodes. Both requirements were important,
because if everything were always atomized, how could you query mixed
content?

An important distinction is that text() is a KindTest: it tests, for
each node on the axis, whether that node is a text node, and keeps only
the nodes that pass. string() and string(x), by contrast, are functions
that take an implicit or explicit argument and turn it into a string.
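To make the distinction concrete, here is a rough sketch in Python, using
only the standard library's xml.dom.minidom as a stand-in for the XPath
data model. The names KIND_TESTS, child_step and xpath_string are
hypothetical helpers for illustration, not part of any XPath API, and a
real processor works quite differently. The point is only the shape of
the results: a kind test filters the nodes on an axis and hands back
nodes, while string() maps one node to one string.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical mapping from XPath kind-test spellings to DOM node types.
KIND_TESTS = {
    "text()": Node.TEXT_NODE,
    "comment()": Node.COMMENT_NODE,
    "element()": Node.ELEMENT_NODE,
}

def child_step(node, kind_test):
    """Sketch of child::<kind_test>: filters children by kind, returns nodes."""
    wanted = KIND_TESTS[kind_test]
    return [c for c in node.childNodes if c.nodeType == wanted]

def xpath_string(node):
    """Sketch of string() for text, comment and element nodes: returns one str."""
    if node.nodeType in (Node.TEXT_NODE, Node.COMMENT_NODE):
        return node.data
    # An element's string value concatenates its descendant text nodes;
    # comments inside the element are skipped.
    return "".join(xpath_string(c) for c in node.childNodes
                   if c.nodeType in (Node.TEXT_NODE, Node.ELEMENT_NODE))

p = parseString("<p>one<!--note-->two<b>three</b></p>").documentElement

print([t.data for t in child_step(p, "text()")])  # ['one', 'two']
print(len(child_step(p, "element()")))            # 1
print(xpath_string(p))                            # onetwothree
```

The first result keeps the mixed-content structure (the text runs on
either side of the comment stay separate, and the <b> child is not mixed
in), whereas the string value flattens everything into a single string.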

One might argue for is-text() and is-comment(), and conversely
convert-to-string() and the like. But that doesn't work well in an
expression such as para/em/is-text() or even para/em[is-text()],
because the semantics there are not "is" but "has": the first selects
the text nodes that have an "em" parent, the second selects the em
elements that have one or more text children. My argument against
convert-to-string() would be that it is annoyingly long, but that's just
me. My argument against string() itself is that it looks too much like a
constructor function, which it is not.
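The "is" versus "has" difference can be sketched the same way. The
Python below, again using xml.dom.minidom with hypothetical helper names
(step, named, is_text), evaluates rough equivalents of para/em/text()
and para/em[text()] by hand. The first returns the text nodes
themselves; the second returns em elements and uses text() only as a
condition. It illustrates the semantics under those assumptions and is
not how an XPath engine is implemented.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

DOC = "<doc><para><em>bold</em><em><b>nested</b></em></para></doc>"

def step(context_nodes, test):
    """One child-axis location step: children of each context node passing test."""
    return [c for n in context_nodes for c in n.childNodes if test(c)]

def named(name):
    """Element name test, e.g. named("em") plays the role of the step `em`."""
    return lambda n: n.nodeType == Node.ELEMENT_NODE and n.tagName == name

def is_text(n):
    """Kind test text(): true only for text nodes."""
    return n.nodeType == Node.TEXT_NODE

doc = parseString(DOC).documentElement          # context node: <doc>
ems = step(step([doc], named("para")), named("em"))

# para/em/text(): the result IS the text nodes
path_result = step(ems, is_text)

# para/em[text()]: the result is em elements that HAVE a text child
pred_result = [e for e in ems if step([e], is_text)]

print(len(ems))                        # 2
print([t.data for t in path_result])   # ['bold']
print([e.tagName for e in pred_result])  # ['em']
```

The second em is dropped by the predicate because its only text lives
inside <b>, and text() in a step is short for child::text().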

I'm not saying that the choice of words is perfect, but I wanted to
point out that it is never an easy one. W3C standards are created by
consensus among the members, and the process is open: non-members can
submit bug reports against draft standards, and the working group is
required to look into them. If you have a strong argument, it is likely
to be taken seriously.

That said, I invite you and everyone on this list or elsewhere to look
at the current XSLT 3.0 Last Call Working Draft [3]. Even now there are
still some open bugs on the choice of terms and keywords. The draft is
still open for bug reports from anyone, which you can file in W3C's
Bugzilla [4] (signing up is easy).

Small disclaimer: I was not a member of the WG when it had to choose the
string() function and the text() kind test, so the road to consensus I
laid out above may not be the road that actually led there.

Cheers,

Abel Braaksma
Exselt XSLT 3.0 processor
http://exselt.net

PS: you don't need to look up the spec to remind yourself of text() vs
string(); just about any book on XSLT clearly explains their semantics
and pitfalls. And you are right: people starting out with a language
will start with a tutorial book, and that is exactly where they learn
this distinction.

[1] http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html
[2] http://www.w3.org/TR/xml-infoset/#infoitem.character
[3] http://www.w3.org/TR/xslt-30/
[4] https://www.w3.org/Bugs/Public/


On 21-3-2014 17:00, Ihe Onwuka wrote:
> On Fri, Mar 21, 2014 at 3:39 PM, Eliot Kimber <ekimber@xxxxxxxxxxxx> wrote:
>> What is the alternative? Invent new terms for all concepts for which a
>> common term would be appropriate?
>>
>> It is simply the case that in all technical standards there will be jargon
>> uses of common terms. It is not reasonable or realistic to expect
>> otherwise. It is not realistic or reasonable to expect to not have to look
>> things up to learn or be reminded of the specific meaning of something in
>> a standard.
>>
> As to what is reasonable, my starting benchmark would be how many
> programmers have ever read a language specification of any sort.
>
> I would factor into the equation the fact that there are often several
> layers of online tutorial, textbook, programmer's reference/nutshell
> books separating the programmer from the programming specification and
> wonder how many of those would include this sort of factlet.

