[oXygen-user] Xpath and Saxon return tabs as text
Wendell Piez
Tue Sep 9 09:12:36 CDT 2008
Hi,
In the meantime, Philipp should be aware that
there is generally only a loose binding between a
schema (or DTD) and a document, such that (other
things being equal) processors will not
automatically strip whitespace-only text nodes
from documents without explicit instruction to do
so. This is by design, since schemas are not
always available to processors, and indeed some
operations can and should be able to run without
schemas. Whitespace stripping without a schema is
dangerous and can frequently result in corrupt
data where whitespace was stripped improperly.
Accordingly, although the XPath 2.0/XQuery family
of technologies provides this feature, Philipp may
have to get used to its not always being
available, for example when using XPath 1.0.
In general, it's something to watch out for;
automatic whitespace stripping can easily fall
into the category of "be careful what you wish for".
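
To make the risk concrete, here is a minimal sketch in Python (standard library only, so it stands in for Saxon or any XPath engine rather than reproducing either). The sample document, the helper names text_nodes and strip_whitespace_only, and the choice of xml.dom.minidom are all assumptions made for illustration. It suggests two things: a parser working without a schema keeps whitespace-only text nodes, and stripping them blindly also removes a significant space in mixed content.

```python
from xml.dom import minidom

# Hypothetical sample: indented element-only content plus one mixed-content <p>.
DOC = (
    "<TEI>\n"
    "\t<teiHeader><title>MS Einsiedeln</title></teiHeader>\n"
    "\t<p><c>D</c> <ex>ie</ex></p>\n"
    "</TEI>"
)

# The four characters XML treats as whitespace.
XML_WHITESPACE = " \t\r\n"


def text_nodes(node):
    """Return the data of every text node below `node`, in document order."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            found.append(child.data)
        else:
            found.extend(text_nodes(child))
    return found


def strip_whitespace_only(node):
    """Remove every whitespace-only text node below `node`, ignoring any schema."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip(XML_WHITESPACE) == "":
                node.removeChild(child)
        else:
            strip_whitespace_only(child)


doc = minidom.parseString(DOC)
root = doc.documentElement
para = root.getElementsByTagName("p")[0]

# Roughly what //text() sees when nothing has been stripped.
before = text_nodes(root)
para_before = "".join(text_nodes(para))

strip_whitespace_only(root)

# The same query after schema-blind stripping.
after = text_nodes(root)
para_after = "".join(text_nodes(para))

print(before)
print(after)
print(repr(para_before), repr(para_after))
```

Before stripping, the list includes the indentation between elements, much like the tabs Philipp is seeing. Afterwards the indentation is gone, but so is the space between <c> and <ex>, so the paragraph text changes from "D ie" to "Die". Only a schema (or an explicit instruction) can tell a processor which of those whitespace nodes were safe to drop.
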
Cheers,
Wendell
At 11:23 AM 9/3/2008, Sorin wrote:
>Hello,
>
>Saxon 9 has an option for stripping whitespace
>nodes but Oxygen allows you to set it only for
>transformations (Preferences -> XML ->
>XSLT-FO-XQuery -> XSLT -> Saxon -> Saxon-B/SA).
>If you set the above option to strip whitespace
>nodes and you run an XSLT transform that uses
>the expression //text() you can see that the
>list of nodes does not contain such nodes. In
>the next version we will add this Saxon 9 option for XPath expressions too.
...
>Philipp Steinkrüger wrote:
>>Dear Oxygen-Users,
>>I am having a problem with an indented XML file. The file looks like this:
>><?xml version="1.0" encoding="UTF-8"?>
>><TEI
>>xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>xmlns="http://www.i-d-e.de/ns/1.0">
>> <teiHeader>
>> <fileDesc>
>> <titleStmt>
>> <title>MS Einsiedeln</title>
>> </titleStmt>
>> <publicationStmt>
>> <p>publicationsStmt empty</p>
>> </publicationStmt><sourceDesc>
>> <p>sourceDesc empty</p>
>> </sourceDesc></fileDesc>
>> </teiHeader>
>> <text>
>> <body>
>> <div>
>> <div>
>> <div>
>> <p><c>D</c>ie gotheit
>> it beloſen<lb/>in dem vater n<ex>atur</ex>elich
>> dar<lb/>vmbe
>> it er alvermvgende<lb/>vnd enpfat niht von ite<lb
>> />des<gap
>> reason=""/> er elber nit en it an<lb/>iner go<unclear
>> >tl</unclear>icher macht wan<lb/>ers
>>weelich i<ex>n</ex> ime vnd
>> an<lb/>ime
>> elben beloſen hat<space unit="letters" quantity="1"
>> /></p>
>> </div>
>> </div>
>> </div>
>> </body>
>> </text>
>></TEI>
>>Now, using the following XPath 2.0 expression:
>>//text(), the tabs are returned as text-nodes,
>>for example the first tab before the tag
>><teiHeader>. In fact, my DTD does not allow
>>#PCDATA inside <TEI>, but the document is
>>validated without any problems. To me this
>>seems kind of schizophrenic, or am I mistaken?
>>Btw: the same file in XMLSpy with its built-in
>>XSLT engine, as well as the MSXML parser, with the
>>same XPath expression does not return the tabs as text-nodes.
>>Any ideas?
>>Philipp
>>PS: I am using Oxygen 9.3
======================================================================
Wendell Piez mailto:
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================