Bug in XPath 2.0 regular expression search
-
- Posts: 125
- Joined: Mon Jun 09, 2003 6:02 pm
- Location: Charlottesville, Virginia USA
Bug in XPath 2.0 regular expression search
Consider this XML document:
Code: Select all
<?xml version="1.0" encoding="UTF-8"?>
<test>
  <string>A dog</string>
  <string>a cat</string>
</test>
I want to match the text node containing ASCII 'A' using an XPath 2.0 expression in the oXygen 6.1 XPath search box. This works:
Code: Select all
//text()[matches(., "A")]
but this returns 0 results:
Code: Select all
//text()[matches(., "&#65;")]
even though it is semantically identical according to the XPath 2.0 / XML Schema regular expression rules.
Instead, the second search in oXygen matches this:
Code: Select all
<string>&amp;#65; frog</string>
i.e. a string containing an ampersand (written as an entity reference) followed by the literal "#65;". It seems that oXygen is not decoding the character reference before doing the regular expression match.
I don't know if oXygen passes an XPath 2.0 search to the Saxon 8 engine, but if so, the problem is not with Saxon 8, because it performs correctly given this XQuery using the same matches() call:
Code: Select all
let $xml :=
<test>
  <string>A dog</string>
  <string>a cat</string>
</test>
for $n in $xml//string[matches(., "&#65;")]
return $n
(: returns <string>A dog</string> :)
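The behaviour described above can be imitated outside oXygen. The following is only a rough Python sketch, not oXygen's or Saxon's actual code path: Python's `re` module stands in for the XPath 2.0 regex engine, `html.unescape` stands in for character reference decoding (it also accepts HTML-only entity names that XML does not define), and the `search` helper and its `decode` flag are made up for illustration.

```python
import html
import re
import xml.etree.ElementTree as ET

# The sample document plus the "frog" string that oXygen matched unexpectedly.
# The XML parser turns &amp;#65; into the literal text "&#65;".
DOC = """<test>
  <string>A dog</string>
  <string>a cat</string>
  <string>&amp;#65; frog</string>
</test>"""


def search(pattern, decode):
    """Return the text of every <string> element matching the pattern.

    decode=False mimics a search box that passes the typed text through as-is;
    decode=True mimics a front end that resolves character references first.
    """
    if decode:
        pattern = html.unescape(pattern)  # "&#65;" becomes "A"
    root = ET.fromstring(DOC)
    return [s.text for s in root.iter("string") if re.search(pattern, s.text)]


raw_hits = search("&#65;", decode=False)
decoded_hits = search("&#65;", decode=True)
print(raw_hits)      # only the literal "&#65; frog" text matches
print(decoded_hits)  # only "A dog" matches
```

With decoding switched off, the pattern is the five literal characters `&#65;`, which is consistent with the search box finding the "frog" string rather than "A dog".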
-
- Site Admin
- Posts: 2095
- Joined: Thu Jan 09, 2003 2:58 pm
-
- Posts: 125
- Joined: Mon Jun 09, 2003 6:02 pm
- Location: Charlottesville, Virginia USA
XPath syntax
Well, I think that technically, according to the XPath specification (even for XPath 1.0), a string is a sequence of characters as defined in the XML specification (see http://www.w3.org/TR/xpath#strings), so an XPath parser should in fact treat these as identical:
Code: Select all
contains("CAT", "&#65;") = contains("CAT", "A")
contains("dog's", "&apos;") = contains("dog's", "'")
So the current oXygen XPath search is not fully implementing the XPath string model, as I understand it.
For my personal work it's not a big issue, because I can almost always input the Unicode character directly. But I discovered this bug while documenting a procedure for general use. Specifically, I was sharing a method in oXygen for searching for the Unicode en dash (–). It is preferable to use a numeric character reference, as in "contains($string, '&#8211;')", because it is too easy to confuse the en dash character with a hyphen. So I do think it would be worth adding support for character and entity references in the search field.
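To illustrate why a numeric reference is less ambiguous than a pasted glyph, here is a small Python sketch that resolves three look-alike dash references and prints their code points and Unicode names. It uses only the standard library; the `names` dictionary is just an illustrative variable.

```python
import html
import unicodedata

# Decimal character references for three easily confused characters.
refs = ["&#45;", "&#8211;", "&#8212;"]

names = {}
for ref in refs:
    ch = html.unescape(ref)            # resolve the numeric reference
    names[ref] = unicodedata.name(ch)  # official Unicode character name
    print(ref, f"U+{ord(ch):04X}", names[ref])
```

The three glyphs can look nearly identical in a search field, but the references map to distinct code points (U+002D, U+2013 and U+2014), so a documented procedure that spells out the number cannot be misread.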
-
- Site Admin
- Posts: 2095
- Joined: Thu Jan 09, 2003 2:58 pm
Hi David,
Not really. An XPath expression is generally placed in an attribute value, and it is the XML parser that decodes the entity. We will nevertheless consider adding an option that, when enabled, decodes the standard XML entities &lt;, &gt;, &apos;, &quot; and &amp;, as well as character references, in the XPath entry field.
Best Regards,
George
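As a rough illustration of the two routes mentioned above, the Python sketch below first lets an XML parser decode an expression stored in an attribute, then applies a hand-written decoder to the same text as it might be typed into a search field. This is not how oXygen implements anything: the `query` element, its `select` attribute and the `decode_xml_refs` helper are hypothetical names chosen for the example.

```python
import re
import xml.etree.ElementTree as ET

# The five entities predefined by XML.
PREDEFINED = {"lt": "<", "gt": ">", "apos": "'", "quot": '"', "amp": "&"}

# One pattern for hex references, decimal references and predefined names.
REF = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|lt|gt|apos|quot|amp);")


def decode_xml_refs(text):
    """Decode predefined entities and numeric character references once."""
    def repl(match):
        body = match.group(1)
        if body.startswith("#x"):
            return chr(int(body[2:], 16))
        if body.startswith("#"):
            return chr(int(body[1:]))
        return PREDEFINED[body]
    # A single left-to-right pass: output is never rescanned, so
    # "&amp;#65;" becomes "&#65;" and is not decoded a second time.
    return REF.sub(repl, text)


# Route 1: the expression lives in an attribute, so the XML parser decodes it.
elem = ET.fromstring('<query select="//text()[matches(., &quot;&#65;&quot;)]"/>')
from_attribute = elem.get("select")

# Route 2: the expression is typed into a field and decoded explicitly.
from_field = decode_xml_refs('//text()[matches(., "&#65;")]')

print(from_attribute)
print(from_field)
```

Both routes hand the same string to the XPath engine. Doing the substitution in one pass matters: decoding repeatedly would turn `&amp;#65;` into `A`, which would make it impossible to search for the literal text `&#65;`.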