XPath queries on imported HTML don't work?
Hi there,
On a newly installed oXygen XML 5.1 on a PowerBook I try the following:
1) Import some HTML: "File -> Import HTML...", enter "http://www.google.com" and select "XHTML 1.0 Transitional". This loads the Google homepage and transforms it to XHTML. The resulting document is a well-formed XHTML document. (It does not quite validate, though: it complains about some attributes, but never mind...)
2) Using the XPath 1.0 text box at the top right of the editor window, try to find some portion of the document.
My problem is that no matter what expression I search with, the result is always a popup reporting "The XPath query returned no results". This happens even for elements that are obviously in the document, such as "//html", "//body", "//table", etc.
My question is, why don't the XPath queries find anything?
Regards,
panter
Hi,
This is because XPath 1.0 expressions have no notion of a default namespace. When you write //table, the expression looks for table elements in no namespace, while the table elements in your document are all in the http://www.w3.org/1999/xhtml namespace. How is that so? Because the XHTML DTD declares xmlns as a #FIXED attribute of the html element with that value, so the parser supplies it as a default even when it is not written in the document. I admit that this is difficult to see; here is the fragment from the DTD:
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
<!ELEMENT html (head, body)>
<!ATTLIST html
  %i18n;
  id          ID             #IMPLIED
  xmlns       %URI;          #FIXED 'http://www.w3.org/1999/xhtml'
  >
oXygen determines the default namespace and maps it to the first available prefix from {default, default1, default2, etc.}.
In your case, if you use //default:table you should get all the table elements in the document.
Alternatively, if you remove the DTD, all the elements will be in no namespace (provided the html element does not declare xmlns explicitly), and //table should work.
Best Regards,
George
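To see the same effect outside oXygen, here is a minimal sketch using Python's standard xml.etree.ElementTree, whose findall() accepts a limited XPath subset (not a full XPath 1.0 engine) plus a prefix-to-URI map. The sample documents are hypothetical stand-ins for the imported page. Because ElementTree does not load external DTDs, the XHTML sample writes the xmlns attribute out explicitly to mimic the effect of the DTD's #FIXED default. The first_free_prefix helper is also hypothetical: it only illustrates the default/default1/default2 naming rule described above and is not oXygen's implementation.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stand-in for the imported page. The explicit xmlns mimics the
# DTD's #FIXED default, which puts every element in the XHTML namespace.
XHTML_DOC = f"""<html xmlns="{XHTML_NS}">
  <body><table><tr><td>cell</td></tr></table></body>
</html>"""

# The same markup with no namespace, as if the DTD had been removed.
PLAIN_DOC = "<html><body><table><tr><td>cell</td></tr></table></body></html>"


def first_free_prefix(used):
    """Return the first of default, default1, default2, ... not in `used`.

    Hypothetical illustration of the naming rule; not oXygen code.
    """
    if "default" not in used:
        return "default"
    n = 1
    while f"default{n}" in used:
        n += 1
    return f"default{n}"


def namespace_of(elem):
    """Namespace URI of an element ('' if none); ElementTree tags look like '{uri}local'."""
    return elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""


xhtml_root = ET.fromstring(XHTML_DOC)
plain_root = ET.fromstring(PLAIN_DOC)

# An unprefixed name test only matches elements in no namespace.
print(len(xhtml_root.findall(".//table")))  # 0
print(len(plain_root.findall(".//table")))  # 1

# Bind the document's default namespace to a prefix, then use the prefix.
prefix = first_free_prefix(used=set())
ns = {prefix: namespace_of(xhtml_root)}
print(prefix, len(xhtml_root.findall(f".//{prefix}:table", ns)))  # default 1
```

The prefix itself carries no meaning; only the URI it is bound to matters, which is why any free name such as default1 would work just as well.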
Hi George,
Thanks for the swift reply. I hadn't considered the namespace issues...
You're right, using "//default:XXX" allows me to find elements with name XXX in the document.
However, if I remove the DTD as you suggested (the "<!DOCTYPE ...>" declaration) and then search for "//body", I get an error. It complains that my XPath expression is invalid (???) because an entity ("nbsp" in my case) is referenced in the document but not declared. What's the reason for this odd message? Why does my XPath expression become invalid when there's an undeclared entity in the document?
Hi,
If a document is not well-formed, then it is not XML, and a document with undeclared entities is not well-formed. XPath can be applied only to XML documents, that is, documents that pass the well-formedness check.
I agree that the beginning of the error message is worded wrongly: it is not the XPath expression that is invalid but the document. The rest of the message clarifies that. Anyway, I filed a Bugzilla entry for this so that we can give a better message in cases where the problem is in the document.
Best Regards,
George
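The order of operations (parse the document first, evaluate the query second) can be illustrated with a minimal Python sketch using xml.etree.ElementTree. The sample markup and the body_texts helper are hypothetical. Declaring nbsp in an internal DTD subset is only one possible way to keep the entity after removing the external DTD; replacing each &nbsp; with the character reference &#160; would also work.

```python
import xml.etree.ElementTree as ET

# Without any DTD, &nbsp; is not declared, so this text is not well-formed XML.
UNDECLARED = "<html><body>a&nbsp;b</body></html>"

# One possible fix: declare the entity in an internal subset
# (&#160; is the no-break space that nbsp stands for in XHTML).
DECLARED = '<!DOCTYPE html [<!ENTITY nbsp "&#160;">]>' + UNDECLARED


def body_texts(xml_text):
    """Parse xml_text, then run the './/body' path query on the result."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    return [body.text for body in root.findall(".//body")]


try:
    body_texts(UNDECLARED)
except ET.ParseError as err:
    # Parsing fails, so the query is never evaluated at all.
    print("document is not well-formed:", err)

print(body_texts(DECLARED))
```

The query string is identical in both calls; only the document changes, which is why the problem lies in the document rather than in the XPath expression.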