
Unresolved Entity not detected or not reported

Posted: Mon May 12, 2014 9:35 pm
by fsteimke
Hi,

I have a very simple XML document which is valid with respect to a trivial schema. The schema, in RELAX NG compact syntax, defines an element p of type text. An entity is defined in the internal subset and expanded as expected. When I try to reference another entity which is not declared, an error is reported as expected.

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="a.rnc" type="application/relax-ng-compact-syntax"?>
<!DOCTYPE p [
<!ENTITY t "test" >
]>
<p>This is a &t; and another &t1;</p>
Everything is fine up to now. But when I add some external parameter entities that declare plenty of entities, the same document validates without any errors, although the entity in the second reference is still not declared.

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="a.rnc" type="application/relax-ng-compact-syntax"?>
<!DOCTYPE p [
<!ENTITY % isolat1 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" "http://www.w3.org/2003/entities/iso8879/isolat1.ent">
<!ENTITY % isolat2 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 2//EN" "http://www.w3.org/2003/entities/iso8879/isolat2.ent">
<!ENTITY % isopub PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN" "http://www.w3.org/2003/entities/iso8879/isopub.ent">
%isolat1;
%isolat2;
%isopub;

<!ENTITY t "test" >
]>
<p>This is a &t; and another &t1;</p>
I am sure that there is no entity called t1 declared in any of the parameter entity files. The Author view shows the expanded value "test" for &t; and nothing for &t1;.

Conclusion: the fact that there is an undeclared entity is not detected or detected but not reported.

Is this a bug, or did I miss something in the configuration?

Environment: Windows 7, Oxygen 15.2.

Sincerely,
Frank

Re: Unresolved Entity not detected or not reported

Posted: Tue May 13, 2014 12:59 pm
by Radu
Hi Frank,

Indeed, when a document contains both an entity reference and external parameter entities, the XML parser used by Oxygen (the open-source Apache Xerces library) does not report undefined entities.

The XML specification:

http://www.w3.org/TR/REC-xml/#include-if-valid
If the entity is external, and the processor is not attempting to validate the XML document, the processor MAY, but need not, include the entity's replacement text.
says that a non-validating XML processor may, but is not required to, include the replacement text of an external entity.
In this case the XML parser finds no declaration for the entity and therefore assumes that it is defined externally, in the DTDs.
If the entity were declared, the XML parser bundled with Oxygen would include its text. Because the parser finds no declaration for this particular entity, and the specification does not even oblige it to expand the entity, it issues no warning or error.
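
The behaviour described above can be reproduced outside Oxygen. The following is only a minimal sketch, and it uses Python's bundled expat parser rather than Xerces, so it illustrates how a non-validating parser may treat the two documents from the first post, not what Oxygen itself does. The file name ext.ent and the helper check are made up for the illustration; the external file is never fetched.

```python
import xml.parsers.expat as expat

# Internal subset only: the parser can see every declaration itself.
INTERNAL_ONLY = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE p [
<!ENTITY t "test">
]>
<p>This is a &t; and another &t1;</p>"""

# Same document plus a reference to an external parameter entity.
# "ext.ent" is a hypothetical name; expat does not load it by default.
WITH_PE_REFERENCE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE p [
<!ENTITY t "test">
<!ENTITY % ext SYSTEM "ext.ent">
%ext;
]>
<p>This is a &t; and another &t1;</p>"""


def check(document):
    """Parse without validation.

    Returns (error message or None, list of skipped entity names).
    """
    skipped = []
    parser = expat.ParserCreate()
    # Called for references the parser chose to skip instead of rejecting.
    parser.SkippedEntityHandler = lambda name, is_parameter: skipped.append(name)
    try:
        parser.Parse(document, True)
    except expat.ExpatError as exc:
        return expat.ErrorString(exc.code), skipped
    return None, skipped


if __name__ == "__main__":
    print(check(INTERNAL_ONLY))
    print(check(WITH_PE_REFERENCE))
```

With the first document the parser rejects &t1; as an undefined entity. With the second it cannot rule out that the unread external file declares t1, so it only reports the reference as skipped and raises no error.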

Long story short, Oxygen would help you more if it reported a warning in this case, so we'll consider patching the XML parser we use to report such issues as warnings in a future version. I will update this forum post when this happens.

Regards,
Radu

Re: Unresolved Entity not detected or not reported

Posted: Tue May 13, 2014 4:24 pm
by fsteimke
Hi Radu,
Thanks a lot for this explanation. Yes, a warning would be great in this case.

Sincerely,
Frank

Re: Unresolved Entity not detected or not reported

Posted: Wed Apr 01, 2015 9:24 am
by Radu
Hi,

Just to update this thread, Oxygen 16.1 should issue a warning for this situation.
We also filed an improvement suggestion in the Xerces parser issue tracker:

https://issues.apache.org/jira/browse/XERCESJ-1635

Regards,
Radu

Re: Unresolved Entity not detected or not reported

Posted: Wed Sep 20, 2017 8:13 am
by dcramer
Feature Request:

I would like to be able to specify, for a given validation scenario, the severity of this situation. I understand the spec's logic for leaving room for applications to recover from it (see the quote below). However, as a tool chain maintainer, it might be important for me that authors do not have undefined entities, even when they happen to be including external parameter entities in their file.

In my build pipeline, I'll break the build in that situation and would like to have the ability to let my users know that this will be a fatal error, not just a warning. The current situation confuses them and makes me seem like a jerk for refusing to give them output even though Oxygen told them "Document is valid".

http://www.w3.org/TR/REC-xml/#include-if-valid
This rule is based on the recognition that the automatic inclusion provided by the SGML and XML entity mechanism, primarily designed to support modularity in authoring, is not necessarily appropriate for other applications, in particular document browsing. Browsers, for example, when encountering an external parsed entity reference, might choose to provide a visual indication of the entity's presence and retrieve it for display only on demand.

Re: Unresolved Entity not detected or not reported

Posted: Wed Sep 20, 2017 8:36 am
by dcramer
I also have a legit bug to report related to this. There's a situation where Oxygen fails to give even a warning about an undeclared entity.

See the section "Bonus bug in Oxygen 19.0" in https://github.com/dwcramer/xmlcalabash-catalog-bug

In the screen cap below, notice that Oxygen fails to report the undeclared entity &foo; in line 17, though it does notice &bar; in line 18!?! Clone the repo and follow the instructions in the readme to see it in action:

Image

Re: Unresolved Entity not detected or not reported

Posted: Wed Sep 20, 2017 9:51 am
by Radu
Hi David,

Regarding your posts above, setting standalone="yes" in the XML declaration:

Code: Select all

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
will also trigger that error for entities used in attribute values.
I think that what the Xerces parser does when standalone="no" (ignoring the entity reference in the attribute value) is correct according to the specs:

https://www.w3.org/TR/REC-xml/#sec-rmd
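
The effect of the standalone declaration can be tried out with a small script. This is a hedged sketch using Python's bundled expat parser as a stand-in, since it is not Xerces and may differ from it in detail. The helper undeclared_entity_error and the file name ext.ent are invented for the example.

```python
import xml.parsers.expat as expat

# {standalone} is filled with either nothing or ' standalone="yes"'.
# "ext.ent" is a hypothetical external parameter entity; it is not loaded.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8"{standalone}?>
<!DOCTYPE p [
<!ENTITY t "test">
<!ENTITY % ext SYSTEM "ext.ent">
%ext;
]>
<p>This is a &t; and another &t1;</p>"""


def undeclared_entity_error(standalone):
    """Return the parser's error message, or None if parsing succeeds."""
    doc = TEMPLATE.format(standalone=' standalone="yes"' if standalone else "")
    parser = expat.ParserCreate()
    try:
        parser.Parse(doc, True)
    except expat.ExpatError as exc:
        return expat.ErrorString(exc.code)
    return None


if __name__ == "__main__":
    print("standalone omitted:", undeclared_entity_error(False))
    print('standalone="yes":  ', undeclared_entity_error(True))
```

Without the standalone declaration the undeclared &t1; passes silently. With standalone="yes" the parser knows that no external declaration can affect the document, so it reports the reference as an error.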

Regards,
Radu

Re: Unresolved Entity not detected or not reported

Posted: Wed Sep 20, 2017 9:26 pm
by dcramer
Thanks Radu.

Arcane stuff! I'm still puzzled by two things:

1. Why does role="&foo;" completely escape Xerces/Oxygen's notice without standalone="yes"? Is this interpreted as being a default value for the attribute? If so, that seems to be a strange interpretation. In any case, I can't imagine a situation where this would be what I want or expect.

2. When I process the same file with Calabash (running from the command line and using the Apache resolver to avoid another bug in xmlresolver), it blows up on any undeclared entity no matter what combination of parameter entities or standalone declaration, whether the undeclared entity is in an attribute value or not. I assume Calabash is ultimately using Xerces as well, though perhaps configured or called differently.

Code: Select all

ERROR: file:/Users/dcramer/Downloads/Top-xyz/catalog-bug/doc/oxygen-gets-confused.xml:17:25:err:SXXP0003:Error reported by XML parser
ERROR: cause: The entity "foo" was referenced, but not declared.
ERROR: It is a dynamic error if the resource referenced by a p:document element does not exist, cannot be accessed, or is not a well-formed XML document.
ERROR: Underlying exception: org.xml.sax.SAXParseException; systemId: file:/Users/dcramer/Downloads/Top-xyz/catalog-bug/doc/oxygen-gets-confused.xml; lineNumber: 17; columnNumber: 25; The entity "foo" was referenced, but not declared.

Re: Unresolved Entity not detected or not reported

Posted: Thu Sep 21, 2017 10:06 am
by Radu
Hi David,

I looked more into this.
The thing is that an equivalent DocBook 4 document with about the same content:

Code: Select all

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.docbook.org/xml/4.5/docbookx.dtd"[
<!ENTITY % entityset SYSTEM "urn:entityset">
%entityset;
<!ENTITY % myents SYSTEM "file://docshared/entities/rewrite-this.ent" >
%myents;
]>
<book >
<title>Catalog Test</title>
<!--<xi:include href="file://docshared/content/content.xml"/>-->
<chapter role="&foo;">
<title>Plain old chapter def</title>
<para> Testing some entities: <itemizedlist>
<listitem>
<para>Entity resolved using a simple uri rewrite rule:</para>
</listitem>
<listitem>
<para>Entity resolved using a rewriteUri rule:</para>
</listitem>
</itemizedlist></para>
</chapter>
</book>
reports that undeclared entity reference.

The difference is that you are using a DocBook 5 document.
DocBook 5 documents are Relax NG based and are validated using the Jing engine. Jing uses the Xerces processor to parse each XML document but does not need Xerces's validation capabilities: it uses Xerces as an XML parser, not as an XML validator (the actual validation is done by Jing against a Relax NG schema). So when the Jing engine requests a Xerces processor in Oxygen, it gets one that has validation turned off:

https://xerces.apache.org/xerces2-j/fea ... validation

meaning that it may also fail to report certain undeclared entity references. Declaring the XML document as standalone, however, forces the Xerces parser to report such problems even though validation is turned off.
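
The attribute case, role="&foo;", can be sketched the same way. The example below is an illustration under stated assumptions: it runs Python's bundled expat parser with validation off, not the Xerces instance that Jing uses, and the document is a cut-down, hypothetical version of the DocBook sample with an invented external entity file myents.ent that is never read.

```python
import xml.parsers.expat as expat

# Cut-down sample; "myents.ent" is a hypothetical file and is not loaded.
BODY = """
<!DOCTYPE book [
<!ENTITY % myents SYSTEM "myents.ent">
%myents;
]>
<book><chapter role="&foo;"><title>Plain old chapter def</title></chapter></book>"""


def parse_chapter(standalone):
    """Parse the sample without validation.

    Returns (error message or None, attributes seen on <chapter>).
    """
    flag = ' standalone="yes"' if standalone else ""
    doc = '<?xml version="1.0" encoding="UTF-8"%s?>' % flag + BODY
    seen = {}

    def on_start(name, attrs):
        # Record the attributes exactly as the parser delivers them.
        if name == "chapter":
            seen.update(attrs)

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    try:
        parser.Parse(doc, True)
    except expat.ExpatError as exc:
        return expat.ErrorString(exc.code), seen
    return None, seen


if __name__ == "__main__":
    print(parse_chapter(False))
    print(parse_chapter(True))
```

Without standalone="yes" the parse succeeds and the chapter element still arrives with a role attribute, so nothing tells the author that foo was never declared. With standalone="yes" the same reference is reported as an undefined entity.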

Regards,
Radu

Re: Unresolved Entity not detected or not reported

Posted: Thu Sep 21, 2017 6:31 pm
by dcramer
Hi Radu,

Thanks for that explanation.

We've experimented with adding standalone="yes". With that added to the xml declaration, Oxygen does indeed start reporting all undeclared entities, even those in attribute values, as fatal errors. However, it also reports other things as errors that I don't think should be. Unfortunately, this would cause more problems than it solves for us. I've committed some changes to my example document in that github repo to illustrate: https://github.com/dwcramer/xmlcalabash ... d966b3ceb6

Declaring and referencing the entity uri-test in a parameter entity file is fine. But if I declare and reference another parameter entity within that entity file, as I do in entities.ent, then Oxygen complains with the following message: "The reference to entity "uri-test" declared in the external subset of the DTD or in a parameter entity is not permitted in a standalone document". Calabash, however, has no problem in this case and builds the document without any complaints. Here's what the error looks like in Oxygen:
Image

Re: Unresolved Entity not detected or not reported

Posted: Fri Sep 22, 2017 2:28 pm
by Radu
Hi David,

I re-downloaded your entire GitHub project and I see that you have set standalone="yes" on the sample XML. When I validate it, validation is successful on my side.
Did you forget to commit some changes that you made to the DTDs?

Regards,
Radu

Re: Unresolved Entity not detected or not reported

Posted: Fri Sep 22, 2017 11:05 pm
by dcramer
It appears this was a bug that disappeared between 18.1 and 19.0. I have several versions of Oxygen running to represent what our users have installed. A user reported the issue and I reproduced it on 18.0, but hadn't checked 19.0.

Thanks,
David

Re: Unresolved Entity not detected or not reported

Posted: Mon Sep 25, 2017 9:31 am
by Radu
Hi David,

Right now I do not remember having such a problem with 18.x but maybe we did.
If somebody is stuck using Oxygen 18, I would at least recommend using the latest 18.1 kit available on our web site:

https://www.oxygenxml.com/xml_editor/so ... ditor.html

because when such problems are reported for a minor version, we create minor bug fix release kits to address them.

Regards,
Radu