'ghost' attributes showing up in XML source

TPulhamus
Posts: 7
Joined: Fri May 14, 2010 1:34 am

Post by TPulhamus »

I've run into a problem using 11.2. I'm using it to work with EAD encoded documents though I'm not making use of the framework. I was running a transform to strip out empty text nodes when I noticed that there were attributes present in the results that weren't present in the source XML. The source document validates under all the engines provided (I've tried this out on both a Windows and Mac OS X install, both current) and validates externally. I even examined the source documents in a hex editor and couldn't find anything amiss. Yet when I look at the source in Oxygen, the attributes not present in the source show up in the attributes pane (though they are grayed out). I can find them via XPath, though they don't show any location. They even show up in the tree view! Strangely enough, if I delete the values in either the tree view or the attribute pane, the empty attributes appear in the source and the document fails to validate.

The same problem with the same attributes (era and calendar on date and unitdate elements, both set to their default values, and linktype on title elements, which is very strange in itself) happens with any EAD document I load in Oxygen. More strangely, I had the same issue crop up when running the transform externally via Kernow (which has Saxon 9.1.0.3, I think). I set the explain flag to check the transform, and it also found the same attributes in the source as it parsed it. I replaced the EAD.dtd I was using with new copies downloaded from the LOC, but that didn't make a difference. I would suspect a Java issue, except that the Windows install is running the VM included in the download while the Mac is running the latest and greatest, so to speak. The version of Oxygen on Windows was an upgrade from 10.3, which didn't have this issue as far as I noticed; the fact that it only now happened there is what brought it to light here on the Mac. I'm really at a loss to understand what's going on here and how this information can be showing up in the source when it isn't present in the file's bytes.

Any help would be appreciated. Thanks in advance. And thanks for Oxygen. I've been using it since about 2006 and really appreciate its diverse and deep capabilities.
adrian
Posts: 2855
Joined: Tue May 17, 2005 4:01 pm

Post by adrian »

Hello,

The greyed out attributes that you are seeing in Oxygen are attributes with default values defined in the DTD or schema. This means that even though the attributes aren't explicitly mentioned in the XML file they are inherited from the DTD/schema.

When you delete the value, you are actually setting an explicit empty value for that attribute (e.g. attribute=""), which overrides the default value. Depending on the DTD/schema, empty values may not be allowed for some attributes, and that is why the validation fails.
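The behaviour described above can be reproduced outside Oxygen. The following is a minimal sketch using Python's standard-library expat binding, not the parser Oxygen or Saxon use. The tiny document and its ATTLIST declarations are hypothetical stand-ins (they are not copied from ead.dtd), and the declarations sit in an internal subset to keep the sketch self-contained. It shows that the parser reports the defaulted `era` attribute on an element that does not carry it in the source, while the `#IMPLIED` attribute `normal` contributes nothing.

```python
import xml.parsers.expat

# Hypothetical miniature document: the ATTLIST declarations are illustrative
# stand-ins, not copied from ead.dtd.
DOC = """<!DOCTYPE list [
<!ATTLIST date era CDATA 'ce'>
<!ATTLIST date normal CDATA #IMPLIED>
]>
<list><date>1910</date><date era="bce">300</date></list>"""


def reported_attributes(xml_text):
    """Return (element name, attributes) pairs as the parser reports them."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    # With its default settings, expat passes attributes defaulted from the
    # DTD declarations to the handler together with the written ones.
    parser.StartElementHandler = lambda name, attrs: seen.append((name, dict(attrs)))
    parser.Parse(xml_text, True)
    return seen


for name, attrs in reported_attributes(DOC):
    print(name, attrs)
```

The first `date` element is reported with `era` set to the declared default even though the source text never mentions it; the second keeps its explicit value.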

Don't worry about these attributes; they are usually there to specify a default configuration.

Let me know if you need further assistance.

Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

Post by TPulhamus »

Hi. Thanks for the fast reply. I think perhaps, though, that I did not make myself clear. I understand that these are the default values for these attributes. But according to the DTD they are optional attributes.

More importantly, they do not appear in the source XML. Yet, when the source is examined in the tree view or an XPath is run searching for the particular attributes (which do appear in other points in the source document) or a transformation is run, the nodes are shown/returned/copied to the result as if they actually do appear in the source XML.

The documents are valid as it is the default value of these attributes that is showing up. But we did not put them there. They are being added in a seemingly arbitrary manner. Long story short, tree view/ XPath/ transformation are asserting that optional (as defined by the DTD) nodes with their default value are present in the source XML when, in fact, they are not. And not all optional nodes, only some.

I've been writing transforms on EAD for the last five years and always with Oxygen as my tool of choice. Believe me when I say I understand what a default attribute looks like in the interface. I am worried about them as they are showing up in my result documents when they were never present in the source. Do you see my point?

Post by adrian »

TPulhamus wrote: More importantly, they do not appear in the source XML. Yet, when the source is examined in the tree view or an XPath is run searching for the particular attributes (which do appear in other points in the source document) or a transformation is run, the nodes are shown/returned/copied to the result as if they actually do appear in the source XML.
Yes, that's correct: they are shown, and in the tree they are marked distinctively, but with XPath you can't really tell the difference.
What I'm saying is that this is normal: XML-wise, even though they aren't mentioned in the XML source, they are implied by the DTD/schema. So in conclusion the schema always contributes to the XML source.
TPulhamus wrote: The documents are valid as it is the default value of these attributes that is showing up. But we did not put them there. They are being added in a seemingly arbitrary manner. Long story short, tree view/ XPath/ transformation are asserting that optional (as defined by the DTD) nodes with their default value are present in the source XML when, in fact, they are not. And not all optional nodes, only some.
It's not arbitrary, all optional attributes with default values will appear this way. If an attribute is optional but doesn't have a default value then it doesn't contribute. Again this is the XML model not just the XML source.
I understand that you believe that these should reflect the exact XML source, but the XML model depends on both the XML source and on the DTD or schema. The tree editor, XPath evaluation and also the transformations all work on this XML model, not just the XML source. They have always worked this way.
TPulhamus wrote: I've been writing transforms on EAD for the last five years and always with Oxygen as my tool of choice. Believe me when I say I understand what a default attribute looks like in the interface. I am worried about them as they are showing up in my result documents when they were never present in the source. Do you see my point?
I see, but this hasn't changed in Oxygen, it's been doing this for more than five years and this is considered the norm. The only way to avoid this is to disconnect the XML source from the DTD/schema.

The only change (starting with 10.x) that may affect you is that Oxygen provides an EAD Document Type Association (a framework), which may inadvertently be detected and used for your EAD documents. So the only idea that comes to mind, if this bothers you, is to disable the EAD framework in Oxygen (Options -> Preferences -> Document Type Association, look for EAD in the list and clear the checkbox in the first column), unless I misunderstood and you have already disabled it.

Regards,
Adrian

Post by TPulhamus »

adrian wrote: So in conclusion the schema always contributes to the XML source.
Wait a second. Are you trying to tell me that when I run the XPath "//date/@era" on a document that it will return results for optional attributes with default values defined in the DTD regardless of whether or not they are in the source document the XPath is run against? That the DTD will actually add attributes to the result of a transformation regardless of whether or not they appear in the source XML? Forgive me if I find that a bit, uh, ridiculous. The DTD's contribution to the XML source aside, data should not be added to the source. Allow me to illustrate.

Here is a screen shot with the source document. I've underlined the element in question in red. It is the first occurrence of the date element in the document. As you can see, it has no @era or @calendar though they are, of course, shown in the attribute panel and the model with the default value indicated. http://picasaweb.google.com/lh/photo/tq ... directlink

Here is a screen shot of the same document after I've run an XPath (visible in the top right corner; if you can't see it it is "//unitdate/@era | //date/@era"). The same element I identified in the picture above is shown underlined. Note once again that the source does not contain an @era. And underneath are the results of the XPath. The element in question is shown right at the top as having @era though no location is revealed. In fact, there are a whole lot of @era with no location. Should an XPath return results for attributes that don't exist except in the DTD and even there as optional? http://picasaweb.google.com/lh/photo/lr ... directlink

Here is a screen shot of the other element on which an unwanted attribute is appearing, in this case the @linktype on a title element. Once again, the first occurrence of the element in the source document is highlighted. The XPath "//title/@linktype" was run against it. And once again a whole lot of occurrences without location are returned. http://picasaweb.google.com/lh/photo/NM ... directlink

Here is the same document after I've run the following transform against it

Code:

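<!-- Wrapper element added so the fragment is a complete stylesheet; version="2.0" is an assumption. -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" encoding="UTF-8" indent="no"
        doctype-public="+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN"
        doctype-system="ead.dtd"/>

    <xsl:template match="/">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Identity copy that drops whitespace-only text nodes. -->
    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node()[not(self::text())] | text()[normalize-space() != ''] | @*"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>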
For brevity's sake, I've cut it down to show only the same elements as above. As you can clearly see, attribute nodes have been added to the result despite the fact that they did not appear in the source. This is an unwanted result. http://picasaweb.google.com/lh/photo/ef ... directlink

adrian wrote: It's not arbitrary, all optional attributes with default values will appear this way. ... this hasn't changed in Oxygen, it's been doing this for more than five years and this is considered the norm. The only way to avoid this is to disconnect the XML source from the DTD/schema.
Uh, no and yeah. No, attributes with default values, defined as optional in the DTD, should not be appearing in XPath and XSLT results as if they were there in the source. And yeah, it has changed. Oxygen has never given me these results before, in 9.x or 10.x, which is to say it hasn't added the data to documents as a consequence of the DTD, especially when that data is optional. If the "only way to avoid this is to disconnect the XML source from the DTD/schema" then I'd say there's a serious problem since XML disconnected from its DTD/schema is not verifiable as conformant, only well-formed. And then why bother with XML at all? To be perfectly honest, I'm surprised you would suggest such a thing.

Post by TPulhamus »

I forgot to mention that I did in fact disable the framework. Didn't change anything.

Post by adrian »

TPulhamus wrote: The DTD's contribution to the XML source aside, data should not be added to the source.
And it never is; the source remains untouched. The transformation result, on the other hand, is affected because the input is not the XML source alone but the parsed XML model, which also includes the DTD subset. If the XSLT stylesheet copies elements from the input, the copies will carry the default values of the optional attributes, so they will differ from the XML source.

Allow me to show you an example that I can relate to and that you can try with any version of Oxygen:
samples/personal.xml from [Oxygen-installation-folder]/samples.

I went as far back as Oxygen 7.x and tried everything mentioned below, and it has always worked the same way. But you can always try this for yourself.

Open personal.xml and try the XPath expression: //*[@contr='false'] and you will obtain all the 'person' elements from the document because the optional attribute contr has the default value false as specified by the DTD.
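As a rough illustration of why that XPath matches elements that never spell out the attribute, here is a sketch in Python using the standard-library expat binding. It is not how Oxygen evaluates XPath. The document is a cut-down, hypothetical stand-in for samples/personal.xml: the two ATTLIST declarations follow personal.dtd, but they are placed in an internal subset so the sketch is self-contained, and the explicit `contr` values on the two workers are invented for the comparison.

```python
import xml.parsers.expat

# Cut-down stand-in for samples/personal.xml. The explicit contr values on the
# two workers are invented for illustration.
DOC = """<!DOCTYPE personnel [
<!ATTLIST person id ID #REQUIRED>
<!ATTLIST person contr (true|false) 'false'>
]>
<personnel>
<person id="Big.Boss"/>
<person id="one.worker" contr="true"/>
<person id="two.worker" contr="false"/>
</personnel>"""


def ids_with_contr_false(xml_text):
    """Rough equivalent of //*[@contr='false'], returning the id of each match."""
    matches = []

    def start(name, attrs):
        # attrs already contains DTD-supplied defaults, so a defaulted
        # contr='false' is indistinguishable from a written one here.
        if attrs.get("contr") == "false":
            matches.append(attrs.get("id"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return matches


print(ids_with_contr_false(DOC))
```

Big.Boss is selected through the default alone, two.worker through its explicit value, and one.worker is excluded because its explicit value overrides the default.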

Also, transform: samples/personal.xml with the XSLT: samples/xhtml/copy.xsl
copy.xsl is a stylesheet that simply copies the content of the source.

So with this input:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE personnel SYSTEM "personal.dtd">
<personnel>
<person id="Big.Boss">
<name>
<family>Boss</family>
<given>Big</given>
</name>
<email>chief@oxygenxml.com</email>
<link subordinates="one.worker two.worker three.worker four.worker five.worker"/>
</person>
...
</personnel>
after the transformation with copy.xsl you obtain this output:

Code:


<?xml version="1.0" encoding="utf-8"?>
<personnel>
<person id="Big.Boss" contr="false">
<name>
<family>Boss</family>
<given>Big</given>
</name>
<email>chief@oxygenxml.com</email>
<link subordinates="one.worker two.worker three.worker four.worker five.worker"/>
</person>
...
</personnel>
And yes, the result contains the optional attributes with default values from the DTD (contr="false" for the person element) when it is transformed in any version of Oxygen starting with 7.2, maybe even further back.
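For comparison, some parsers can be told to report only the attributes actually written in the document. The sketch below uses Python's standard-library expat binding and its `specified_attributes` switch; whether Saxon or the parsers bundled with Oxygen offer an equivalent setting is not something this thread establishes, so treat it purely as an illustration of the difference between the source and the parsed model. The document is a hypothetical cut-down of personal.xml with the relevant ATTLIST moved into an internal subset, and `copy_elements` is an illustrative helper, not copy.xsl.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, quoteattr

# Hypothetical cut-down of personal.xml with the ATTLIST in an internal subset.
DOC = """<!DOCTYPE personnel [
<!ATTLIST person contr (true|false) 'false'>
]>
<personnel>
  <person id="Big.Boss"><email>chief@oxygenxml.com</email></person>
</personnel>"""


def copy_elements(xml_text, specified_only):
    """Re-serialize elements and text, dropping whitespace-only text nodes."""
    out = []
    parser = xml.parsers.expat.ParserCreate()
    # When set, expat reports only attributes written in the document instance.
    parser.specified_attributes = specified_only
    # Deliver each run of text in a single callback.
    parser.buffer_text = True

    def start(name, attrs):
        rendered = "".join(f" {key}={quoteattr(value)}" for key, value in attrs.items())
        out.append(f"<{name}{rendered}>")

    def end(name):
        out.append(f"</{name}>")

    def text(data):
        if data.strip():
            out.append(escape(data))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.Parse(xml_text, True)
    return "".join(out)


print(copy_elements(DOC, specified_only=False))  # model view: contr is included
print(copy_elements(DOC, specified_only=True))   # source view: only id is kept
```

The first call mirrors the copy.xsl result above, with `contr` appearing on `person`; the second keeps only what the source text contains.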

The only notable difference I have found is that in 7.2 the Attributes view was missing the default value of the optional attributes. It seems they were only introduced there starting with 8.x. But everything else, XPath and transformations have been like this at least since 7.2.
TPulhamus wrote: If the "only way to avoid this is to disconnect the XML source from the DTD/schema" then I'd say there's a serious problem since XML disconnected from its DTD/schema is not verifiable as conformant, only well-formed. And then why bother with XML at all? To be perfectly honest, I'm surprised you would suggest such a thing.
You can always validate a document with a separate DTD, or create a validation scenario and pick the DTD to validate with. The DTD doesn't have to be associated with the document just for validation. The main reason DTDs are associated with a document is the very subject of this argument: so that they can contribute their declarations to the XML source. Think of a DOCTYPE declaration as an inclusion of the DTD subset in the XML file.

So for personal.xml you could embed the subset and it would look like this:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE personnel [
<!--doc:Defines the personnel as a collection of person elements. -->
<!ELEMENT personnel (person)+>
<!--doc:Specify information about a person. -->
<!ELEMENT person (name,email*,url*,link?)>
<!ATTLIST person id ID #REQUIRED>
<!ATTLIST person note CDATA #IMPLIED>
<!ATTLIST person contr (true|false) 'false'>
<!ATTLIST person salary CDATA #IMPLIED>
<!--doc:Specify the person family and given name.-->
<!ELEMENT name ((family,given)|(given,family))>
<!--doc:The person last name.-->
<!ELEMENT family (#PCDATA)>
<!--doc:The person first name.-->
<!ELEMENT given (#PCDATA)>
<!--doc:Email address for this person.-->
<!ELEMENT email (#PCDATA)>
<!--doc:Enter an URL for this person.-->
<!ELEMENT url EMPTY>
<!ATTLIST url href CDATA 'http://'>
<!--doc:Specify who is the manager and who are the subordinates for this person. -->
<!ELEMENT link EMPTY>
<!ATTLIST link manager IDREF #IMPLIED>
<!ATTLIST link subordinates IDREFS #IMPLIED>

<!NOTATION gif PUBLIC '-//APP/Photoshop/4.0' 'photoshop.exe'>
]>
<?xml-stylesheet type="text/css" href="personal.css"?>
<personnel>
<person id="Big.Boss">
<name>
<family>Boss</family>
<given>Big</given>
</name>
<email>chief@oxygenxml.com</email>
<link subordinates="one.worker two.worker three.worker four.worker five.worker"/>
</person>
<person id="one.worker">
<name>
<family>Worker</family>
<given>One</given>
</name>
<email>one@oxygenxml.com</email>
<link manager="Big.Boss"/>
</person>
<person id="two.worker">
<name>
<family>Worker</family>
<given>Two</given>
</name>
<email>two@oxygenxml.com</email>
<link manager="Big.Boss"/>
</person>
<person id="three.worker">
<name>
<family>Worker</family>
<given>Three</given>
</name>
<email>three@oxygenxml.com</email>
<link manager="Big.Boss"/>
</person>
<person id="four.worker">
<name>
<family>Worker</family>
<given>Four</given>
</name>
<email>four@oxygenxml.com</email>
<link manager="Big.Boss"/>
</person>
<person id="five.worker">
<name>
<family>Worker</family>
<given>Five</given>
</name>
<email>five@oxygenxml.com</email>
<link manager="Big.Boss"/>
</person>
</personnel>
This is a valid XML file with an internal subset. You can validate it and run XPath or transformations over it, and it behaves the same as the original personal.xml file.
Would you find the default values of the attributes appropriate for this XML file? The attribute declaration with its default value is literally in the XML source.

Actually, this isn't even Oxygen-specific. Oxygen itself uses various parsers and transformers, and they are the ones that behave like this by default. You can even try them outside of Oxygen and see how they perform; they are publicly available: Saxon 6.5.5, Saxon-HE or the old Saxon-B.

Regards,
Adrian