Berkeley DB XML is messing up W3C standards (Doc.asString())

Post by **dreuzel** » Fri Jan 20, 2006 1:12 pm

I posted this message some time ago and got some reactions saying this was impossible.
It is NOT. The character "<" is handled differently from ">".
As these characters are part of the XML structure, the whole XML is messed up,
in that one is translated and the other is not.

This problem occurs when one uses document.asString
instead of document.tocontent.

Clearly different code is used for the same operation,
and clearly asString does bad and inconsistent translations.

I think people want to know.

I'm running Berkeley DB XML.
I need to use < > and other XML special characters in my XML structure.
W3C standards tell me to use &lt; and &gt; as the encoding.
This is fine for me ...
When I load the XML record into the DB
as: <test var="&lt;&gt;" />
the XML record is returned as <test var="&lt;>" />
It seems to interpret the &gt; but not the &lt;
and as a result the XML string is corrupted.

Post by **george** » Sat Jan 21, 2006 12:42 pm

Hi,

As I explained in the previous post, from the XML point of view the > may be represented either way, that is either as > or &gt;. The relevant part of the XML specification is here:

http://www.w3.org/TR/2004/REC-xml-20040204/#syntax

The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) MAY be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

So as you see < MUST be escaped while > MAY be escaped.
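
To illustrate the point, here is a small sketch using Python's standard-library parser (chosen only for the example; it is not the DB XML API). It checks that both spellings of the attribute give the same value, and that only a literal < is rejected. The helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

# The same attribute written two ways, as XML source text.
escaped_both = '<test var="&lt;&gt;" />'
escaped_lt_only = '<test var="&lt;>" />'

# After parsing, both attribute values are the two characters "<>".
value_a = ET.fromstring(escaped_both).get("var")
value_b = ET.fromstring(escaped_lt_only).get("var")
print(value_a == value_b)


def is_well_formed(text):
    """Return True if the parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A literal "<" in an attribute value is an error; a literal ">" is not.
print(is_well_formed('<test var="<" />'))
print(is_well_formed('<test var=">" />'))
```

A parser that follows the specification should treat the two records as identical data, so the difference is only in how the text is written out.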

Best Regards,
George

Post by **dreuzel** » Mon Jan 23, 2006 10:39 am

Sorry, I do not agree with your remark.

XML and W3C standards have some purpose.
The XML record as you mentioned is correct: <test var="&lt;&gt;" />
though the record <test var="&lt;>" /> is structurally incorrect.
You as a human can see there is no difference, but the parser does not think so.

For this reason I guess the character ">" is banned from the content of the attribute var.
A parser sees it at least as an indication of the end of the XML line.

When <test var="&lt;&gt;" /> is entered in the database using the latest Berkeley DB
and the document is displayed with a DOC.asString() statement, the record is returned as
<test var="&lt;>" />, making the information completely corrupted and so unusable.

There is a bypass using DOC.getContent, which shows the record as <test var="&lt;&gt;" />
as it should be.

At least, to be reliable as a DB structure for professional data, < and > should be treated the same way. That is NOT the case.
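
For anyone who needs both characters escaped the same way in the output text, whatever the serializer does, one option is to build the string in the application. This is only a sketch in Python using the standard library, not the DB XML API, and the function name build_test_element is hypothetical.

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET


def build_test_element(value):
    """Build a <test> element as text (hypothetical helper).

    quoteattr escapes &, < and > and adds the surrounding quotes,
    so both angle brackets are written as entities.
    """
    return "<test var=%s />" % quoteattr(value)


record = build_test_element("<>")
print(record)

# Parsing the text back returns the original two characters.
round_trip = ET.fromstring(record).get("var")
print(round_trip)
```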

Best regards, but I had to contradict your reply.

Post by **dreuzel** » Mon Jan 23, 2006 10:49 am

George,
I think I misread your answer; I think we both agree on what should be.
The problem is not that W3C or the standards are wrong; they are NOT.

Berkeley DB XML codes this W3C standard badly and alters
&gt; to > in its internal interpretation.
Theoretically a parser could find its way out of
<TEST var=">" />, but it does NOT: the record is flagged as an error, as it should be.
A correct record,
<test var="&gt;" />, is mishandled and turned into bad syntax, <TEST var=">" />, which it detects itself as being bad. For all intents and purposes this is a corruption.

A BUG to be corrected !!!!
