Posted: Wed Jul 07, 2010 11:10 am
by Lars Skjærlund
Hi,

I've created a .Net program that extracts some data from a database and creates an XML file of the content.

Unfortunately, some of the text fields have line breaks, which MS .Net encodes as &#xC;. oXygen 10.3 complains that this is an invalid XML character, but isn't it legal according to section 4.1 of the W3C specification?

Would this be a bug in the validator?

Regards,
Lars

Re: &#xC;

Posted: Wed Jul 07, 2010 11:30 am
by george
Section 4.1 of the XML 1.0 spec, Character and Entity References
http://www.w3.org/TR/REC-xml/#sec-references
also refers to the well-formedness constraint Legal Character, which says

"Characters referred to using character references MUST match the production for Char."

pointing to
http://www.w3.org/TR/REC-xml/#NT-Char

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

and, as you can see, #xC is not part of the Char production.

On the other hand, #xC is allowed in XML 1.1, see

http://www.w3.org/TR/xml11/#sec-references
http://www.w3.org/TR/xml11/#NT-Char
[2] Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

So, to conclude, as your document is XML 1.0, it does not allow the #xC character, but if you specify <?xml version="1.1"?> in the XML declaration at the top of your file, then the #xC character is allowed.
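
If it helps to see the difference concretely, the two Char productions quoted above can be written as a small check. This is only a sketch in Python, not what oXygen or .Net do internally, and the function names is_xml10_char and is_xml11_char are made up for illustration:

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production quoted above."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.1 Char production quoted above."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# The character reference &#xC; refers to code point 0xC (form feed).
print(is_xml10_char(0xC))  # False: not legal in an XML 1.0 document
print(is_xml11_char(0xC))  # True: legal in an XML 1.1 document
```

A check like this could be run over the text fields before writing the file, to spot values that an XML 1.0 validator would reject.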

Best Regards,
George

Re: &#xC;

Posted: Wed Jul 07, 2010 11:37 am
by Lars Skjærlund
Hi George,

Well - what can I say? Amazing - I've never seen this level of support before... :D

Thanks,
Lars