Check document Form: element names starting with 'xml'

mariuss
Posts: 21
Joined: Tue Sep 09, 2003 10:37 pm
Location: Vancouver, Canada

Post by mariuss »

When checking whether a document is well formed, one thing is missed: element names cannot start with 'xml' (in any combination of upper and lower case), because such names are reserved:
http://www.w3.org/TR/REC-xml#sec-logical-struct

This should take into consideration the version of XML since future versions may add some standard tags (very unlikely though) with names starting with 'xml'.

The following document is reported as being well formed and it is not:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<xmlTest/>
</root>
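
A minimal sketch of the check being requested, in Python rather than in the editor's own parser. It assumes the standard library's ElementTree, which parses the document above without complaint, and then scans the element names itself. The function name reserved_element_names is made up for this illustration, and the scan ignores attributes and namespace-qualified names.

```python
import xml.etree.ElementTree as ET

# The sample document from the post above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<root>
<xmlTest/>
</root>"""


def reserved_element_names(xml_text):
    """Return element names that start with 'xml' in any letter case.

    Hypothetical helper for illustration only. It looks at element names
    and assumes the document uses no namespaces (ElementTree would
    otherwise report tags as '{uri}local').
    """
    # Parse from bytes so the encoding declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [el.tag for el in root.iter() if el.tag.lower().startswith("xml")]


flagged = reserved_element_names(SAMPLE)
for name in flagged:
    print(f"warning: element name '{name}' starts with the reserved prefix 'xml'")
```

The parse step succeeds, so the reserved-name test has to be an extra pass on top of the ordinary well-formedness check.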
Marius
george
Site Admin
Posts: 2095
Joined: Thu Jan 09, 2003 2:58 pm

Post by george »

Hi Marius,

I raised this on the xerces-dev mailing list; see the messages below:

http://nagoya.apache.org/eyebrowse/Read ... msgNo=3317
http://nagoya.apache.org/eyebrowse/Read ... msgNo=3320

I guess a warning would be nice, but there are a few problems:
- future specifications may define new names, so an older version of the parser would report a warning even though the element or attribute is specified by then.
- maintenance issues - the parser would have to keep a list of all names defined by the current specifications.
- performance issues - the parser would have to check every element or attribute name to see whether it starts with this combination and is not in the list of allowed names.

Best Regards,
George
mariuss
Posts: 21
Joined: Tue Sep 09, 2003 10:37 pm
Location: Vancouver, Canada

Post by mariuss »

Hi George,

Thanks for following up on this issue. I can see now that this validation is not trivial to implement. I tried to read the discussions you pointed to, but the server seems to be down; I will keep trying.

A few things come to mind:
- since every XML document has a signature at the beginning stating a version number, it is possible to keep a list of known attributes starting with 'xml' for each version
- if the parser does not have this list for some version then it can skip this validation (and issue a warning)
- the performance hit should not be too big, especially considering that a necessary validation is done
- the XML standard does not change too often, so far there is only version 1
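
The points above could be sketched roughly as follows. This is only an illustration in Python, not how Xerces or the editor works. The table KNOWN_XML_NAMES, its contents, and the helpers declared_version and check_names are all assumptions made up for the example, and a missing declaration is treated as version 1.0.

```python
import re

# Hypothetical per-version allow list. The entries are names I know to be
# defined for use with XML 1.0 documents; this is not an official table.
KNOWN_XML_NAMES = {
    "1.0": {"xml:lang", "xml:space", "xml:base", "xml:id", "xmlns"},
}

VERSION_RE = re.compile(r"""<\?xml\s+version\s*=\s*["']([^"']+)["']""")


def declared_version(xml_text):
    """Read the version from the declaration; assume '1.0' if there is none."""
    match = VERSION_RE.match(xml_text)
    return match.group(1) if match else "1.0"


def check_names(version, names):
    """Return (reserved_names, warnings) for the given element/attribute names."""
    allowed = KNOWN_XML_NAMES.get(version)
    if allowed is None:
        # No list for this version: skip the validation and say so.
        return [], [f"no reserved-name list for XML {version}; check skipped"]
    reserved = [
        name for name in names
        if name.lower().startswith("xml")   # cheap prefix test per name
        and name not in allowed
        and not name.startswith("xmlns:")   # namespace declarations
    ]
    return reserved, []


version = declared_version('<?xml version="1.0" encoding="UTF-8"?><root/>')
reserved, warnings = check_names(version, ["root", "xmlTest", "xml:lang"])
```

The per-name cost is one prefix comparison plus a set lookup for the few names that match, which is where the expectation of a small performance hit comes from.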

Cheers,
Marius