Implementation of schema-aware editing

Post by **kilhor** » Mon Dec 20, 2004 5:14 am

I have a general question.

For our research project on feature-based modeling (http://control.ee.ethz.ch/~ceg/fbfm) we need functionality similar to schema-aware editing.

We are building a graphical editor for feature models. Feature models are tree-like structures that can be represented as XML documents. We have XML Schemas that validate the XML documents (feature models).

What we need is to get (based on XSD) a list of legal subelements of the current node when the user right-clicks on a tree node. This is exactly the functionality offered by Oxygen (we use Oxygen for manual editing of the feature models).

What we do not know is how to find the list of legal subelements when working with a general XML Schema. Can anyone give us a hint on how this functionality is implemented?

Many thanks in advance,

Ondrej

Post by **george** » Mon Dec 20, 2004 12:41 pm

Hi Ondrej,

First you need an XML Schema framework that gives you access to the schema components. In oXygen we are using Xerces.
Then you need to determine the context information (for XML Schema you need to know all the elements up to the root element, as there can be local elements with different content than the global elements). With the schema model determined in the first step and with the context information, you need to determine the model of the element you are interested in. From there, there are two possible approaches to determine the possible subelements. One is to walk the model and determine the elements (you should take into account in this case also substitution groups) and the other is to try to validate the element content trying to insert each possible element as a subelement.

Best regards,
George

Post by **kilhor** » Mon Dec 20, 2004 2:34 pm

This all looks pretty complicated. In fact, it requires implementing everything from scratch, starting with only Xerces to manipulate XML files.
Are you aware of an implementation offered under an open license (at least for research purposes)?

Many thanks in advance,
Ondrej

Post by **george** » Mon Dec 20, 2004 3:26 pm

Hi Ondrej,

Well, it is not very complicated in fact, and Xerces is not used only for manipulating XML files; it also offers an XML Schema API. If your document is well-formed then you can obtain the element model when you parse it with Xerces and then just walk the model to get the elements that can be placed inside (if your schema does not define substitution groups then you can simply ignore them). See the Xerces samples and look at the xni.PSVIWriter class.

Best regards,
George

Post by **kilhor** » Mon Dec 20, 2004 10:01 pm

Thanks for all the hints and support. I will try Xerces.

Ondrej

Post by **kilhor** » Wed Dec 22, 2004 6:07 pm

Hello!

So, we definitely will do it.
Can you give us an estimate of how long this may take us (one night, two weeks)?
Or alternatively, how many lines of code it will be (approx. hundred(s) or thousand(s))?

Thanks as usual

Ondrej

Post by **george** » Wed Dec 22, 2004 6:18 pm

Hi Ondrej,

If you identify the element type as you parse the XML file then walking the model and extracting the elements should be less than 200 lines of code. You just walk on particles and store the element declarations.
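A minimal sketch of such a particle walk, for illustration only. The `Element`, `Group` and `Particle` classes are hypothetical stand-ins that only loosely mirror the kind of model a schema API such as the one in Xerces exposes; they are not the Xerces classes themselves, and the example content model is invented. Substitution groups are ignored here.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """Hypothetical stand-in for an element declaration."""
    name: str


@dataclass
class Group:
    """Hypothetical stand-in for a model group (sequence or choice)."""
    kind: str
    particles: List["Particle"] = field(default_factory=list)


@dataclass
class Particle:
    """A term plus its occurrence bounds (max_occurs=None means unbounded)."""
    term: Union[Element, Group]
    min_occurs: int = 1
    max_occurs: Union[int, None] = 1


def collect_element_names(particle, found=None):
    """Walk the particle tree depth-first and store each element name once."""
    if found is None:
        found = []
    term = particle.term
    if isinstance(term, Element):
        if term.name not in found:
            found.append(term.name)
    else:
        for child in term.particles:
            collect_element_names(child, found)
    return found


# Invented content model: (F1?, (F2 | F3), F4*)
content_model = Particle(Group("sequence", [
    Particle(Element("F1"), min_occurs=0),
    Particle(Group("choice", [Particle(Element("F2")), Particle(Element("F3"))])),
    Particle(Element("F4"), min_occurs=0, max_occurs=None),
]))

print(collect_element_names(content_model))
```

Note that this only lists every element that can appear somewhere in the content; it does not yet say which of them are legal at a given position.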

Best Regards,
George

Post by **kilhor** » Fri Jan 21, 2005 5:05 pm

Hello again!

We started with the implementation and got stuck on the algorithm to walk the model. Let me recall your advice from Dec 20.

george wrote: One is to walk the model and determine the elements (you should take into account in this case also substitution groups) and the other is to try to validate the element content trying to insert each possible element as a subelement.

I will demonstrate the trouble on an example. Consider the element 'TheFeature' that can contain subelements F1, F2, F3, F4, while only 2 or 3 of these subelements can be contained in 'TheFeature'. Legal content is thus e.g. <TheFeature><F1/><F2/></TheFeature> or <TheFeature><F1/><F3/><F4/></TheFeature> but not <TheFeature><F1/></TheFeature>, <TheFeature><F1/><F2/><F3/><F4/></TheFeature> or <TheFeature><F1/><F1/><F1/></TheFeature>.

The XML Schema that enforces this constraint is at http://control.ee.ethz.ch/~rohliko/xml/2-3of4.xsd and below is the graphical representation of 'TheFeature':

The group 'problemativgroup' is fairly complex but it is the only way we can express the constraint described above. Note that this schema is automatically generated from the feature model using an XSL program. This is why there are some relics like those in light yellow color.

Now, the problem: Say, the user is in the middle of editing. He already has <TheFeature><F2/><F3/></TheFeature>. When right-clicking the 'TheFeature' box in our tool we want to offer to the user all legal subfeatures -- in this case the F1 marked in blue and F4 marked in green.

We tried both pieces of advice we got. The second one, 'try to validate the element content trying to insert each possible element as a subelement', would work for this particular case but not generally. If, for example, it is necessary to add at least two subelements (say F1 and F0) to make 'TheFeature' valid (this assumes an additional subelement F0 defined in the schema), then after trying to add the possible subelement F1, 'TheFeature' is still not valid (the required F0 is still missing).

The first piece of advice, i.e. walk the model and determine the elements, is the one we expect to work fine, especially since we do not have any substitution groups. However, we failed to find an algorithm to do so. More precisely, out of the four ideas we have, only the 'brute force' algorithm seems to work.
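The limitation can be made concrete with a small sketch. Everything here is an assumption made for illustration: the validator is a plain Python predicate rather than a real schema validator, and the second content model (exactly F0, F1, F2 in that order) is invented to stand in for the "two elements still missing" situation.

```python
from itertools import combinations


def offer_by_trial_insertion(is_valid, alphabet, children):
    """Offer every element whose single insertion, at any position,
    makes the content valid according to the given predicate."""
    offered = set()
    for name in alphabet:
        for pos in range(len(children) + 1):
            trial = children[:pos] + [name] + children[pos:]
            if is_valid(trial):
                offered.add(name)
    return sorted(offered)


# Stand-in validator for "2 or 3 of F1..F4, in schema order".
NAMES = ["F1", "F2", "F3", "F4"]
LEGAL = {c for k in (2, 3) for c in combinations(NAMES, k)}


def two_or_three_of_four(children):
    return tuple(children) in LEGAL


# Invented stricter model: content must be exactly F0, F1, F2.
def needs_all_three(children):
    return children == ["F0", "F1", "F2"]


# Works when one insertion is enough to reach valid content.
print(offer_by_trial_insertion(two_or_three_of_four, NAMES, ["F2", "F3"]))
# Offers nothing when two elements are still missing.
print(offer_by_trial_insertion(needs_all_three, ["F0", "F1", "F2"], ["F2"]))
```

In the second call no single insertion yields valid content, so the list of offered elements comes back empty even though F0 and F1 are both needed.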

The brute force algorithm should work like this:
1) Generate ALL possible combinations of subelements by recursively traversing the particles in the schema model (in our simple XSD these are (F1, F2), (F1, F3), (F1, F4), (F2, F3), (F2, F4), (F3, F4), (F1, F2, F3), (F1, F2, F4), (F1, F3, F4), (F2, F3, F4)).
2) Look for those combinations that contain all subelements already in the edited XML document (F2 and F3) and note the subelements that can extend the current content of the element (of element 'TheFeature').
3) Take the union of those noted subelements. This gives F1 and F4.
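The three steps can be sketched as follows. As a simplifying assumption, the combinations are generated with `itertools.combinations` for the "k of n" case instead of by recursively traversing schema particles, and the function names are made up for this example.

```python
from itertools import combinations


def legal_combinations(names, min_count, max_count):
    """Step 1: enumerate every legal combination of subelements."""
    for size in range(min_count, max_count + 1):
        yield from combinations(names, size)


def brute_force_candidates(names, min_count, max_count, present):
    """Steps 2 and 3: keep combinations that strictly extend the current
    content and take the union of the elements they would add."""
    present = set(present)
    candidates = set()
    for combo in legal_combinations(names, min_count, max_count):
        if present < set(combo):  # proper superset of what is already there
            candidates |= set(combo) - present
    return sorted(candidates)


names = ["F1", "F2", "F3", "F4"]
print(brute_force_candidates(names, 2, 3, ["F2", "F3"]))  # ['F1', 'F4']
```

For 2 or 3 out of 4 this enumerates 10 combinations; for 3 to 5 out of 7 it is already 91, and the count keeps growing combinatorially with the size of the feature set.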

Clearly, having to generate all the combinations is bad. Just consider the complexity of the same example extended to selecting three, four or five elements out of seven instead of 2 or 3 elements out of 4 (to get an idea of how this can look, have a look at http://control.ee.ethz.ch/~rohliko/xml/3-5of7.png and http://control.ee.ethz.ch/~rohliko/xml/3-5of7.xsd). There has to be a more elegant solution. First piece of evidence: the code completion in Oxygen is very fast. Second: the authors of XML Schema had the issue of schema validation in mind and (surely) designed XML Schema in such a way that even structures like the one we have can be analyzed easily.

Would you be so kind as to point us in the right direction?

Thank you very much,
Ondrej

Post by **kilhor** » Sat Jan 22, 2005 12:28 am

When I demonstrated to others how Oxygen uses the XML Schema to find out which elements to offer to the user, I found that it has problems under certain circumstances. If there are already some elements in the edited XML and the user wants to add an element before these elements, he/she may be offered the wrong set of possible elements. I tried to demonstrate it on the schema introduced in my previous post. See the following four screenshots.

When the 'TheFeature' element is empty Oxygen offers the right set of features (it correctly doesn't offer F4):

When there are already F2 and F3 and the user wants to add an element before F2, Oxygen should only offer F1 (not F2 and F3):

When there are already F1 and F2, Oxygen correctly offers F3 and F4 (where only one of them can be added, not both):

When the user enters F3 and wants to add a new element, he/she is not allowed to add F4 (which is also correct behavior):

Finally, behavior similar to that in the second screenshot can be seen below. F3 should not be offered since there are already three subelements in 'TheFeature'. Also, F4 should not be offered since it is already present:

If you think that I should report it as a bug, I will. In the meantime I will keep this here since it is related to the topics discussed here.

All the best,
Ondrej

Post by **george** » Sat Jan 22, 2005 1:47 pm

Hi Ondrej,

That is how oXygen was designed to work. It takes into account the document entered up to the insertion point. Thus we are able to determine clearly the context in which we need the next elements, and the next elements are the ones that are accepted by the parent element's particle after we feed it the preceding siblings.

So the context is:

parent element ---> particle
preceding siblings

and we need the next elements.
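One way to picture "feed the preceding siblings, then ask for the next elements" is with derivatives of a regular expression over element names. This is only an illustrative sketch of the idea, not a description of how oXygen or Xerces implement it; the tuple-based model and the `pick` helper that builds a "between lo and hi of these names, in order" content model are invented for this example.

```python
EPS = ("eps",)    # matches only the empty sequence
NULL = ("null",)  # matches nothing


def el(name):
    return ("el", name)


def seq(a, b):
    if a == NULL or b == NULL:
        return NULL
    if a == EPS:
        return b
    return a if b == EPS else ("seq", a, b)


def alt(a, b):
    if a == NULL:
        return b
    return a if b == NULL else ("alt", a, b)


def nullable(r):
    """True if the model accepts the empty sequence."""
    if r[0] == "seq":
        return nullable(r[1]) and nullable(r[2])
    if r[0] == "alt":
        return nullable(r[1]) or nullable(r[2])
    return r[0] == "eps"


def first(r):
    """Element names that can start a sequence accepted by the model."""
    if r[0] == "el":
        return {r[1]}
    if r[0] == "seq":
        return first(r[1]) | (first(r[2]) if nullable(r[1]) else set())
    if r[0] == "alt":
        return first(r[1]) | first(r[2])
    return set()


def deriv(r, name):
    """Model that remains after consuming one element called `name`."""
    if r[0] == "el":
        return EPS if r[1] == name else NULL
    if r[0] == "seq":
        rest = deriv(r[2], name) if nullable(r[1]) else NULL
        return alt(seq(deriv(r[1], name), r[2]), rest)
    if r[0] == "alt":
        return alt(deriv(r[1], name), deriv(r[2], name))
    return NULL


def next_elements(model, preceding):
    """Feed the preceding siblings, then report what may come next."""
    for name in preceding:
        model = deriv(model, name)
    return sorted(first(model))


def pick(names, lo, hi):
    """Hypothetical helper: between lo and hi of `names`, in order."""
    if not names or hi == 0:
        return EPS if lo <= 0 else NULL
    head, rest = names[0], names[1:]
    return alt(seq(el(head), pick(rest, lo - 1, hi - 1)), pick(rest, lo, hi))


MODEL = pick(["F1", "F2", "F3", "F4"], 2, 3)
print(next_elements(MODEL, []))
print(next_elements(MODEL, ["F1", "F2"]))
print(next_elements(MODEL, ["F2", "F3"]))
```

With this model, an empty 'TheFeature' is offered F1, F2 and F3 but not F4, and after F1, F2 the offer is F3 and F4. After F2, F3 only F4 is offered, because nothing after the insertion point is taken into account and nothing before it is reconsidered.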

In your case you set the context
parent element ---> particle
element1
[gap]
element2
[gap]
and so on.

and you want the union of all the elements that, when added in the gap placeholders (a gap placeholder can be replaced with zero or more elements), will create valid content for the parent element. I'm afraid that this is a really hard problem to solve in the general case.
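For a content model small enough to enumerate, the gap formulation can at least be written down naively. The sketch below assumes the full list of valid sequences is available, which is exactly what does not scale, and it sidesteps the general case entirely; the function names are invented. An element is a candidate for a gap if the current children, with that element inserted at the gap, still form a subsequence of some valid sequence.

```python
from itertools import combinations


def is_subsequence(needle, haystack):
    """True if `needle` can be obtained from `haystack` by deleting items."""
    remaining = iter(haystack)
    return all(symbol in remaining for symbol in needle)


def candidates_per_gap(valid_sequences, alphabet, current):
    """For each gap (before, between and after the current children),
    list the elements that could still be inserted there."""
    result = []
    for gap in range(len(current) + 1):
        offered = []
        for name in alphabet:
            trial = current[:gap] + [name] + current[gap:]
            if any(is_subsequence(trial, seq) for seq in valid_sequences):
                offered.append(name)
        result.append(offered)
    return result


names = ["F1", "F2", "F3", "F4"]
# Enumerated stand-in for the "2 or 3 of 4, in schema order" model.
valid = [list(c) for k in (2, 3) for c in combinations(names, k)]

print(candidates_per_gap(valid, names, ["F2", "F3"]))
```

For the content F2, F3 this offers F1 in the gap before F2, nothing between F2 and F3, and F4 after F3.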

Best Regards,
George