[oXygen-user] Re: [dita-users] All possible Xpath generator?

Lars Huttar
Fri Feb 15 07:20:46 CST 2008


I was thinking the same thing as Wendell wrote, regarding recursive 
structures.

However, if the request is limited to non-recursive structures (and to
DTDs, not XML Schema... personal preference) it sounds like a fun little
project.

.mod inclusion should come for free with any library that processes 
DTDs, because as I understand it, we're talking about parameter 
entities, which are a required part of an XML parser (at least, of an 
XML parser that handles DTDs at all).

I was wondering whether there is an easy-to-use DTD parser that exposes 
the rules (not just a validation function) via an API. If there is one 
for Python, for example, writing this tool should be easy.

So far the most promising I've found is xml.parsers.expat, which offers 
the following handler for element type declarations:

	ElementDeclHandler(name, model)

	Called once for each element type declaration. name is the name of the
	element type, and model is a representation of the content model.

This should allow one to build up a data structure of element type 
declarations, and use that to generate a list of possible XPaths (within 
constraints).
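
Here is a minimal sketch of how that handler might be used. It assumes the 
declarations sit in the internal DTD subset of a small document; external 
DTDs and .mod files would also need external entity handling, which is not 
shown. The names child_names and collect_declarations are made up for 
illustration. The (type, quantifier, name, children) layout of model is 
the one described in the Python documentation.

```python
import xml.parsers.expat


def child_names(model):
    """Return the element names mentioned anywhere in an expat content model."""
    _ctype, _quant, name, kids = model
    names = [name] if name is not None else []
    for kid in kids:
        for n in child_names(kid):
            if n not in names:  # keep first occurrence, preserve order
                names.append(n)
    return names


def collect_declarations(xml_text):
    """Map each declared element type to the child element names it allows."""
    decls = {}

    def on_element_decl(name, model):
        decls[name] = child_names(model)

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(xml_text, True)
    return decls


# Hypothetical toy DTD, carried in the internal subset of a stub document.
SAMPLE = """<!DOCTYPE doc [
<!ELEMENT doc (body)>
<!ELEMENT body (div+)>
<!ELEMENT div (div | p)*>
<!ELEMENT p (#PCDATA)>
]>
<doc/>"""

decls = collect_declarations(SAMPLE)
for name, kids in decls.items():
    print(name, "->", kids)
```

This flattens each content model to a plain list of child names, which 
throws away ordering and occurrence information but is enough for 
enumerating paths.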

As for recursive structures... you could detect them pretty easily, I 
think. And when they occur, you could indicate them something like this:
	/doc/body/div/div...
	/doc/body/div/list/item/list...
where the ... indicates that the last element name begins a recursion.
Since DTDs basically describe context-free grammars (the children an 
element allows do not depend on its ancestors), we don't need to worry 
about recursion beyond one level.
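
The recursion marking could be sketched roughly like this. It assumes the 
declarations have already been reduced to a dict from element name to 
allowed child names; the dict contents and the function name 
enumerate_paths are hypothetical. For simplicity it treats only elements 
with no declared children as leaves.

```python
def enumerate_paths(children, root):
    """Yield root-to-leaf paths; a path ends in '...' where an element
    type reappears among its own ancestors (the start of a recursion)."""

    def walk(name, ancestors):
        path = ancestors + [name]
        if name in ancestors:
            # Same element type already on the path: stop and mark it.
            yield "/" + "/".join(path) + "..."
            return
        kids = children.get(name, [])
        if not kids:
            yield "/" + "/".join(path)
        for kid in kids:
            yield from walk(kid, path)

    yield from walk(root, [])


# Hypothetical element -> allowed children table.
CHILDREN = {
    "doc": ["body"],
    "body": ["div"],
    "div": ["div", "list", "p"],
    "list": ["item"],
    "item": ["list", "p"],
    "p": [],
}

paths = list(enumerate_paths(CHILDREN, "doc"))
for p in paths:
    print(p)
```

Because the walk stops the first time a name repeats, it always 
terminates, even on a DTD with recursive structures.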

Another alternative would be to limit the number of levels of recursion, 
or the total depth of any XPath generated.
E.g. you could tell the generator to generate all XPaths with up to 2 
levels of recursion (of all element types), or up to 10 path steps ('/').
This sort of allowed-but-constrained recursion could be useful in some 
cases, where you have e.g. divs within divs that have different meaning 
from divs not within divs.
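
A rough sketch of the bounded variant, under the same assumption of a 
dict from element name to allowed child names. The parameter names 
max_recursion and max_steps are invented here. This version lists the 
path to every reachable element, not only to leaves.

```python
def bounded_paths(children, root, max_recursion=2, max_steps=10):
    """List paths from root, allowing each element type to recur among its
    own ancestors at most max_recursion times, with at most max_steps steps."""

    def walk(name, ancestors):
        path = ancestors + [name]
        if ancestors.count(name) > max_recursion or len(path) > max_steps:
            return
        yield "/" + "/".join(path)
        for kid in children.get(name, []):
            yield from walk(kid, path)

    return list(walk(root, []))


# Hypothetical table: div may contain div or p.
CHILDREN = {"doc": ["div"], "div": ["div", "p"], "p": []}

one_level = bounded_paths(CHILDREN, "doc", max_recursion=1)
shallow = bounded_paths(CHILDREN, "doc", max_steps=2)
for p in one_level:
    print(p)
```

With max_recursion=1, a div inside a div is listed separately from a 
top-level div, but a third nested div is not.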

I'm assuming from your description that you are not concerned about 
attributes or about text content of elements.
Also, you said "Xpaths from the root element to each possible leaf 
element", which means we have to remove XPaths for elements that cannot 
be leaves (which in your case means they must have child *elements*, not 
just text or attributes or anything else). That makes it a bit harder.
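
One way to decide whether an element can be a leaf is to check whether its 
content model can be satisfied with no child elements at all. The sketch 
below works on expat-style (type, quantifier, name, children) tuples and 
uses the constants from xml.parsers.expat.model. The function name and the 
hand-built sample models are illustrative assumptions, not output captured 
from a real DTD.

```python
import xml.parsers.expat as expat

M = expat.model  # content model type and quantifier constants


def may_have_no_child_elements(model):
    """True if the content model can be satisfied without child elements."""
    ctype, quant, _name, kids = model
    if ctype in (M.XML_CTYPE_EMPTY, M.XML_CTYPE_ANY, M.XML_CTYPE_MIXED):
        return True
    if quant in (M.XML_CQUANT_OPT, M.XML_CQUANT_REP):
        return True  # '?' or '*' lets the whole particle be absent
    if ctype == M.XML_CTYPE_NAME:
        return False  # a required child element
    if ctype == M.XML_CTYPE_CHOICE:
        return any(may_have_no_child_elements(k) for k in kids)
    if ctype == M.XML_CTYPE_SEQ:
        return all(may_have_no_child_elements(k) for k in kids)
    return False


def name(n, quant=M.XML_CQUANT_NONE):
    """Build a hand-made model node for a single child element."""
    return (M.XML_CTYPE_NAME, quant, n, ())


# Hand-built models corresponding to (body), (title?, p*) and EMPTY.
required = (M.XML_CTYPE_SEQ, M.XML_CQUANT_NONE, None, (name("body"),))
optional = (M.XML_CTYPE_SEQ, M.XML_CQUANT_NONE, None,
            (name("title", M.XML_CQUANT_OPT), name("p", M.XML_CQUANT_REP)))
empty = (M.XML_CTYPE_EMPTY, M.XML_CQUANT_NONE, None, ())

print(may_have_no_child_elements(required))
print(may_have_no_child_elements(optional))
print(may_have_no_child_elements(empty))
```

Paths ending at an element for which this returns False would then be 
dropped from the list.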

I've hacked up a python prototype and will email it to you, once I have 
it checking for recursion.

Lars


On 2/14/2008 11:13 AM, Wendell Piez wrote:
> Hedley,
>
> Unfortunately the list of "all possible XPaths to a text file" is 
> infinite in many cases, due to the possibility of recursive structures 
> such as nested div or section elements, lists inside lists, or inline 
> elements that may have arbitrary inline elements in their content.
>
> Do you really want a path such as 
> "/doc/body/div/div/div/div/list/item/list/item/list/item/p/figure/caption/p/b/i/mono/i/b" 
> even if such a path points to an element that could be valid?
>
> I think Dan is right that the requirement needs some refinement.
>
> Cheers,
> Wendell
>
> At 08:42 PM 2/12/2008, you wrote:
>> Dan:
>>
>>> At Wednesday, 13/02/2008, 12:16 PM, Dan wrote:
>>> your post is a bit confusing, and some better details/explanations
>>> would be nice to see.  What do you mean by "write a list of all
>>> possible absolute Xpaths to a text file."
>>
>> Rephrasing my original request:
>> I am developing a CSS implementation for [instance documents that 
>> conform to] an XML schema. It would really help to check if all 
>> [required CSS class matches] have been covered if I could find a 
>> utility that would scan a DTD (including *.mod inclusions) or XML 
>> Schema to write a list of all absolute -- not including wildcards -- 
>> [Xpaths from the root element to each possible leaf element] ... to a 
>> text file. For example, using a possible path from a DITA DTD:
>>
>>         /reference/refbody/section/p
>>
>> This would help determine what class definitions can be generic no 
>> matter in what context an element appears (e.g. <i>) and what may 
>> need different treatment depending on context (e.g. 
>> */section/title). Oh, and if the generator could list the Xpath in 
>> reverse, from leaf node to root, as well that would be pleasant:
>>
>>         p\section\refbody\reference\
>>
>> Then I could sort the list of paths so that all instances where <p> 
>> was a leaf would be together and I could decide which contexts could 
>> share a CSS class and which would need context-specific classes.
>>
>> I've tried using the <oXygen/> instance generator on the DITA 
>> task.xsd, but even limiting recursion depth and number of 
>> repetitions, it produces very large files, possibly not completing in 
>> my lifetime.  And then there is the problem of extracting the Xpaths.
>>
>> Hope that makes it clearer,
>> Hedley
>>
>>
>> -- 
>> Hedley Stewart Finger
>> 28 Regent Street   Camberwell VIC 3124   Australia
>> Tel. +61 3 9809 1229   Mobile +61 412 461 558,
>> E-mail <mailto:>
>>
>>
>> _______________________________________________
>> oXygen-user mailing list
>> 
>> http://www.oxygenxml.com/mailman/listinfo/oxygen-user
>





