Re: [xsl] Diffing XML


Subject: Re: [xsl] Diffing XML
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Wed, 24 Oct 2012 07:37:27 -0700

In case this could be helpful:

I have been using an adaptation of this technique and found it useful
and adequate to my needs:

  http://stackoverflow.com/a/4747858/36305

as a quite robust verifier that two XML documents are equal.

It lists all differences found, but doesn't produce a visual diff
representation.
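
For illustration only, a minimal sketch of that kind of "list every
difference" check. It is written in Python rather than XSLT and is not
the code from the linked answer. The function name list_differences and
the sample documents are hypothetical. It assumes children can be paired
by position, so an inserted element makes every later sibling look
changed, and it ignores text that follows a child element (mixed
content).

```python
import xml.etree.ElementTree as ET


def list_differences(old, new, path=None):
    """Return plain-language messages describing how `new` differs from `old`.

    Assumption of this sketch: children are paired by position, and only the
    text before the first child is compared (tail text is ignored).
    """
    path = path or f"/{new.tag}"
    if old.tag != new.tag:
        # Different element names: report once and do not descend further.
        return [f"{path}: element <{old.tag}> is now <{new.tag}>"]
    messages = []
    # Compare attributes by name, covering those present on either side.
    for name in sorted(set(old.attrib) | set(new.attrib)):
        before, after = old.attrib.get(name), new.attrib.get(name)
        if before is None:
            messages.append(f"{path}: attribute '{name}' added with value '{after}'")
        elif after is None:
            messages.append(f"{path}: attribute '{name}' removed")
        elif before != after:
            messages.append(f"{path}: attribute '{name}' has been changed to '{after}'")
    # Compare leading text, ignoring surrounding whitespace such as indentation.
    old_text, new_text = (old.text or "").strip(), (new.text or "").strip()
    if old_text != new_text:
        messages.append(f"{path}: the text '{old_text}' has changed to '{new_text}'")
    # Recurse into children that exist on both sides, paired by position.
    old_kids, new_kids = list(old), list(new)
    for index, (o, n) in enumerate(zip(old_kids, new_kids), start=1):
        messages.extend(list_differences(o, n, f"{path}/*[{index}]"))
    # Whatever is left over on one side was added or removed.
    for extra in new_kids[len(old_kids):]:
        messages.append(f"{path}: there is an extra <{extra.tag}> here")
    for missing in old_kids[len(new_kids):]:
        messages.append(f"{path}: a <{missing.tag}> has been removed")
    return messages


old = ET.fromstring(
    '<topic audience="web"><title>Topic Title</title><p>xyz</p></topic>')
new = ET.fromstring(
    '<topic audience="book"><title>Topic Title</title><p>abc</p><p>more</p></topic>')
messages = list_differences(old, new)
for message in messages:
    print(message)
```

An empty result means the two documents are equal under these
assumptions. The messages are plain strings, so they could just as well
be written out as HTML or CSV rows for a checklist.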

Cheers,

Dimitre


On Wed, Oct 24, 2012 at 7:27 AM, Emma Burrows <Emma.Burrows@xxxxxxxxxxx>
wrote:
> Thinking about it further - I'm wondering whether something like deep-equal()
> might suffice. What the users apparently really want right now is to know
> which parts of the document have changed so they can concentrate on those when
> checking the output on a website. In which case, starting with top-level
> elements and iterating my way down through the children, I could in theory at
> the very least output "Something has changed in <p> number 3 in the topic
> entitled 'Topic Title'".
>
> I realise there are many pitfalls ahead and of course the minute they see
> it, they will say "oh, but can't you make it do X?", but if I can convince
> them they don't need to know exactly what has changed (I'm an optimist), that
> might help. Or is there an even better way? (Assuming one were daft enough to
> take on such a project :)
>
> Emma
>
>
> -----Original Message-----
> From: Emma Burrows [mailto:Emma.Burrows@xxxxxxxxxxx]
> Sent: 24 October 2012 14:48
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: RE: [xsl] Diffing XML
>
> Thanks Michael,
>
> Thanks for the response. Yes, I'm thinking doing it entirely myself might be
> a bit too ambitious. The data is relatively stable at this point and gets
> updated once a month which should theoretically reduce the number of things to
> check for each time. But even so, I can tell diffing is an art.
>
> DeltaXML does seem to offer some interesting options and it could probably
> be integrated into our CMS (given a chisel and a mallet - the CMS is getting a
> bit old), but I don't think we have any budget to buy another tool and it
> sounds as if the users have some very specific requirements (like exporting
> the list of user-friendly differences to an Excel spreadsheet!). So I was
> looking for ideas about how to tackle the problem just in case I do indeed
> need to implement it!
>
> Emma
>
>
> -----Original Message-----
> From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
> Sent: 24 October 2012 11:48
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Diffing XML
>
> In general differencing well is quite a challenge, e.g. handling an
> arbitrary number of inserted elements in either document, addition or removal
> of "div" layers, combining/splitting of paragraphs, reformatted indentation,
> etc. Doing it better than a general-purpose product such as DeltaXML could
> turn out to be a project that will keep you busy for a while.
>
> Michael Kay
> Saxonica
>
> On 24/10/2012 11:36, Emma Burrows wrote:
>> I have a requirement to produce an end-user-readable "checklist" of all the
>> places where an XML file has changed since the last version, with custom
>> explanations of what each difference is. I'm able to run diffs which are fine
>> for my own purposes, but the end users need the differences spelled out more
>> precisely in plain language (e.g. "there is an extra paragraph here", "the
>> text 'xyz' has changed", "the attribute 'audience' has been changed to
>> 'book'", etc.).
>>
>> Being an XSLT developer, I'm thinking of using an XSLT stylesheet to work
>> on the "new" version of the file, document() in the "old" version, and then
>> compare the nodes in the "new" version to those in the "old" version,
>> generating appropriate messages into an HTML output as I go along.
>>
>> Does that sound like a reasonable approach? Are there existing tools
>> or examples that might do what I'm after? Any recommendations on the
>> best way of comparing individual nodes? I am planning to do this in
>> Oxygen 14 so the world is my oyster as far as XSLT is concerned. :)
>>
>> Just looking for general suggestions to point me in the right direction.
>> Thanks!
>>
>>
>> ______________________________________________________________________
>> This email has been scanned by the Symantec Email Security.cloud service.
>> For more information please visit http://www.symanteccloud.com
>> ______________________________________________________________________
>
>



--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs. Will they
write all patents, too? :)
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.

