Re: [xsl] Aligning/merging two sequences
Subject: Re: [xsl] Aligning/merging two sequences
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Thu, 30 Sep 2010 18:08:32 +0100
I don't think it's straightforward at all - people have spent years perfecting algorithms for finding diffs between two sequences. I'm no expert on this area, but if I had the problem I would start by searching for appropriate algorithms before even thinking about writing an XSLT implementation. Presumably there's a trade-off between the time spent and the perfection of the result.
Michael Kay
Saxonica
On 30/09/2010 5:51 PM, Markus Flatscher wrote:
I'm banging my head against a sequence alignment problem. I have a feeling that this is straightforward, but I can't put my finger on what's missing from my attempts.
Suppose I have two inputs like so, where input1//w is always a subset of input2//w:
<input1>
  <w n="1">I</w>
  <w n="2">am</w>
  <w n="3">a</w>
  <w n="4">sequence</w>
</input1>
<input2>
  <w>I</w>
  <w>am</w>
  <w>a</w>
  <w>longer</w>
  <w>longer</w>
  <w>sequence</w>
</input2>
I'd like to get output like so:
<output>
  <w n="1">I</w>
  <w n="2">am</w>
  <w n="3">a</w>
  <w n="skipped">longer</w>
  <w n="skipped">longer</w>
  <w n="4">sequence</w>
</output>
I.e., for each input1//w, @n should be copied to the nearest following sibling <w> in input2 that matches .; <w>s in input2 that aren't in input1 should be flagged as "skipped".
P.S.: The use case is aligning an imperfect but timestamped transcription of an audio file (input1, machine-generated) with a perfect but not-timestamped one (input2, human-generated).
Thanks much for any help,
Markus
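For anyone who wants to try the idea out before attempting an XSLT implementation, the rule described in the message above (copy @n to the next matching <w>, flag everything else as "skipped") can be sketched as a single forward scan. The sketch below is in Python rather than XSLT and is not taken from the thread: the function name `align` and the (n, word) tuple representation are hypothetical, and it assumes that the shorter sequence really is an in-order subsequence of the longer one, with exact string matches. If the machine transcript contains misrecognized words, that assumption fails and a proper diff or longest-common-subsequence algorithm would be needed instead.

```python
def align(timed, words):
    """Align a numbered word list against a longer, un-numbered one.

    timed: list of (n, word) pairs, e.g. the <w n="..."> elements of input1.
    words: list of words, e.g. the text of the <w> elements of input2.
    Assumes the words in `timed` occur in `words` in the same order.
    """
    result = []
    i = 0  # index of the next not-yet-matched entry in `timed`
    for word in words:
        if i < len(timed) and timed[i][1] == word:
            # The next numbered word matches: carry its n across.
            result.append((timed[i][0], word))
            i += 1
        else:
            # No match at this position: flag the word as skipped.
            result.append(("skipped", word))
    if i < len(timed):
        # Some numbered words were never matched, so the subsequence
        # assumption does not hold for this input.
        raise ValueError("timed words are not a subsequence of words")
    return result


input1 = [("1", "I"), ("2", "am"), ("3", "a"), ("4", "sequence")]
input2 = ["I", "am", "a", "longer", "longer", "sequence"]

for n, word in align(input1, input2):
    print(f'<w n="{n}">{word}</w>')
```

Taking the earliest possible match for each numbered word is enough to find an alignment whenever the subsequence assumption holds. It does not choose between several equally valid alignments, which is where the dedicated diff algorithms mentioned in the reply come in.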