
[xsl] Using XSLT to build an index

Subject: [xsl] Using XSLT to build an index
From: "Mark" <mark@xxxxxxxxxxxx>
Date: Sun, 30 Oct 2011 14:47:34 -0700

The list archives did not seem to contain an XSLT stylesheet that could index an XML file, but I may have missed it. Is it practical to write my own XSLT 2 indexing stylesheet? If so, I have a bilingual XML file that I want to index. My assumptions are that I must get rid of the punctuation properly, then isolate the words, sort them, remove stop words, and so on. To get started, I need a bit of help. All of the phrases are found in two attributes: @czech and @eng.

Three questions: (1) I am aware from Michael's book that regular expressions may be used in the replace() function, but I do not know how to write the regular expression I need. I would like to remove all the punctuation from a phrase as follows: every punctuation character except a hyphen [-] should be replaced with an empty string; the hyphen should be replaced with a single space.
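
For reference, a minimal sketch of the replacement described in (1), assuming an XSLT 2.0 processor and assuming that "punctuation" means the Unicode punctuation category \p{P}. Symbols such as $ or + belong to \p{S} and would be left in place, and apostrophes would be removed (isn't becomes isnt). The hyphen has to be handled first, because it is itself a member of \p{P}. The text output is only there to show the result; it is not part of the original question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Print every @czech phrase with its punctuation removed, one per line. -->
  <xsl:template match="/">
    <xsl:for-each select="//@czech">
      <!-- Inner replace(): each hyphen becomes a single space.
           Outer replace(): every remaining punctuation character is deleted. -->
      <xsl:value-of select="replace(replace(., '-', ' '), '\p{P}', '')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```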

(2) I assume that to get rid of extra spaces (if any), I can use a construct like: normalize-space(replace(@czech, 'some regex expression')).

(3) I assume that tokenize(normalize-space(replace(@czech, 'some regex expression'))) will permit me to write out a list of the words found in those attributes to an XML document. I am not completely clear as to what tokenize() returns, or how to access that return.
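
A sketch of how the pieces in (2) and (3) might fit together, again assuming XSLT 2.0. In XPath 2.0, replace() takes three arguments (input, pattern, replacement) and tokenize() takes two (input, pattern). tokenize() returns a sequence of strings, which can be walked with xsl:for-each or a for expression. The words and word element names and the stop-word list are hypothetical placeholders, and lower-casing the words is an added assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical stop-word list; substitute a real one. -->
  <xsl:variable name="stop-words" select="('a', 'an', 'the', 'of')"/>

  <xsl:template match="/">
    <words>
      <!-- Clean each @eng phrase, collapse runs of spaces, then split on a
           single space. The result is one flat sequence of strings. -->
      <xsl:variable name="tokens" select="
          for $phrase in //@eng
          return tokenize(
                   normalize-space(
                     replace(replace($phrase, '-', ' '), '\p{P}', '')),
                   ' ')"/>

      <!-- Lower-case, drop duplicates and stop words, then sort. -->
      <xsl:for-each select="
          distinct-values(for $t in $tokens return lower-case($t))
            [not(. = $stop-words)]">
        <xsl:sort select="."/>
        <word><xsl:value-of select="."/></word>
      </xsl:for-each>
    </words>
  </xsl:template>

</xsl:stylesheet>
```

The same template could be repeated for @czech. Sorting Czech words correctly would likely need a collation on xsl:sort (for example via its lang attribute), and support for that depends on the processor.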

I would appreciate any comments, and especially the construction of the regex expression needed. Thanks, Mark
