org.exist.storage
Class TextSearchEngine

java.lang.Object
  extended by java.util.Observable
      extended by org.exist.storage.TextSearchEngine
Direct Known Subclasses:
NativeTextEngine

public abstract class TextSearchEngine
extends java.util.Observable

This is the base class for all classes providing access to the fulltext index. The class has methods to add text and attribute nodes to the fulltext index, or to search for nodes matching selected search terms.

Author:
wolf
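To make the description above concrete, here is a minimal sketch of what a fulltext index does conceptually. It is not the eXist implementation: the class `ToyTextIndex`, its methods, and the use of plain `long` node ids are hypothetical. Text is split into lower-case words, words on a stoplist are skipped, and each remaining word maps to the set of nodes containing it.

```java
import java.util.*;

/**
 * Hypothetical, simplified model of a fulltext index (NOT the eXist API).
 * Maps each lower-cased word to the set of node ids whose text contains it.
 */
class ToyTextIndex {
    // Words that are never indexed (plays the role of the stoplist field).
    private final Set<String> stoplist;
    // Inverted index: word -> ids of the nodes containing that word.
    private final Map<String, TreeSet<Long>> postings = new TreeMap<>();

    ToyTextIndex(Collection<String> stopwords) {
        this.stoplist = new TreeSet<>(stopwords);
    }

    // Splits text into lower-case words on runs of non-letter, non-digit characters.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) words.add(w); // split may yield a leading empty string
        }
        return words;
    }

    // Tokenizes a text node and records each non-stopword under the node id.
    void storeText(long nodeId, String text) {
        for (String word : tokenize(text)) {
            if (stoplist.contains(word)) continue;
            postings.computeIfAbsent(word, k -> new TreeSet<>()).add(nodeId);
        }
    }

    // Returns the ids of nodes containing the exact term (empty if none).
    Set<Long> getNodesContaining(String term) {
        Set<Long> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.emptySet() : Collections.unmodifiableSet(ids);
    }
}
```

A real engine additionally handles stemming, number indexing, and persistent storage, which this sketch leaves out.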

Field Summary
protected  DBBroker broker
           
protected  Configuration config
           
protected  boolean indexNumbers
           
protected static Logger LOG
           
protected  boolean stem
           
protected  PorterStemmer stemmer
           
protected  java.util.TreeSet stoplist
           
protected  boolean termFreq
           
protected  Tokenizer tokenizer
           
protected  int trackMatches
           
 
Constructor Summary
TextSearchEngine(DBBroker broker, Configuration conf)
          Construct a new instance and configure it.
 
Method Summary
abstract  boolean close()
           
abstract  void dropIndex(Collection collection)
          Remove index entries for an entire collection.
abstract  void dropIndex(DocumentImpl doc)
          Remove all index entries for the given document.
abstract  void flush()
           
abstract  java.lang.String[] getIndexTerms(DocumentSet docs, TermMatcher matcher)
           
abstract  NodeSet getNodes(XQueryContext context, DocumentSet docs, NodeSet contextSet, TermMatcher matcher, java.lang.CharSequence startTerm)
           
 NodeSet getNodesContaining(XQueryContext context, DocumentSet docs, NodeSet contextSet, java.lang.String expr)
          For each of the given search terms and each of the documents in the document set, return a node-set of matching nodes.
abstract  NodeSet getNodesContaining(XQueryContext context, DocumentSet docs, NodeSet contextSet, java.lang.String expr, int type)
          For each of the given search terms and each of the documents in the document set, return a node-set of matching nodes.
 Tokenizer getTokenizer()
          Returns the Tokenizer used for tokenizing strings into words.
 int getTrackMatches()
           
abstract  void reindex(DocumentImpl oldDoc, StoredNode node)
          Reindex a document or node.
abstract  Occurrences[] scanIndexTerms(DocumentSet docs, NodeSet contextSet, java.lang.String start, java.lang.String end)
          Queries the fulltext index to retrieve information on indexed words contained in the index for the current collection.
 void setTrackMatches(int flags)
           
abstract  void storeAttribute(FulltextIndexSpec idx, AttrImpl text)
          Tokenize and index the given attribute node.
abstract  void storeText(FulltextIndexSpec idx, TextImpl text, boolean onetoken)
          Tokenize and index the given text node.
 
Methods inherited from class java.util.Observable
addObserver, clearChanged, countObservers, deleteObserver, deleteObservers, hasChanged, notifyObservers, notifyObservers, setChanged
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

protected static final Logger LOG

stoplist

protected java.util.TreeSet stoplist

broker

protected DBBroker broker

tokenizer

protected Tokenizer tokenizer

config

protected Configuration config

indexNumbers

protected boolean indexNumbers

stem

protected boolean stem

termFreq

protected boolean termFreq

stemmer

protected PorterStemmer stemmer

trackMatches

protected int trackMatches

Constructor Detail

TextSearchEngine

public TextSearchEngine(DBBroker broker,
                        Configuration conf)
Construct a new instance and configure it.

Parameters:
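broker - the DBBroker this engine uses to access the database
conf - the database Configuration used to configure the engine

Method Detail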

getTokenizer

public Tokenizer getTokenizer()
Returns the Tokenizer used for tokenizing strings into words.

Returns:
the Tokenizer used by this engine

storeText

public abstract void storeText(FulltextIndexSpec idx,
                               TextImpl text,
                               boolean onetoken)
Tokenize and index the given text node.

Parameters:
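idx - the fulltext index configuration that applies to the node
text - the text node to tokenize and index
onetoken -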

storeAttribute

public abstract void storeAttribute(FulltextIndexSpec idx,
                                    AttrImpl text)
Tokenize and index the given attribute node.

Parameters:
idx - the fulltext index configuration that applies to the node
text - the attribute node to tokenize and index

flush

public abstract void flush()

close

public abstract boolean close()
                       throws DBException
Throws:
DBException

getTrackMatches

public int getTrackMatches()

setTrackMatches

public void setTrackMatches(int flags)

getNodesContaining

public NodeSet getNodesContaining(XQueryContext context,
                                  DocumentSet docs,
                                  NodeSet contextSet,
                                  java.lang.String expr)
                           throws TerminatedException
For each of the given search terms and each of the documents in the document set, return a node-set of matching nodes. This method uses MATCH_EXACT for comparing search terms.

Parameters:
expr - the search terms to look for
Returns:
the node-set of matching nodes
Throws:
TerminatedException
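The four-argument form above is a convenience overload that fixes the comparison mode to MATCH_EXACT. A minimal sketch of that delegation pattern follows; `ToySearchEngine`, the constant values, and the `List<String>` result type are hypothetical stand-ins for eXist's DBBroker constants and NodeSet.

```java
import java.util.*;

/**
 * Hypothetical sketch of the convenience-overload pattern; the constant values
 * and the List<String> result are assumptions, not eXist's DBBroker/NodeSet types.
 */
abstract class ToySearchEngine {
    static final int MATCH_EXACT = 0;   // assumed value
    static final int MATCH_REGEXP = 1;  // assumed value

    // Convenience overload: always compares search terms exactly.
    List<String> getNodesContaining(String expr) {
        return getNodesContaining(expr, MATCH_EXACT);
    }

    // Subclasses decide how to compare terms according to type.
    abstract List<String> getNodesContaining(String expr, int type);
}
```

Keeping the overload concrete in the base class means every subclass only implements the general form and gets the exact-match default for free.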

getNodesContaining

public abstract NodeSet getNodesContaining(XQueryContext context,
                                           DocumentSet docs,
                                           NodeSet contextSet,
                                           java.lang.String expr,
                                           int type)
                                    throws TerminatedException
For each of the given search terms and each of the documents in the document set, return a node-set of matching nodes. The type argument indicates whether search terms should be compared using a regular expression. Valid values are DBBroker.MATCH_EXACT or DBBroker.MATCH_REGEXP.

Parameters:
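expr - the search terms to look for
type - DBBroker.MATCH_EXACT or DBBroker.MATCH_REGEXP
Returns:
the node-set of matching nodes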
Throws:
TerminatedException
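The difference between the two match types can be sketched as selecting index terms from a sorted vocabulary. This is an illustration under assumptions, not eXist's code: `TermMatching`, `matchTerms`, and the constant values are hypothetical, and the sketch assumes a regular expression must match a whole index term (`Matcher.matches()`), which the documentation above does not specify.

```java
import java.util.*;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: selects index terms for one search term, either by
 * exact comparison or by a regular expression. Constant values are assumed.
 */
class TermMatching {
    static final int MATCH_EXACT = 0;   // assumed value
    static final int MATCH_REGEXP = 1;  // assumed value

    static List<String> matchTerms(SortedSet<String> vocabulary, String term, int type) {
        List<String> hits = new ArrayList<>();
        switch (type) {
            case MATCH_EXACT:
                if (vocabulary.contains(term)) hits.add(term);
                break;
            case MATCH_REGEXP:
                Pattern p = Pattern.compile(term);
                for (String word : vocabulary) {
                    // matches() requires the whole word to match the pattern
                    if (p.matcher(word).matches()) hits.add(word);
                }
                break;
            default:
                throw new IllegalArgumentException("unknown match type: " + type);
        }
        return hits;
    }
}
```

Exact matching is a single lookup, while regular-expression matching has to test every candidate term, which is why exact comparison is the sensible default.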

getNodes

public abstract NodeSet getNodes(XQueryContext context,
                                 DocumentSet docs,
                                 NodeSet contextSet,
                                 TermMatcher matcher,
                                 java.lang.CharSequence startTerm)
                          throws TerminatedException
Throws:
TerminatedException

scanIndexTerms

public abstract Occurrences[] scanIndexTerms(DocumentSet docs,
                                             NodeSet contextSet,
                                             java.lang.String start,
                                             java.lang.String end)
                                      throws PermissionDeniedException
Queries the fulltext index to retrieve information on indexed words contained in the index for the current collection. Returns a list of Occurrences for all words contained in the index. If end is null, all words starting with the string sequence start are returned. Otherwise, the method returns all words that come after start and before end in lexical order.

Throws:
PermissionDeniedException
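The two scanning modes described above (prefix scan when end is null, range scan otherwise) map naturally onto a sorted map. In this sketch `TermScanner` and `TermCount` are hypothetical stand-ins for the engine and for eXist's Occurrences class, and both range bounds are treated as exclusive, following the "after start and before end" wording literally.

```java
import java.util.*;

/** Hypothetical stand-in for eXist's Occurrences: a term and how often it occurs. */
final class TermCount {
    final String term;
    final int count;
    TermCount(String term, int count) { this.term = term; this.count = count; }
    @Override public String toString() { return term + "=" + count; }
}

class TermScanner {
    // Sorted vocabulary: term -> number of occurrences (hypothetical data model).
    private final TreeMap<String, Integer> counts = new TreeMap<>();

    void add(String term) { counts.merge(term, 1, Integer::sum); }

    /**
     * If end is null, returns every term that starts with start; otherwise
     * returns the terms strictly between start and end in lexical order.
     */
    List<TermCount> scanIndexTerms(String start, String end) {
        List<TermCount> result = new ArrayList<>();
        if (end == null) {
            // Terms sharing a prefix are contiguous in sorted order, so walk
            // forward from start until the prefix no longer matches.
            for (Map.Entry<String, Integer> e : counts.tailMap(start, true).entrySet()) {
                if (!e.getKey().startsWith(start)) break;
                result.add(new TermCount(e.getKey(), e.getValue()));
            }
        } else {
            // Both bounds exclusive; subMap throws if start > end.
            for (Map.Entry<String, Integer> e : counts.subMap(start, false, end, false).entrySet()) {
                result.add(new TermCount(e.getKey(), e.getValue()));
            }
        }
        return result;
    }
}
```

For example, with the words apple (twice), apply, banana, cherry and date indexed, a prefix scan for "app" yields apple and apply, and a range scan from "b" to "d" yields banana and cherry.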

getIndexTerms

public abstract java.lang.String[] getIndexTerms(DocumentSet docs,
                                                 TermMatcher matcher)

dropIndex

public abstract void dropIndex(Collection collection)
Remove index entries for an entire collection.

Parameters:
collection - the collection whose index entries are removed

dropIndex

public abstract void dropIndex(DocumentImpl doc)
Remove all index entries for the given document.

Parameters:
doc - the document whose index entries are removed
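Removing the entries of one document is straightforward if each posting records the document it belongs to. The sketch below illustrates this under assumptions: `DocAwareIndex`, integer document ids, and modelling a collection as a list of document ids are all hypothetical, not eXist's data model.

```java
import java.util.*;

/**
 * Hypothetical sketch: an inverted index whose postings record the owning
 * document, so all entries of a document can be removed in one pass.
 */
class DocAwareIndex {
    // term -> (document id -> node ids in that document containing the term)
    private final Map<String, Map<Integer, Set<Long>>> postings = new HashMap<>();

    void add(String term, int docId, long nodeId) {
        postings.computeIfAbsent(term, k -> new HashMap<>())
                .computeIfAbsent(docId, k -> new HashSet<>())
                .add(nodeId);
    }

    // Removes every posting of docId; terms left without postings are dropped too.
    void dropIndex(int docId) {
        Iterator<Map<Integer, Set<Long>>> it = postings.values().iterator();
        while (it.hasNext()) {
            Map<Integer, Set<Long>> byDoc = it.next();
            byDoc.remove(docId);
            if (byDoc.isEmpty()) it.remove();
        }
    }

    // Dropping a collection is modelled as dropping each of its documents.
    void dropIndex(Collection<Integer> collectionDocIds) {
        for (int docId : collectionDocIds) dropIndex(docId);
    }

    Set<String> terms() { return Collections.unmodifiableSet(postings.keySet()); }
}
```

This version scans every term; a production index would more likely key its storage by document or collection so that such removals do not touch unrelated entries.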

reindex

public abstract void reindex(DocumentImpl oldDoc,
                             StoredNode node)
Reindex a document or node. If node is null, all levels of the document tree starting with DocumentImpl.reindexRequired() will be reindexed.

Parameters:
oldDoc - the document to reindex
node - the node to reindex, or null to reindex the document as described above

