java.lang.Object
  java.util.Observable
    org.exist.storage.TextSearchEngine
      org.exist.storage.NativeTextEngine
This class is responsible for full-text indexing. Text nodes are handed over
to this class to be full-text indexed. Method storeText() is called by
RelationalBroker whenever it finds a TextNode. Method getNodesContaining()
is used by the XPath engine to process queries that involve a full-text
operator. The class keeps two stores: dbTokens holds every word found together
with its unique id, and invertedIndex holds the occurrences of each word id
per document.
TODO: store node type (attribute or text) with each entry
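To make the two-store layout concrete, here is a minimal in-memory sketch. It is not eXist's code: the class InvertedIndexSketch and its methods add and lookup are hypothetical, plain Java maps stand in for the BFile-backed dbTokens and invertedIndex, and node ids are modelled as long values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the two stores; not eXist's implementation.
class InvertedIndexSketch {
    // Stands in for dbTokens: every distinct word receives a unique integer id.
    private final Map<String, Integer> tokenIds = new HashMap<>();
    // Stands in for invertedIndex: word id -> document id -> node ids.
    private final Map<Integer, Map<Integer, List<Long>>> occurrences = new HashMap<>();

    // Returns the id of the word, assigning the next free id on first sight.
    private int idFor(String word) {
        Integer id = tokenIds.get(word);
        if (id == null) {
            id = tokenIds.size();
            tokenIds.put(word, id);
        }
        return id;
    }

    // Records that the word occurs in the given node of the given document.
    void add(String word, int docId, long nodeId) {
        occurrences
            .computeIfAbsent(idFor(word), k -> new TreeMap<>())
            .computeIfAbsent(docId, k -> new ArrayList<>())
            .add(nodeId);
    }

    // Returns document id -> node ids for the word, or an empty map if unknown.
    Map<Integer, List<Long>> lookup(String word) {
        Integer id = tokenIds.get(word);
        if (id == null) {
            return Map.of();
        }
        return occurrences.getOrDefault(id, Map.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("xml", 1, 10L);
        index.add("xml", 2, 7L);
        index.add("database", 1, 10L);
        System.out.println(index.lookup("xml"));     // {1=[10], 2=[7]}
        System.out.println(index.lookup("missing")); // {}
    }
}
```

A query for a word is then two lookups: word to id in the token store, and id to per-document occurrences in the inverted index.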
| Field Summary | | |
| --- | --- | --- |
| static byte | ATTRIBUTE_SECTION | |
| protected BFile | dbTokens | The datastore for this token index. |
| protected org.exist.storage.NativeTextEngine.InvertedIndex | invertedIndex | |
| static int | MAX_TOKEN_LENGTH | Length limit for the tokens. |
| static byte | TEXT_SECTION | |
| Fields inherited from class org.exist.storage.TextSearchEngine |
broker, config, indexNumbers, LOG, stem, stemmer, stoplist, termFreq, tokenizer, trackMatches |
| Constructor Summary |
| --- |
| NativeTextEngine(DBBroker broker, Configuration config, BFile db) |
| Method Summary | | |
| --- | --- | --- |
| boolean | close() | |
| static boolean | containsWildcards(java.lang.String str) | Checks whether the given string could be a regular expression. |
| void | dropIndex(Collection collection) | Drops all index entries for the given collection. |
| void | dropIndex(DocumentImpl document) | Drops all index entries for the given document. |
| void | endElement(int xpathType, ElementImpl node, java.lang.String content) | Stores and indexes the given element; storeElement is called before this method. |
| void | flush() | Writes the pending items for the current document's collection. |
| java.lang.String[] | getIndexTerms(DocumentSet docs, TermMatcher matcher) | |
| NodeSet | getNodes(XQueryContext context, DocumentSet docs, NodeSet contextSet, TermMatcher matcher, java.lang.CharSequence startTerm) | |
| NodeSet | getNodesContaining(XQueryContext context, DocumentSet docs, NodeSet contextSet, java.lang.String expr, int type) | For each of the given search terms and each of the documents in the document set, returns a node set of matching nodes. |
| NodeSet | getNodesExact(XQueryContext context, DocumentSet docs, NodeSet contextSet, java.lang.String expr) | Gets all nodes whose content exactly matches the given expression. |
| int | getTrackMatches() | |
| void | printStatistics() | |
| void | reindex(DocumentImpl document, StoredNode node) | Reindexes all pending items for the specified document. |
| void | remove() | Removes all pending modifications for the current document. |
| void | removeElement(ElementImpl node, NodePath currentPath, java.lang.String content) | Marks the given element for removal; added entries are written to the list of pending entries. |
| Occurrences[] | scanIndexTerms(DocumentSet docs, NodeSet contextSet, java.lang.String start, java.lang.String end) | Queries the full-text index to retrieve information on the indexed words for the current collection. |
| void | setDocument(DocumentImpl document) | Sets the current document; generally called before invoking an operation. |
| void | setTrackMatches(int flags) | |
| void | startElement(ElementImpl impl, NodePath currentPath, boolean index) | Corresponds to the SAX method of the same name. |
| static boolean | startsWithWildcard(java.lang.String str) | |
| void | storeAttribute(AttrImpl node, NodePath currentPath, boolean fullTextIndexSwitch) | Stores and indexes the given attribute. |
| void | storeAttribute(FulltextIndexSpec indexSpec, AttrImpl attr) | Indexes the tokens contained in an attribute. |
| void | storeAttribute(RangeIndexSpec spec, AttrImpl node) | |
| void | storeText(FulltextIndexSpec indexSpec, TextImpl text, boolean noTokenizing) | Indexes the tokens contained in a text node. |
| void | storeText(TextImpl node, NodePath currentPath, boolean fullTextIndexSwitch) | Stores and indexes the given text node. |
| void | sync() | Triggers a cache sync. |
| java.lang.String | toString() | |
| Methods inherited from class org.exist.storage.TextSearchEngine |
getNodesContaining, getTokenizer |
| Methods inherited from class java.util.Observable |
addObserver, clearChanged, countObservers, deleteObserver, deleteObservers, hasChanged, notifyObservers, notifyObservers, setChanged |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
public static final byte TEXT_SECTION
public static final byte ATTRIBUTE_SECTION
public static final int MAX_TOKEN_LENGTH
protected BFile dbTokens
protected org.exist.storage.NativeTextEngine.InvertedIndex invertedIndex
| Constructor Detail |
public NativeTextEngine(DBBroker broker,
Configuration config,
BFile db)
| Method Detail |
public static final boolean containsWildcards(java.lang.String str)
  Checks whether the given string could be a regular expression.
  Parameters: str - the string to check
public static final boolean startsWithWildcard(java.lang.String str)
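Both wildcard checks can be approximated as a character scan. This is a sketch under an assumption: the metacharacter set in META is a guess, since this page does not say which characters eXist treats as wildcards, and WildcardCheck is a hypothetical class.

```java
// Hypothetical helpers mirroring containsWildcards/startsWithWildcard.
// The metacharacter set below is an assumption, not eXist's actual list.
final class WildcardCheck {
    // Assumed regex/wildcard metacharacters.
    private static final String META = "*?[]^$\\.+{}()|";

    private WildcardCheck() {}

    // True if any character of str is one of the assumed metacharacters.
    static boolean containsWildcards(String str) {
        if (str == null) {
            return false;
        }
        for (int i = 0; i < str.length(); i++) {
            if (META.indexOf(str.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // True if the first character is one of the assumed metacharacters.
    static boolean startsWithWildcard(String str) {
        return str != null && !str.isEmpty() && META.indexOf(str.charAt(0)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(containsWildcards("data*"));    // true
        System.out.println(containsWildcards("database")); // false
        System.out.println(startsWithWildcard("*base"));   // true
    }
}
```

A leading wildcard matters because it prevents a lookup from starting at a fixed prefix in a sorted term index, which is presumably why it gets its own check.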
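public int getTrackMatches()
  Declared in: getTrackMatches in class TextSearchEngine
public void setTrackMatches(int flags)
  Declared in: setTrackMatches in class TextSearchEngine
public void setDocument(DocumentImpl document)
  Description copied from interface: ContentLoadingObserver
  Sets the current document; generally called before invoking an operation.
  Specified by: setDocument in interface ContentLoadingObserver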
public void storeAttribute(FulltextIndexSpec indexSpec,
AttrImpl attr)
  Indexes the tokens contained in an attribute.
  Declared in: storeAttribute in class TextSearchEngine
  Parameters:
    indexSpec - the index configuration
    attr - the attribute to be indexed
public void storeText(FulltextIndexSpec indexSpec,
TextImpl text,
boolean noTokenizing)
  Indexes the tokens contained in a text node.
  Declared in: storeText in class TextSearchEngine
  Parameters:
    indexSpec - the index configuration
    text - the text node to be indexed
    noTokenizing - if true, the given text is indexed as a single token; if false, it is tokenized before being indexed
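The effect of the noTokenizing switch can be illustrated with a small sketch. Everything here is hypothetical: the value 1024 for MAX_TOKEN_LENGTH (this page gives no value), splitting on runs of non-letter, non-digit characters, lower-casing, and skipping over-long tokens instead of truncating them. eXist's real tokenizer is configurable and may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of storeText's noTokenizing switch; not eXist's tokenizer.
final class TokenizeSketch {
    // Assumed value; the real constant's value is not stated on this page.
    static final int MAX_TOKEN_LENGTH = 1024;

    private TokenizeSketch() {}

    static List<String> tokens(String text, boolean noTokenizing) {
        List<String> result = new ArrayList<>();
        if (noTokenizing) {
            // Index the complete text as one single token.
            addToken(result, text);
        } else {
            // Assumed rule: split on runs of characters that are neither letters nor digits.
            for (String word : text.split("[^\\p{L}\\p{Nd}]+")) {
                addToken(result, word);
            }
        }
        return result;
    }

    // Skips empty and over-long tokens; lower-cases the rest (both assumptions).
    private static void addToken(List<String> out, String token) {
        if (!token.isEmpty() && token.length() <= MAX_TOKEN_LENGTH) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
    }

    public static void main(String[] args) {
        System.out.println(tokens("Native XML Database", false)); // [native, xml, database]
        System.out.println(tokens("Native XML Database", true));  // [native xml database]
    }
}
```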
public void storeAttribute(RangeIndexSpec spec,
AttrImpl node)
public void storeAttribute(AttrImpl node,
NodePath currentPath,
boolean fullTextIndexSwitch)
  Description copied from interface: ContentLoadingObserver
  Stores and indexes the given attribute.
  Specified by: storeAttribute in interface ContentLoadingObserver
public void storeText(TextImpl node,
NodePath currentPath,
boolean fullTextIndexSwitch)
  Description copied from interface: ContentLoadingObserver
  Stores and indexes the given text node.
  Specified by: storeText in interface ContentLoadingObserver
public void startElement(ElementImpl impl,
NodePath currentPath,
boolean index)
  Description copied from interface: ContentLoadingObserver
  Corresponds to the SAX method of the same name.
  Specified by: startElement in interface ContentLoadingObserver
public void endElement(int xpathType,
ElementImpl node,
java.lang.String content)
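  Description copied from interface: ContentLoadingObserver
  Stores and indexes the given element; storeElement is called before this method.
  Specified by: endElement in interface ContentLoadingObserver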
public void removeElement(ElementImpl node,
NodePath currentPath,
java.lang.String content)
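  Description copied from interface: ContentLoadingObserver
  Marks the given element for removal; added entries are written to the list of pending entries. ContentLoadingObserver.flush() is called later to flush all pending entries.
  Specified by: removeElement in interface ContentLoadingObserver
public void sync()
  Description copied from interface: ContentLoadingObserver
  Triggers a cache sync.
  Specified by: sync in interface ContentLoadingObserver
public void flush()
  Description copied from interface: ContentLoadingObserver
  Writes the pending items for the current document's collection.
  Specified by: flush in interface ContentLoadingObserver
  Declared in: flush in class TextSearchEngine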
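The pending-entry protocol described for removeElement, flush, and remove behaves like a write buffer. The sketch below models it with hypothetical names (PendingEntriesSketch, markForRemoval); a plain list stands in for the on-disk index, so it shows only the ordering of calls, not eXist's storage.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the pending-entry protocol; not eXist's implementation.
final class PendingEntriesSketch {
    private final List<String> pending = new ArrayList<>();
    // Stands in for the persistent index.
    private final List<String> written = new ArrayList<>();

    // Buffers a removal instead of applying it immediately (cf. removeElement).
    void markForRemoval(String entry) {
        pending.add("remove:" + entry);
    }

    // Applies all buffered entries, then clears the buffer (cf. flush).
    void flush() {
        written.addAll(pending);
        pending.clear();
    }

    // Discards buffered entries without applying them (cf. remove).
    void remove() {
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    List<String> written() {
        return written;
    }

    public static void main(String[] args) {
        PendingEntriesSketch s = new PendingEntriesSketch();
        s.markForRemoval("node-1");
        s.flush();
        s.markForRemoval("node-2");
        s.remove();
        System.out.println(s.written());      // [remove:node-1]
        System.out.println(s.pendingCount()); // 0
    }
}
```

Buffering lets a caller collect all changes for one document and then either commit them in one pass or drop them.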
public void reindex(DocumentImpl document,
StoredNode node)
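  Description copied from interface: ContentLoadingObserver
  Reindexes all pending items for the specified document. Pending items are collected through ContentLoadingObserver.addNode(QName, NodeProxy), storeElement(int, ElementImpl, String), and storeAttribute(RangeIndexSpec, AttrImpl); reindex then scans this list and updates the items in the index to reflect the reindexed document.
  Specified by: reindex in interface ContentLoadingObserver
  Declared in: reindex in class TextSearchEngine
  Parameters: document, node
public void remove()
  Description copied from interface: ContentLoadingObserver
  Removes all pending modifications for the current document.
  Specified by: remove in interface ContentLoadingObserver
public void dropIndex(Collection collection)
  Description copied from interface: ContentLoadingObserver
  Drops all index entries for the given collection.
  Specified by: dropIndex in interface ContentLoadingObserver
  Declared in: dropIndex in class TextSearchEngine
  Parameters: collection
public void dropIndex(DocumentImpl document)
  Description copied from interface: ContentLoadingObserver
  Drops all index entries for the given document.
  Specified by: dropIndex in interface ContentLoadingObserver
  Declared in: dropIndex in class TextSearchEngine
  Parameters: document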
public NodeSet getNodesContaining(XQueryContext context,
DocumentSet docs,
NodeSet contextSet,
java.lang.String expr,
int type)
throws TerminatedException
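  Description copied from class: TextSearchEngine
  For each of the given search terms and each of the documents in the document set, returns a node set of matching nodes.
  Declared in: getNodesContaining in class TextSearchEngine
  Parameters: expr
  Throws: TerminatedException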
public NodeSet getNodesExact(XQueryContext context,
DocumentSet docs,
NodeSet contextSet,
java.lang.String expr)
throws TerminatedException
  Gets all nodes whose content exactly matches the given expression.
  Parameters: context, docs, contextSet, expr
  Throws: TerminatedException
public NodeSet getNodes(XQueryContext context,
DocumentSet docs,
NodeSet contextSet,
TermMatcher matcher,
java.lang.CharSequence startTerm)
throws TerminatedException
  Declared in: getNodes in class TextSearchEngine
  Throws: TerminatedException
public java.lang.String[] getIndexTerms(DocumentSet docs,
TermMatcher matcher)
  Declared in: getIndexTerms in class TextSearchEngine
public Occurrences[] scanIndexTerms(DocumentSet docs,
NodeSet contextSet,
java.lang.String start,
java.lang.String end)
throws PermissionDeniedException
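  Description copied from class: TextSearchEngine
  Queries the full-text index to retrieve information on the indexed words for the current collection. Returns an array of Occurrences for all words contained in the index. If end is null, all words starting with the string sequence start are returned. Otherwise, the method returns all words that come after start and before end in lexical order.
  Declared in: scanIndexTerms in class TextSearchEngine
  Throws: PermissionDeniedException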
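The start/end contract of scanIndexTerms maps naturally onto a sorted set. This sketch is hypothetical (TermScanSketch is not an eXist class): it returns only the matching terms rather than Occurrences objects, and in the range case it treats both bounds as exclusive, reading "after start and before end" literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch of the scanIndexTerms(start, end) contract over sorted terms.
final class TermScanSketch {
    private TermScanSketch() {}

    static List<String> scan(NavigableSet<String> terms, String start, String end) {
        List<String> result = new ArrayList<>();
        if (end == null) {
            // Prefix scan: walk forward from start while terms still begin with start.
            for (String term : terms.tailSet(start, true)) {
                if (!term.startsWith(start)) {
                    break;
                }
                result.add(term);
            }
        } else {
            // Range scan: terms strictly after start and strictly before end.
            // subSet throws IllegalArgumentException if start sorts after end.
            result.addAll(terms.subSet(start, false, end, false));
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of("xml", "date", "data", "database"));
        System.out.println(scan(terms, "data", null));  // [data, database]
        System.out.println(scan(terms, "data", "xml")); // [database, date]
    }
}
```

The early break in the prefix scan is safe because, in lexical order, all strings sharing a prefix form one contiguous run that begins at the prefix itself.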
public boolean close()
throws DBException
  Declared in: close in class TextSearchEngine
  Throws: DBException
public void printStatistics()
public java.lang.String toString()