Implementing Lucene Analyzers

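public abstract class Analyzer {
    /** Creates a TokenStream that tokenizes all the text in the given Reader. */
    public abstract TokenStream tokenStream(String fieldName, Reader reader);
    /** Position increment gap between successive values of the same field. */
    public int getPositionIncrementGap(String fieldName) { return 0; }
}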

String content = "...";
StringReader reader = new StringReader(content);
Analyzer analyzer = new ....();
TokenStream ts = analyzer.tokenStream("", reader);
// Start tokenizing
Token t = null;
while ((t = ts.next()) != null) {
    System.out.println(t.termText()); // assumed loop body: print each term
}

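An analyzer is made up of two kinds of components: a tokenizer (Tokenizer) and filters (TokenFilter). Both extend TokenStream. An analyzer usually consists of one tokenizer followed by several filters.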
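The chaining idea can be sketched in plain Java without the Lucene library. Everything in this sketch (TermStream, WhitespaceSplitter, LowerCaser, analyze) is a hypothetical stand-in rather than a Lucene class; it only illustrates how one tokenizer at the source and filters wrapped around it form a single stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for the Lucene classes; only the chaining pattern is modeled.
class AnalyzerChainSketch {

    /** Yields one term per call; null signals the end of the stream. */
    interface TermStream {
        String next() throws IOException;
    }

    /** Tokenizer role: reads characters from a Reader and splits on whitespace. */
    static class WhitespaceSplitter implements TermStream {
        private final Reader input;
        WhitespaceSplitter(Reader input) { this.input = input; }

        public String next() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) break; // end of the current term
                } else {
                    sb.append((char) c);
                }
            }
            return sb.length() > 0 ? sb.toString() : null;
        }
    }

    /** Filter role: wraps another stream and rewrites each term it produces. */
    static class LowerCaser implements TermStream {
        private final TermStream input;
        LowerCaser(TermStream input) { this.input = input; }

        public String next() throws IOException {
            String term = input.next();
            return term == null ? null : term.toLowerCase(Locale.ROOT);
        }
    }

    /** Analyzer role: one tokenizer at the source, filters stacked on top. */
    static List<String> analyze(String text) throws IOException {
        TermStream ts = new LowerCaser(new WhitespaceSplitter(new StringReader(text)));
        List<String> terms = new ArrayList<>();
        String t;
        while ((t = ts.next()) != null) terms.add(t);
        return terms;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(analyze("Lucene Analyzer  Demo"));
    }
}
```

Because each filter only sees the stream it wraps, adding another processing step means wrapping the chain in one more filter, with no change to the tokenizer.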

public abstract class Tokenizer extends TokenStream {
    /** The text source for this Tokenizer. */
    protected Reader input;
    /** Construct a token stream processing the given input. */
    protected Tokenizer(Reader input) { this.input = input; }
    /** By default, closes the input Reader. */
    public void close() throws IOException { input.close(); }
}

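public abstract class TokenFilter extends TokenStream {
    /** The source of tokens for this filter. */
    protected TokenStream input;
    /** Construct a token stream filtering the given input. */
    protected TokenFilter(TokenStream input) { this.input = input; }
    /** Close the input TokenStream. */
    public void close() throws IOException { input.close(); }
}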

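Besides using StandardTokenizer to split the text, the tokenStream method of StandardAnalyzer applies three filters (in Lucene 2.x: StandardFilter, LowerCaseFilter, and StopFilter):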

public TokenStream tokenStream(String fieldName, Reader reader)

stopSet is specified when the StandardAnalyzer is constructed. When no constructor argument is given, the default stop words provided by StopAnalyzer.ENGLISH_STOP_WORDS are used.
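The default-versus-custom stop set behaviour can be illustrated with a small plain-Java sketch. StopSetSketch and its DEFAULT_STOP_WORDS are hypothetical names, and the word list is an illustrative subset, not the actual contents of StopAnalyzer.ENGLISH_STOP_WORDS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not Lucene's StandardAnalyzer or StopFilter.
class StopSetSketch {
    // Illustrative defaults only, not the full StopAnalyzer.ENGLISH_STOP_WORDS list.
    static final String[] DEFAULT_STOP_WORDS = {"a", "an", "and", "of", "the", "to"};

    private final Set<String> stopSet;

    /** No-argument constructor: fall back to the default stop words. */
    StopSetSketch() { this(DEFAULT_STOP_WORDS); }

    /** Caller-supplied stop words replace the defaults entirely. */
    StopSetSketch(String[] stopWords) {
        this.stopSet = new HashSet<>(Arrays.asList(stopWords));
    }

    /** Drops every term that appears in the stop set; order is preserved. */
    List<String> filter(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!stopSet.contains(term)) kept.add(term);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("the", "art", "of", "search");
        System.out.println(new StopSetSketch().filter(terms));
        System.out.println(new StopSetSketch(new String[] {"art"}).filter(terms));
    }
}
```

In this sketch a custom list replaces the defaults rather than extending them, so a caller who wants both would need to pass the combined list.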
