Relevance Ranking in Lucene

Relevance ranking in Lucene is defined mainly in the Similarity class of the org.apache.lucene.search package. The scoring formula is:

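$$
\mathrm{score}(q,d) = \sum_{t \in q} \Big( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{getBoost}(t \text{ in } q) \cdot \mathrm{getBoost}(t.\mathrm{field} \text{ in } d) \cdot \mathrm{lengthNorm}(t.\mathrm{field} \text{ in } d) \Big) \cdot \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(\mathrm{sumOfSquaredWeights})
$$

$$
\mathrm{sumOfSquaredWeights} = \sum_{t \in q} \big( \mathrm{idf}(t) \cdot \mathrm{getBoost}(t \text{ in } q) \big)^2
$$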

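Similarity is an abstract class. The default scoring algorithm is implemented in the DefaultSimilarity class, which defines the factors as follows.

tf is Math.sqrt(freq), where freq is the frequency of the term in document d.

idf is (Math.log(numDocs / (double)(docFreq + 1)) + 1.0), where numDocs is the total number of documents in the index and docFreq is the number of documents that contain the term.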
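The two expressions above can be tried out in isolation. The following is a minimal sketch, not Lucene's own source file: the class name TfIdfSketch and the sample numbers are assumptions made for illustration, and only the two formulas come from the text.

```java
// Standalone sketch of the tf and idf factors described above.
// TfIdfSketch is a hypothetical name; it does not exist in Lucene.
final class TfIdfSketch {

    // tf: square root of the raw term frequency in the document.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // idf: natural log of numDocs / (docFreq + 1), plus 1.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: the term occurs 4 times in the document
        // and appears in 9 of the 1000 documents in the index.
        System.out.println("tf  = " + tf(4));
        System.out.println("idf = " + idf(9, 1000));
    }
}
```

Because of the square root, repeating a term four times only doubles tf, and a term that appears in few documents gets a higher idf than a common one.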

lengthNorm

This is the length normalization factor:

(1.0 / Math.sqrt(numTerms)), where numTerms is the number of terms in the field.
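To see how the length normalization factor behaves, here is a small sketch of that expression on its own. The class name LengthNormSketch and the field sizes are hypothetical, chosen only for illustration.

```java
// Standalone sketch of the length normalization factor described above.
// LengthNormSketch is a hypothetical name; it does not exist in Lucene.
final class LengthNormSketch {

    // lengthNorm: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Hypothetical field sizes: a 4-term title and a 100-term body.
        System.out.println("short field: " + lengthNorm(4));
        System.out.println("long field:  " + lengthNorm(100));
    }
}
```

A match in a short field therefore counts for more than the same match in a long field.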

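coord(q,d)

The more terms of query q that are matched in document d, the larger this factor becomes.

public float coord(int overlap, int maxOverlap)

Here overlap is the number of query terms matched in the document, and maxOverlap is the total number of terms in the query.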
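The text gives only the signature of coord, not its body. The sketch below assumes the simple ratio overlap / maxOverlap, which to my knowledge is what DefaultSimilarity uses in the Lucene versions that have this method. The class name CoordSketch is hypothetical.

```java
// Standalone sketch of the coordination factor described above.
// CoordSketch is a hypothetical name, and the ratio below is an
// assumption about the method body, which the text does not show.
final class CoordSketch {

    // coord: fraction of the query terms that the document matches.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // Hypothetical three-term query matched partially and fully.
        System.out.println("2 of 3 terms: " + coord(2, 3));
        System.out.println("3 of 3 terms: " + coord(3, 3));
    }
}
```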

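queryNorm

This factor does not affect ranking, because it is the same for every document scored against a given query.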
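The following sketch illustrates why a per-query constant leaves the order unchanged. It assumes queryNorm is 1 / sqrt(sumOfSquaredWeights), which is how I recall DefaultSimilarity defining it; the text does not give the body. The class name QueryNormSketch, the idf values, and the raw scores are all hypothetical.

```java
// Standalone sketch of sumOfSquaredWeights and queryNorm.
// QueryNormSketch is a hypothetical name, and 1 / sqrt(sum) is an
// assumed definition of queryNorm that the text does not state.
final class QueryNormSketch {

    // Sum over the query terms of (idf(t) * boost(t))^2.
    static float sumOfSquaredWeights(float[] idfs, float[] boosts) {
        float sum = 0f;
        for (int i = 0; i < idfs.length; i++) {
            float weight = idfs[i] * boosts[i];
            sum += weight * weight;
        }
        return sum;
    }

    // Assumed definition: 1 / sqrt(sumOfSquaredWeights).
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        // Hypothetical two-term query with idf 3 and 4, both boosts 1.
        float[] idfs = {3.0f, 4.0f};
        float[] boosts = {1.0f, 1.0f};
        float norm = queryNorm(sumOfSquaredWeights(idfs, boosts));

        // Hypothetical unnormalized scores of two documents.
        float rawA = 12.0f;
        float rawB = 7.5f;

        // Multiplying both scores by the same positive constant keeps their order.
        boolean sameOrder = ((rawA * norm) > (rawB * norm)) == (rawA > rawB);
        System.out.println("queryNorm = " + norm + ", order preserved: " + sameOrder);
    }
}
```

The normalization only makes scores from different queries more comparable; within one query it scales every document's score by the same amount.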
