Whoosh, a full-text search library for Python: a usage example

pip install whoosh jieba

To start, I have a folder named xiaoshuo that holds several novels.

Straight to the code:

First, the script that builds the index:

import os

from jieba.analyse import ChineseAnalyzer
from whoosh.fields import Schema, TEXT
from whoosh.filedb.filestore import FileStorage

# Tokenize Chinese text with jieba's Whoosh-compatible analyzer
analyzer = ChineseAnalyzer()
schema = Schema(
    title=TEXT(stored=True),
    content=TEXT(stored=True, analyzer=analyzer)
)
storage = FileStorage('./xiaoshuoindex')
# Create the index on the first run; open the existing one afterwards
if not os.path.exists('./xiaoshuoindex'):
    os.mkdir('./xiaoshuoindex')
    ix = storage.create_index(schema)
else:
    ix = storage.open_index()
writer = ix.writer()
filelist = os.listdir('./xiaoshuo')
for file in filelist:
    with open('./xiaoshuo/' + file, encoding='utf-8') as f:
        content = f.readlines()
    # content is a list; it must be joined into a string to be indexed properly
    # writer.add_document(title=file, content=content)
    writer.add_document(title=file, content=''.join(content))
    print(file, 'indexed')
writer.commit()
print('All files indexed')

Once the index has been built, a folder (xiaoshuoindex) is created.

Next, a quick test:

from whoosh.qparser import QueryParser
from whoosh.filedb.filestore import FileStorage

# Create the index storage object
storage = FileStorage('./xiaoshuoindex')
# Open the index and get the index object
ix = storage.open_index()
# Get a searcher object, which is what runs the searches
# for item in ix.reader().all_terms():
#     print(item)
with ix.searcher() as searcher:
    # Build the query object that will be searched for
    query = QueryParser('content', ix.schema).parse('劍眉')
    # Run the search through the searcher's search method
    # search(query, limit=None)
    # limit caps the number of results; the default is 10, None returns all of them
    results = searcher.search(query, limit=None)
    for res in results:
        print(res['title'])

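As usual, run it and look at the result.

Open one of the novels it lists and search for the term to confirm the hit.

Now replace '劍眉' ("sword-shaped eyebrows") with '遊戲' ("game") and run it again.

This time there are quite a few hits, so pick a few at random and check them.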
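
The manual spot-check can also be scripted. The following is a minimal sketch, assuming a hypothetical helper named find_lines (it is not part of Whoosh) that lists the lines of a file containing the keyword. The demo runs on a temporary file with made-up text; for a real check you would pass './xiaoshuo/' plus a title printed by the search script.

```python
import os
import tempfile


def find_lines(path, keyword, encoding="utf-8"):
    """Return (line_number, text) pairs for the lines of `path` containing `keyword`."""
    matches = []
    with open(path, encoding=encoding) as f:
        for number, line in enumerate(f, start=1):
            if keyword in line:
                matches.append((number, line.rstrip("\n")))
    return matches


# Demo on a throwaway file with made-up content (an assumption for illustration)
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("第一行\n他劍眉星目\n第三行\n")
    demo_matches = find_lines(sample, "劍眉")

print(demo_matches)
```

Note that this is a plain substring match, while Whoosh matches on the tokens produced by the analyzer, so the two can occasionally disagree on a given file.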
