1. The code is as follows:
import json
import sys


class trietree:
    def __init__(self, is_debug=1, is_sentence=0):
        self.tree = {}
        self.is_debug = is_debug
        self.is_sentence = is_sentence
        self.prefix_list = []

    def addfromfile(self, filepath):
        # Each line has the form "main entry#alias 1#alias 2#".
        with open(filepath, encoding="utf-8") as f:
            for line in f:
                line_list = line.strip().strip("#").split("#")
                # The first field, split on whitespace, is the path into the trie.
                main_word = line_list[0].strip().split()
                if not self.is_sentence:
                    sub_word_list = [u.replace(" ", "") for u in line_list]
                else:
                    sub_word_list = line_list
                for i, w in enumerate(main_word):
                    if i == 0:
                        target_dict = self.tree
                    else:
                        target_dict = target_dict[main_word[i - 1]]
                    if w not in target_dict:
                        target_dict[w] = {}
                        target_dict[w]["##cnt"] = 1
                        target_dict[w]["##terminal"] = []
                        target_dict[w]["##wordtag"] = 0
                    else:
                        target_dict[w]["##cnt"] += 1
                    # The last unit of the entry stores the full strings.
                    if i == len(main_word) - 1:
                        target_dict[w]["##terminal"].extend(sub_word_list)
                        target_dict[w]["##wordtag"] = 1
        if self.is_debug:
            # Dump the whole tree so its structure can be inspected.
            context = json.dumps(self.tree, indent=2, ensure_ascii=False)
            with open("./debug.json", "w", encoding="utf-8") as out:
                out.write(context)

    def searchprefix(self, prefix_string):
        self.prefix_list = []
        target_dict = self.tree
        if not self.tree:
            return self.prefix_list
        if self.is_sentence:
            prefix_string = prefix_string.strip().split(" ")
        # Walk down the tree along the prefix; give up on the first miss.
        for w in prefix_string:
            if w not in target_dict:
                return self.prefix_list
            target_dict = target_dict[w]

        def deepsearch(target_dict):
            # Collect the entries stored here, then recurse into child nodes.
            # The root has no "##terminal" key, hence .get().
            self.prefix_list.extend(target_dict.get("##terminal", []))
            for k in target_dict:
                if k not in ("##terminal", "##cnt", "##wordtag"):
                    deepsearch(target_dict[k])

        deepsearch(target_dict)
        return self.prefix_list


if __name__ == "__main__":
    trie = trietree(is_debug=1, is_sentence=1)
    trie.addfromfile(sys.argv[1])
    while True:
        raw = input("please input:")
        print(trie.searchprefix(raw))
2. The test cases follow. Paste the English sentences below into a file named sent.d:
hi, my name is steve.#
it's nice to meet you.#
it's a pleasure to meet you i'm jack.#
what do you do for a living.#
i work at a restaurant.#
i work at a bank.#
i work in a software company.#
i'm a dentist.#
what is your name.#
what was that again.#
excuse me.#
pardon me.#
are you ready?#
are you free now?#
are you mr. murthy?#
are you angry with me?#
are you afraid of them?#
are you tired?#
are you married?#
are you employed?#
are you interested in that?#
are you awake?#
are you aware of that?#
are you a relative of mr. mohan?#
are you not well?#
are they your relatives?#
are they from abroad?#
are the shops open?#
are you satisfied now?#
are you joking?#
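To make the file format concrete, here is a minimal sketch of how one line of sent.d is split, mirroring the `strip`/`split` calls in the code of section 1. The helper name `parse_line` is hypothetical and exists only for this illustration.

```python
def parse_line(line):
    """Split one '#'-delimited line into (key words, stored entries)."""
    # Drop the newline and the trailing '#', then split on any remaining '#'.
    entries = line.strip().strip("#").split("#")
    # The first entry, split on whitespace, gives the path through the trie.
    key_words = entries[0].strip().split()
    return key_words, entries


words, entries = parse_line("are the shops open?#\n")
print(words)    # ['are', 'the', 'shops', 'open?']
print(entries)  # ['are the shops open?']
```

Punctuation stays attached to its word, so the last key of this sentence is `open?` rather than `open`.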
3. Testing
Run in a Linux shell:
python trietree.py sent.d
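You can then type a prefix made up of whole words to run a query.
You may have a question at this point: this algorithm can only search by prefix. Going by the examples in section 2, typing are returns only the sentences that begin with are, and typing are you returns only the sentences that begin with are you. What if I want every sentence that contains the word shops? This is where the "suffix tree" comes in. Despite the name, it is not really a separate structure: every suffix unit of every sentence is simply inserted into a single prefix tree. For example: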
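The expected behaviour of a prefix query can be seen without the interactive loop. The following is a simplified sketch, not the class from section 1: it keeps only the `##terminal` list and omits `##cnt` and `##wordtag`, and the function names `build_trie` and `search_prefix` are assumptions made for this example.

```python
def build_trie(sentences):
    """Build a word-level trie as nested dicts; '##terminal' holds full sentences."""
    root = {}
    for sentence in sentences:
        node = root
        for word in sentence.split():
            node = node.setdefault(word, {})
        node.setdefault("##terminal", []).append(sentence)
    return root


def search_prefix(root, prefix):
    """Return every stored sentence that starts with the words in `prefix`."""
    node = root
    for word in prefix.split():
        if word not in node:
            return []
        node = node[word]
    # Collect the sentences stored anywhere below the node we reached.
    results = []
    stack = [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "##terminal":
                results.extend(child)
            else:
                stack.append(child)
    return results


trie = build_trie(["are you ready?", "are you tired?", "are the shops open?", "excuse me."])
print(sorted(search_prefix(trie, "are you")))  # ['are you ready?', 'are you tired?']
print(search_prefix(trie, "shops"))            # []
```

The second query returns nothing because shops never appears at the start of a sentence, even though one stored sentence contains it.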
are you a lucky dog?
All the suffixes of this sentence are:
are you a lucky dog?
you a lucky dog?
a lucky dog?
lucky dog?
dog?
If every suffix of every sentence is inserted into the prefix tree, it becomes easy to find all the sentences that contain a given word.
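The idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the names `add_suffixes` and `search_contains` are hypothetical, each node keeps only a `##terminal` list, and every suffix is tagged with the full original sentence so that a match can report where it came from.

```python
def add_suffixes(root, sentence):
    """Insert every word-level suffix of `sentence` into the trie `root`."""
    words = sentence.split()
    for start in range(len(words)):
        node = root
        for word in words[start:]:
            node = node.setdefault(word, {})
        # Remember which full sentence this suffix belongs to.
        node.setdefault("##terminal", []).append(sentence)


def search_contains(root, phrase):
    """Return the sentences that contain `phrase` as a run of whole words."""
    node = root
    for word in phrase.split():
        if word not in node:
            return []
        node = node[word]
    found = []
    stack = [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "##terminal":
                found.extend(child)
            else:
                stack.append(child)
    # Several suffixes of one sentence may match, so remove duplicates.
    return sorted(set(found))


root = {}
for s in ["are the shops open?", "are you a lucky dog?", "the shops are closed."]:
    add_suffixes(root, s)
print(search_contains(root, "shops"))
# ['are the shops open?', 'the shops are closed.']
print(search_contains(root, "lucky dog?"))
# ['are you a lucky dog?']
```

The trade-off is size: a sentence of n words adds n suffixes, so the tree grows roughly quadratically with sentence length. Words are matched as whitespace-delimited tokens, so `dog?` and `dog` are different keys.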