A prefix tree (trie) with autocompletion, implemented in Python

2021-07-25 13:57:34

1. The code

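import json
import sys


class trietree:
    def __init__(self, is_debug=1, is_sentence=0):
        # Nested dict: each key is a token, each value is the child node.
        self.tree = {}
        self.is_debug = is_debug        # dump the tree to ./debug.json after loading
        self.is_sentence = is_sentence  # 1: entries are whole sentences, 0: single words
        self.prefix_list = []           # results of the most recent search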

    def addfromfile(self, filepath):
        # One entry per line, fields separated by "#". The first field is split
        # into tokens and used as the key; every field is stored as a completion.
        with open(filepath, encoding="utf-8") as f:
            for line in f:
                line_list = line.strip().strip("#").split("#")
                # Tokens are whitespace-separated in both modes, while searchprefix()
                # walks a non-sentence prefix character by character.
                main_word = line_list[0].strip().split()
                if not self.is_sentence:
                    sub_word_list = [u.replace(" ", "") for u in line_list]
                else:
                    sub_word_list = line_list
                target_dict = self.tree
                for i, w in enumerate(main_word):
                    if w not in target_dict:
                        target_dict[w] = {"##cnt": 1, "##terminal": [], "##wordtag": 0}
                    else:
                        target_dict[w]["##cnt"] += 1
                    if i == len(main_word) - 1:
                        # Last token of the key: store the completions here.
                        target_dict[w]["##terminal"].extend(sub_word_list)
                        target_dict[w]["##wordtag"] = 1
                    target_dict = target_dict[w]
        if self.is_debug:
            context = json.dumps(self.tree, indent=2, ensure_ascii=False)
            with open("./debug.json", "w", encoding="utf-8") as out:
                out.write(context + "\n")

    def searchprefix(self, prefix_string):
        self.prefix_list = []
        target_dict = self.tree
        if not self.tree:
            return self.prefix_list
        if self.is_sentence:
            # Match word by word, using the same whitespace split as addfromfile.
            prefix_string = prefix_string.strip().split()
        # Walk down the tree along the prefix; an unknown token means no match.
        for w in prefix_string:
            if w not in target_dict:
                return self.prefix_list
            target_dict = target_dict[w]

        def deepsearch(target_dict):
            # Collect the completions stored at this node, then visit every child.
            self.prefix_list.extend(target_dict.get("##terminal", []))
            for k in target_dict:
                if k not in ("##terminal", "##cnt", "##wordtag"):
                    deepsearch(target_dict[k])

        deepsearch(target_dict)
        return self.prefix_list

if __name__ == "__main__":
    trie = trietree(is_debug=1, is_sentence=1)
    trie.addfromfile(sys.argv[1])
    while True:
        raw = input("please input:")  # Python 3; the Python 2 equivalent is raw_input()
        print(trie.searchprefix(raw))
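
To make the node layout concrete, here is a minimal standalone sketch (Python 3, separate from the class in section 1) that inserts two of the test sentences using the same three bookkeeping keys and prints the nested dict. The helper name `insert_sentence` is made up for this illustration. The output should resemble what the script writes to debug.json.

```python
import json


def insert_sentence(tree, sentence):
    """Insert one sentence word by word, using the keys ##cnt, ##terminal, ##wordtag."""
    node = tree
    words = sentence.split()
    for i, word in enumerate(words):
        if word not in node:
            node[word] = {"##cnt": 1, "##terminal": [], "##wordtag": 0}
        else:
            node[word]["##cnt"] += 1  # another sentence passes through this word
        if i == len(words) - 1:
            node[word]["##terminal"].append(sentence)  # full sentence stored at the last word
            node[word]["##wordtag"] = 1
        node = node[word]
    return tree


tree = {}
insert_sentence(tree, "i work at a bank.")
insert_sentence(tree, "i work at a restaurant.")
print(json.dumps(tree, indent=2, ensure_ascii=False))
```

The shared words `i work at a` form a single path whose `##cnt` is 2, and the path only branches at `bank.` and `restaurant.`, each of which holds its full sentence in `##terminal`.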

2. Test data: paste the English sentences below into a file named sent.d.

hi, my name is steve.#

it's nice to meet you.#

it's a pleasure to meet you i'm jack.#

what do you do for a living.#

i work at a restaurant.#

i work at a bank.#

i work in a software company.#

i'm a dentist.#

what is your name.#

what was that again.#

excuse me.#

pardon me.#

are you ready?#

are you free now?#

are you mr. murthy?#

are you angry with me?#

are you afraid of them?#

are you tired?#

are you married?#

are you employed?#

are you interested in that?#

are you awake?#

are you aware of that?#

are you a relative of mr. mohan?#

are you not well?#

are they your relatives?#

are they from abroad?#

are the shops open?#

are you satisfied now?#

are you joking?#

3. Running the test

Run this in a Linux shell:

python trietree.py sent.d

You can then type a prefix made of complete words to run a query.
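
In sentence mode the prefix is matched word by word, so `are you` finds matches but `are yo` does not. A brute-force filter over the raw sentences expresses the same rule and can be used to cross-check the trie's answers. This is only an illustrative sketch: `match_word_prefix` is a hypothetical helper, not part of the script.

```python
def match_word_prefix(sentences, prefix):
    """Return the sentences whose leading words equal the words of `prefix`."""
    prefix_words = prefix.split()
    return [s for s in sentences if s.split()[:len(prefix_words)] == prefix_words]


sentences = [
    "are you ready?",
    "are they from abroad?",
    "are the shops open?",
    "what is your name.",
]

print(match_word_prefix(sentences, "are"))      # the three sentences starting with "are"
print(match_word_prefix(sentences, "are you"))  # only "are you ready?"
print(match_word_prefix(sentences, "are yo"))   # empty: "yo" is not a complete word
```

The trie returns its matches in traversal order rather than file order, so sort both lists before comparing them.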

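You may have a question at this point: this algorithm can only search by prefix. Taking the examples in section 2, entering "are" returns only the sentences that start with "are", and entering "are you" returns only the sentences that start with "are you". What if I want every sentence that contains the word "shops"? This is where the "suffix tree" comes in. Despite the name, it is not really a suffix tree: it simply pushes every suffix of every sentence into one prefix tree. For example, take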

are you a lucky dog?

All the suffixes of this sentence are:

are you a lucky dog?

you a lucky dog?

a lucky dog?

lucky dog?

dog?

If every suffix of every sentence is pushed into the prefix tree, it becomes easy to look up all the sentences that contain a given word.
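
The idea can be sketched as follows: for each sentence, insert every word-level suffix into the same trie and store the full sentence at the last node of each suffix. A lookup for `shops` then walks to the `shops` node and collects everything below it. This is a minimal standalone sketch under those assumptions. The names `add_suffixes` and `find_containing` are invented for the example, and the node layout is simplified to a single `##terminal` list.

```python
TERMINAL = "##terminal"


def add_suffixes(tree, sentence):
    """Insert every word-level suffix of `sentence`, storing the full sentence at its end."""
    words = sentence.split()
    for start in range(len(words)):
        node = tree
        for word in words[start:]:
            node = node.setdefault(word, {})
        node.setdefault(TERMINAL, []).append(sentence)


def find_containing(tree, query):
    """Return the sentences that contain the words of `query` consecutively."""
    node = tree
    for word in query.split():
        if word not in node:
            return []
        node = node[word]
    found = []
    stack = [node]
    while stack:  # iterative walk over the subtree below the query
        current = stack.pop()
        for key, child in current.items():
            if key == TERMINAL:
                found.extend(child)
            else:
                stack.append(child)
    # A sentence containing the query twice is reached twice, so deduplicate.
    return sorted(set(found))


tree = {}
for s in ["are the shops open?", "are you ready?", "are you a lucky dog?"]:
    add_suffixes(tree, s)

print(find_containing(tree, "shops"))
print(find_containing(tree, "you"))
print(find_containing(tree, "lucky dog?"))
```

Two caveats apply to this sketch. Tokens keep their punctuation, so a query for `dog` does not match `dog?`. Storing every suffix also makes the tree grow roughly with the square of the sentence length in words.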
