Computing Word Association with PMI in Python

2021-08-17 05:57:22

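At the time, our task was to cluster trending events on Weibo and then extract topic words from each cluster. Taking the topic "六小齡童 (Liu Xiao Ling Tong) appears on the Spring Festival Gala" as an example, I collected 9 popular Weibo posts, as follows: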

How PMI works
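As a brief reminder, pointwise mutual information for two words x and y is usually defined as below. In this post the probabilities are estimated as document frequencies: p(x) is the fraction of posts containing x, and p(x, y) is the fraction containing both words.

```latex
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}
```

A value of 0 means the two words co-occur exactly as often as independence would predict; positive values mean they co-occur more often than chance. Note that the code in this post returns the ratio inside the logarithm rather than the logarithm itself. Because the logarithm is monotonic, the ranking of word pairs is the same either way.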

Implementation

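import re
import jieba.analyse

# Both helpers take self, so they presumably belong to a class not shown here.
def removeemoji(self, sentence):
    # Strip Weibo emoticon tags, i.e. anything wrapped in square brackets.
    return re.sub(r'\[.*?\]', '', sentence)

def extractword(self, wordlist):
    # Join the posts and extract the top 5 keywords with jieba.
    sentence = ','.join(wordlist)
    words = jieba.analyse.extract_tags(sentence, 5)
    wordlist = []
    for w in words:
        wordlist.append(w)
    return wordlist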
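To illustrate what the regular expression in removeemoji does, here is a minimal standalone sketch. The function name strip_emoticons and the sample strings are made up for illustration, and the assumption that emoticons appear as bracketed tags such as [doge] is inferred from the pattern itself.

```python
import re

# Non-greedy pattern: each [tag] is matched separately.
EMOTICON = re.compile(r'\[.*?\]')

def strip_emoticons(sentence):
    """Remove every bracketed tag from the sentence (hypothetical helper)."""
    return EMOTICON.sub('', sentence)

# Hypothetical sample post with two emoticon tags.
sample = "Spring Festival Gala[doge] please invite him[heart]!"
print(strip_emoticons(sample))

# A greedy pattern would also swallow the text between the first '[' and the last ']'.
print(re.sub(r'\[.*\]', '', "a[x]b[y]c"))
```

The non-greedy `.*?` matters here: with a greedy `.*`, everything between the first opening bracket and the last closing bracket would be removed, including ordinary text.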

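# coding=utf-8

class pmi:

    def __init__(self, document):
        # document: a list of tokenized posts (each post is a collection of words)
        self.document = document
        self.pmi = {}
        # Pruning threshold: a word must appear in more than one post.
        self.miniprobability = float(1.0) / len(document)
        # Co-occurrence threshold: a pair must co-occur in at least one post.
        self.minitogether = float(0) / len(document)
        self.set_word = self.getset_word()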

    def calcularprobability(self, document, wordlist):
        """
        :param document: list of tokenized documents
        :param wordlist: words that must all appear in a document
        :function: compute the document frequency of the words
        :return: document frequency
        """
        total = len(document)
        number = 0
        for doc in document:
            if set(wordlist).issubset(doc):
                number += 1
        percent = float(number) / total
        return percent

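    def togetherprobablity(self, document, wordlist1, wordlist2):
        """
        :param document: list of tokenized documents
        :param wordlist1: first list of words
        :param wordlist2: second list of words
        :function: compute the co-occurrence probability of the words
        :return: co-occurrence probability
        """
        joinwordlist = wordlist1 + wordlist2
        percent = self.calcularprobability(document, joinwordlist)
        return percent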

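    def getset_word(self):
        """
        :function: build the vocabulary of words in the documents
        :return: vocabulary (list of unique words)
        """
        list_word = []
        for doc in self.document:
            list_word = list_word + list(doc)
        set_word = []
        for w in list_word:
            # Keep only the first occurrence of each word.
            if set_word.count(w) == 0:
                set_word.append(w)
        return set_word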

    def get_dict_frq_word(self):
        """
        :function: prune the vocabulary, dropping words that occur infrequently
        :return: pruned vocabulary mapped to document frequency
        """
        dict_frq_word = {}
        for i in range(0, len(self.set_word), 1):
            list_word = [self.set_word[i]]
            probability = self.calcularprobability(self.document, list_word)
            if probability > self.miniprobability:
                dict_frq_word[self.set_word[i]] = probability
        return dict_frq_word

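    def calculate_nmi(self, joinpercent, wordpercent1, wordpercent2):
        """
        :function: compute the NMI value of a word co-occurrence
        :param joinpercent: co-occurrence probability of the two words
        :param wordpercent1: document frequency of the first word
        :param wordpercent2: document frequency of the second word
        :return: nmi
        """
        # This is the ratio p(x, y) / (p(x) * p(y)); taking its log gives PMI.
        return (joinpercent) / (wordpercent1 * wordpercent2)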

    def get_pmi(self):
        """
        :function: return the PMI scores that pass the threshold
        :return: dict mapping "word1,word2" to its score
        """
        dict_pmi = {}
        dict_frq_word = self.get_dict_frq_word()
        print(dict_frq_word)
        for word1 in dict_frq_word:
            wordpercent1 = dict_frq_word[word1]
            for word2 in dict_frq_word:
                if word1 == word2:
                    continue
                wordpercent2 = dict_frq_word[word2]
                list_together = [word1, word2]
                together_probability = self.calcularprobability(self.document, list_together)
                if together_probability > self.minitogether:
                    string = word1 + ',' + word2
                    dict_pmi[string] = self.calculate_nmi(together_probability, wordpercent1, wordpercent2)
        return dict_pmi
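For comparison, the same document-frequency idea can be sketched more compactly in Python 3 with collections.Counter. This is an illustrative variant, not the author's code: the function name document_pmi, the min_df parameter, and the toy corpus are all assumptions. It also differs in two ways: it applies the logarithm, and it stores each unordered pair once instead of both "word1,word2" and "word2,word1".

```python
import math
from collections import Counter
from itertools import combinations

def document_pmi(documents, min_df=2):
    """Return {(w1, w2): PMI}, using document frequencies as probabilities."""
    n = len(documents)
    doc_sets = [set(doc) for doc in documents]
    # Document frequency of each word.
    df = Counter(w for doc in doc_sets for w in doc)
    # Prune rare words; min_df=2 mirrors "frequency > 1/n" in the class above.
    vocab = {w for w, count in df.items() if count >= min_df}
    # Count the documents in which each unordered pair co-occurs.
    pair_df = Counter()
    for doc in doc_sets:
        pair_df.update(combinations(sorted(doc & vocab), 2))
    scores = {}
    for (w1, w2), count in pair_df.items():
        p_xy = count / n
        scores[(w1, w2)] = math.log(p_xy / ((df[w1] / n) * (df[w2] / n)))
    return scores

# Hypothetical toy corpus, already tokenized.
docs = [
    ["monkey", "king", "gala"],
    ["monkey", "king", "actor"],
    ["gala", "program", "actor"],
    ["gala", "program"],
]
scores = document_pmi(docs)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Counting each pair once per document avoids rescanning the whole corpus for every word pair, which is what the nested loops in get_pmi do. With Chinese text, the posts would first need to be tokenized, for example with jieba.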
