5. Chatbot: Text Tokenization

Prepare the stopwords

Wrap the tokenization method

[Image: 詞典.png (dictionary)]

stopwords = set([i.strip() for i in open(config.stopwords_path).readlines()])

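def _cut_by_word(sentence):
    # Process Chinese character by character; keep English words whole instead of splitting them into letters
    sentence = re.sub(r"\s+", " ", sentence)
    sentence = sentence.strip()
    result = []
    temp = ""
    for word in sentence:
        if word.lower() in letters:
            temp += word.lower()
        else:
            if temp != "":  # not a letter: flush the buffered English word
                result.append(temp)
                temp = ""
            if word in filters:  # punctuation or space (compare the raw character so " " matches)
                continue
            else:  # a single character
                result.append(word)
    if temp != "":  # temp still holds letters at the end of the sentence
        result.append(temp)
    return result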
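
To make the behaviour of the character-level split concrete, here is a minimal self-contained sketch of the same idea. The function name `cut_by_word_sketch` and the sample sentence are made up for illustration, and the module-level `letters` and `filters` are defined inline so the snippet runs without `config` or jieba.

```python
import re
import string

# Same role as the module-level names in cut_sentence.py
letters = string.ascii_lowercase
filters = [",", "-", ".", " "]


def cut_by_word_sketch(sentence):
    """Split Chinese text per character while keeping English words whole."""
    sentence = re.sub(r"\s+", " ", sentence).strip()
    result = []
    temp = ""  # buffer for the English word currently being collected
    for ch in sentence:
        if ch.lower() in letters:
            temp += ch.lower()
            continue
        if temp:
            # A non-letter ends the buffered English word
            result.append(temp)
            temp = ""
        if ch not in filters:
            # Anything that is not a filtered punctuation mark becomes its own token
            result.append(ch)
    if temp:
        # The sentence ended while an English word was still buffered
        result.append(temp)
    return result


print(cut_by_word_sketch("我想學python, 還有UI-Design"))
# ['我', '想', '學', 'python', '還', '有', 'ui', 'design']
```

Note that English is lowercased and that only the ASCII punctuation listed in `filters` is dropped; a full-width comma, for example, would come through as a token of its own.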

Create the file cut_sentence.py under lib and build the tokenization method in it

import logging
import jieba
import jieba.posseg as psg
import config
import re
import string

# Raise jieba's log level to suppress its debug output
jieba.setLogLevel(logging.INFO)
# Load the user dictionary
jieba.load_userdict(config.keywords_path)
# Character-level splitting: English letters
letters = string.ascii_lowercase
# Character-level splitting: punctuation to drop
filters = [",", "-", ".", " "]
# Stopwords
stopwords = set([i.strip() for i in open(config.stopwords_path).readlines()])


def cut(sentence, by_word=False, use_stopwords=False, with_sg=False):
    assert by_word != True or with_sg != True, "Part-of-speech tags cannot be returned when splitting by word"
    if by_word:
        return _cut_by_word(sentence)
    else:
        # Convert jieba's pair objects to (word, flag) tuples up front so the later steps agree on the type
        ret = [(i.word, i.flag) for i in psg.lcut(sentence)]
        if use_stopwords:
            ret = [i for i in ret if i[0] not in stopwords]
        if not with_sg:
            ret = [i[0] for i in ret]
        return ret
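

def _cut_by_word(sentence):
    # Process Chinese character by character; keep English words whole instead of splitting them into letters
    sentence = re.sub(r"\s+", " ", sentence)
    sentence = sentence.strip()
    result = []
    temp = ""
    for word in sentence:
        if word.lower() in letters:
            temp += word.lower()
        else:
            if temp != "":  # not a letter: flush the buffered English word
                result.append(temp)
                temp = ""
            if word in filters:  # punctuation or space (compare the raw character so " " matches)
                continue
            else:  # a single character
                result.append(word)
    if temp != "":  # temp still holds letters at the end of the sentence
        result.append(temp)
    return result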
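
The `use_stopwords` and `with_sg` options of `cut` can be tried without jieba installed. The sketch below is an illustration only: `load_stopwords` and `postprocess` are hypothetical helper names, the stopword file is a temporary file written on the spot, and the `(word, flag)` pairs are hand-written stand-ins for tokenizer output, not real jieba results.

```python
import os
import tempfile


def load_stopwords(path):
    """Read one stopword per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def postprocess(pairs, stopwords, use_stopwords=False, with_sg=False):
    """Apply the stopword and part-of-speech options to (word, flag) tuples."""
    if use_stopwords:
        pairs = [p for p in pairs if p[0] not in stopwords]
    if with_sg:
        return list(pairs)
    return [word for word, _ in pairs]


# Hypothetical stopword file, created only for this demonstration
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stopwords.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("的\n是\n\n")
    stopwords = load_stopwords(path)

# Hand-written pairs standing in for part-of-speech tagged tokens
pairs = [("python", "eng"), ("是", "v"), ("什麼", "r")]

print(postprocess(pairs, stopwords, use_stopwords=True))
# ['python', '什麼']
print(postprocess(pairs, stopwords, use_stopwords=True, with_sg=True))
# [('python', 'eng'), ('什麼', 'r')]
```

Opening the file in a `with` block and passing an explicit encoding avoids leaving the file handle open and avoids depending on the platform default encoding, which matters for a Chinese stopword list.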
