A recent project required Bayes' theorem and related techniques, so I studied the topic systematically again and took some notes along the way.
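As a refresher, the classifier below rests on Bayes' theorem plus the "naive" assumption that features are conditionally independent given the class, so the likelihood factors into a product:

```latex
P(y_i \mid x) \;=\; \frac{P(x \mid y_i)\,P(y_i)}{P(x)}
            \;=\; \frac{P(y_i)\,\prod_j P(x_j \mid y_i)}{P(x)}
```

Since $P(x)$ is the same for every class, it is enough to compare $P(y_i)\prod_j P(x_j \mid y_i)$ across classes and pick the largest.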
**The implementation (with detailed comments):**

```python
# -*- coding: utf-8 -*-
import copy  # deepcopy, needed for nested data structures

# If a nested structure is hard to follow, sketch it on paper first;
# once drawn out it becomes obvious.
class native_bayes:
    def __init__(self, character_vec_, class_vec_):
        """
        Constructor; see the calls at the bottom of the file for concrete arguments.
        character_vec_: [("character_a", ["a1", "a2", "a3"]),
                         ("character_b", ["b1", "b2", "b3"])]
            a nested structure: a list of (feature_name, list_of_values) tuples
        class_vec_: ["class_x", "class_y"]
        """
        # Template for per-class statistics: a three-level nested dict,
        # feature name -> feature value -> counters.
        character_condition_per = {}
        for character_name in character_vec_:
            character_condition_per[character_name[0]] = {}
            for character_value in character_name[1]:
                character_condition_per[character_name[0]][character_value] = {
                    'num': 0,             # count of this value within the class
                    'condition_per': 0.0  # conditional probability P(value | class)
                }
        # Per-class data: a two-level dict with the three-level dict above
        # deep-copied into each class entry.
        self.class_set = {}
        for class_name in class_vec_:
            self.class_set[class_name] = {
                'num': 0,          # number of training samples in this class
                'class_per': 0.0,  # prior probability P(class)
                'character_condition_per': copy.deepcopy(character_condition_per),
            }
        # print("init", character_vec_, self.class_set)  # for debugging

    def learn(self, sample_):
        """
        Training. sample_ is a list of dicts:
        [{'character': {'character_a': 'a1', ...},  # feature vector
          'class_name': 'class_x'}]                 # class label
        """
        # First pass: count samples per class and per feature value.
        for each_sample in sample_:
            character_vec_ = each_sample['character']
            class_name = each_sample['class_name']
            data_for_class = self.class_set[class_name]
            data_for_class['num'] += 1
            for character_name in character_vec_:  # iterating a dict yields its keys
                character_value = character_vec_[character_name]
                data_for_character = data_for_class['character_condition_per'][character_name][character_value]
                data_for_character['num'] += 1
        # Counting done; second pass computes the final probabilities.
        sample_num = len(sample_)
        for each_sample in sample_:
            character_vec_ = each_sample['character']
            class_name = each_sample['class_name']
            data_for_class = self.class_set[class_name]
            # Prior probability of the class.
            data_for_class['class_per'] = float(data_for_class['num']) / sample_num
            # Conditional probability of each feature value given the class.
            for character_name in character_vec_:
                character_value = character_vec_[character_name]
                data_for_character = data_for_class['character_condition_per'][character_name][character_value]
                data_for_character['condition_per'] = float(data_for_character['num']) / data_for_class['num']
        # from pprint import pprint
        # pprint(self.class_set)  # for debugging

    def classify(self, input_):
        """
        Classification. input_ is a feature vector such as
        {'character_a': 'a1', 'character_b': 'b1'};
        returns the class with the highest posterior score.
        """
        best_class = ''
        max_per = 0.0
        for class_name in self.class_set:
            class_data = self.class_set[class_name]
            per = class_data['class_per']
            # Multiply the prior by the conditional probability of each feature value.
            for character_name in input_:
                character_per_data = class_data['character_condition_per'][character_name]
                per = per * character_per_data[input_[character_name]]['condition_per']
            print(class_name, per)
            if per >= max_per:
                max_per = per  # remember the best score so far
                best_class = class_name
        return best_class

# Naming convention: function parameters end with "_", ordinary names do not,
# so the two are easy to tell apart. Module-level code from here on.
character_vec = [("character_a", ["a1", "a2", "a3"]),
                 ("character_b", ["b1", "b2", "b3"])]
class_vec = ["class_x", "class_y"]
bayes = native_bayes(character_vec, class_vec)  # create the classifier

# Training set. The feature values below are illustrative placeholders.
sample = [
    {'character': {'character_a': 'a1', 'character_b': 'b1'}, 'class_name': 'class_x'},
    {'character': {'character_a': 'a1', 'character_b': 'b2'}, 'class_name': 'class_x'},
    {'character': {'character_a': 'a2', 'character_b': 'b1'}, 'class_name': 'class_x'},
    {'character': {'character_a': 'a2', 'character_b': 'b2'}, 'class_name': 'class_x'},
    {'character': {'character_a': 'a2', 'character_b': 'b3'}, 'class_name': 'class_y'},
    {'character': {'character_a': 'a3', 'character_b': 'b2'}, 'class_name': 'class_y'},
    {'character': {'character_a': 'a3', 'character_b': 'b3'}, 'class_name': 'class_y'},
    {'character': {'character_a': 'a1', 'character_b': 'b3'}, 'class_name': 'class_y'},
]

input_data = {'character_a': 'a1', 'character_b': 'b1'}  # illustrative test input

bayes.learn(sample)                # train
print(bayes.classify(input_data))  # test
```
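As a quick cross-check of the same two-pass counting idea, here is an independent sketch using only the standard library. The toy feature values are my own, not from the original post; the logic (prior times product of per-class value frequencies) matches the classifier above.

```python
# Minimal naive Bayes on categorical features using collections.
from collections import Counter, defaultdict

# Illustrative training data: (feature vector, class label) pairs.
samples = [
    ({'character_a': 'a1', 'character_b': 'b1'}, 'class_x'),
    ({'character_a': 'a1', 'character_b': 'b2'}, 'class_x'),
    ({'character_a': 'a2', 'character_b': 'b3'}, 'class_y'),
    ({'character_a': 'a3', 'character_b': 'b3'}, 'class_y'),
]

# Count samples per class, and feature values per (class, feature) pair.
class_count = Counter(label for _, label in samples)
cond_count = defaultdict(Counter)  # (label, feature_name) -> Counter of values
for features, label in samples:
    for name, value in features.items():
        cond_count[(label, name)][value] += 1

def classify(features):
    """Return the class maximizing P(class) * prod_j P(value_j | class)."""
    best, best_score = None, -1.0
    for label, n in class_count.items():
        score = n / len(samples)  # prior P(class)
        for name, value in features.items():
            # Relative frequency of this value within the class; 0 if unseen.
            score *= cond_count[(label, name)][value] / n
        if score > best_score:
            best, best_score = label, score
    return best

print(classify({'character_a': 'a1', 'character_b': 'b1'}))  # prints: class_x
```

One caveat that applies to the class-based version above as well: an unseen feature value zeroes out the whole product, which is why practical implementations add Laplace smoothing to the counts.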