Implementing a Decision Tree in Python

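I plan to work through the machine learning models from start to finish. Until my own matrix library is written, the code will be in Python for now, which also buys me some time to learn C++.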

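There isn't much to say about decision trees: I simply followed, step by step, the algorithm in Zhou Zhihua's "watermelon book". The part that needs the most thought is how to choose the best feature to split on. That involves some information theory, whose meaning I only recently began to understand; I may write about it later in another column.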

小雨姑娘的機器學習筆記 (zhuanlan.zhihu.com)
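For reference, the three split criteria that the code later in this post computes can be written as follows. This is the standard textbook form, not something taken from the post itself. The notation is assumed: $D$ is the sample set, $p_k$ the proportion of class $k$ in $D$, $a$ a discrete feature with $V$ distinct values, and $D^v$ the subset of $D$ taking the $v$-th value of $a$.

```latex
\begin{aligned}
\operatorname{Ent}(D) &= -\sum_{k} p_k \log_2 p_k \\
\operatorname{Gain}(D, a) &= \operatorname{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\operatorname{Ent}(D^v) \\
\operatorname{IV}(a) &= -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|} \\
\operatorname{Gain\_ratio}(D, a) &= \frac{\operatorname{Gain}(D, a)}{\operatorname{IV}(a)} \\
\operatorname{Gini}(D) &= 1 - \sum_{k} p_k^2 \\
\operatorname{Gini\_index}(D, a) &= \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\operatorname{Gini}(D^v)
\end{aligned}
```

Information gain and gain ratio are maximised when picking a feature, while the Gini index is minimised.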

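Building the tree is clearly a recursive process. The idea is clear enough, but this was my first time writing a tree structure in Python, and it still tied me in knots.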

In C++, the algorithm would be a depth-first search that builds the tree through pointers as it searches, roughly like this:

for (i : features)

But I really had no experience implementing a tree in Python. Looking at implementations online, it turns out they use nested dictionaries:

tree = {}
for i in features:
    node = {}
    tree[i] = node
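To make the nested-dictionary idea concrete, here is a small sketch of a hand-built tree and a lookup over it. The layout `{feature: {value: subtree_or_label}}` mirrors how the class later in the post indexes its tree (`node[feature][value]`); the feature names, values and the `classify` helper are made up for illustration.

```python
# A hand-built tree: each dict has one key (the feature to test), whose value
# maps every feature value to either a subtree (dict) or a class label (str).
tree = {
    "car": {
        "sports": "c0",
        "family": {"size": {"small": "c0", "large": "c1"}},
    }
}


def classify(tree, sample):
    """Follow the sample's feature values down the nested dicts to a leaf."""
    node = tree
    while isinstance(node, dict):
        feature = next(iter(node))            # the single feature tested here
        node = node[feature][sample[feature]]  # step into the matching branch
    return node


print(classify(tree, {"car": "family", "size": "large"}))  # c1
```

A dictionary is convenient here because a branch lookup is just a key lookup, and a leaf is distinguished from an inner node by not being a `dict`.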

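So a recursive tree builder looks roughly like this. The subtree produced by each recursive call is returned and assigned into its parent node: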

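def build_tree(features):
    # features is a set; each level removes the feature it has just used
    tree = {}
    for i in features:
        # the subtree returned by the recursive call is stored in the parent
        tree[i] = build_tree(features - {i})
    return tree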
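The schematic above leaves out the data. As a slightly fuller sketch of the same recursion, the toy builder below actually partitions the samples at each level and hands the returned subtree back to the parent. It is deliberately simplified: it splits on features in a fixed order instead of choosing the best one, and the sample rows and labels are invented for illustration.

```python
from collections import Counter


def build_tree(rows, labels, features):
    """Recursively split on `features` in the given order (no gain criterion)."""
    # Leaf: only one class is left, or there is nothing left to split on.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    feature, rest = features[0], features[1:]
    node = {feature: {}}
    for value in sorted({row[feature] for row in rows}):
        idx = [k for k, row in enumerate(rows) if row[feature] == value]
        # The recursive call returns a subtree or a label; the parent stores it.
        node[feature][value] = build_tree(
            [rows[k] for k in idx], [labels[k] for k in idx], rest
        )
    return node


rows = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["no", "yes", "yes"]
print(build_tree(rows, labels, [0, 1]))
# {0: {'rainy': 'yes', 'sunny': {1: {'cool': 'yes', 'hot': 'no'}}}}
```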

It isn't actually hard; I just don't usually write low-level algorithms in Python, so it felt unfamiliar the first time.

First, a few small helpers that will be needed:

import
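The helper listing did not survive in the source; only the word `import` remains. Since the decision tree class in this post calls `entropy`, `gini` and `find_most` from `tools`, here is a minimal sketch of what such a module might contain. This is an assumption, not the author's code. It uses only the standard library and accepts any iterable of labels, so it should also work when passed one-dimensional NumPy arrays.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy in bits of a label sequence (0.0 if it is empty)."""
    labels = list(labels)
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gini(labels) -> float:
    """Gini impurity: one minus the sum of squared class proportions."""
    labels = list(labels)
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def find_most(labels):
    """Return the most frequent label (ties go to the one seen first)."""
    return Counter(list(labels)).most_common(1)[0][0]
```

For example, a perfectly balanced two-class set such as `["a", "b"]` has an entropy of 1 bit and a Gini impurity of 0.5, while a pure set scores 0 on both.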

Then the decision tree itself:

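import numpy as np
from tools import *  # helper module; must provide entropy(), gini() and find_most()


class decesion_tree:
    def __init__(self, criterion: str = 'gain'):
        # NOTE: the source lost the lines that select between the three split
        # criteria, so the `criterion` switch is an assumed reconstruction.
        # 'gain' = information gain, 'ratio' = gain ratio, 'gini' = Gini index.
        self._tree = {}
        self._criterion = criterion

    def train(self, x: np.ndarray, y: np.ndarray):
        x = np.array(x)
        y = np.array(y)
        # features are identified by their column index
        features = set(range(0, x.shape[-1]))
        self._tree = self._build_tree(x, y, features=features)

    def predict(self, x: np.ndarray) -> list:
        x = np.array(x)
        result = []
        for sample in x:
            node = self._tree
            # descend through the nested dicts until a leaf (class label) is reached
            while isinstance(node, dict):
                feature = list(node)[0]
                node = node[feature][sample[feature]]
            result.append(node)
        return result

    def print(self):
        print(self._tree)

    def _build_tree(self, x: np.ndarray, y: np.ndarray, features: set) -> object:
        # every sample has the same label: return it as a leaf
        if np.unique(y).size == 1:
            return y[0]
        # no feature left, or all samples agree on every remaining feature:
        # return the majority label
        cols = list(features)
        if not features or (x[:, cols] == x[0, cols]).all():
            return find_most(y)
        best_feature = self._find_best_feature(x, y, features)
        node = {best_feature: {}}
        dividen = x[:, best_feature]
        for i in np.unique(dividen):
            loc = np.where(dividen == i)[0]
            if loc.size == 0:
                # empty branch: fall back to the parent's majority label
                # (cannot occur here, since i is drawn from the observed values)
                node[best_feature][i] = find_most(y)
            else:
                # the subtree returned by the recursion is stored in the parent
                node[best_feature][i] = self._build_tree(
                    x[loc], y[loc], features - {best_feature})
        return node

    def _find_best_feature(self, x: np.ndarray, y: np.ndarray, features: set) -> object:
        best_feature = None
        if self._criterion == 'gain':
            # information gain: larger is better
            max_gain = -1 * np.inf
            for i in features:
                gain = self._information_gain(y, x[:, i])
                if gain > max_gain:
                    max_gain = gain
                    best_feature = i
        elif self._criterion == 'ratio':
            # gain ratio: larger is better
            max_gain = -1 * np.inf
            for i in features:
                gain = self._gain_ratio(y, x[:, i])
                if gain > max_gain:
                    max_gain = gain
                    best_feature = i
        else:
            # Gini index: smaller is better
            min_gini = np.inf
            for i in features:
                gini_index = self._gini_index(y, x[:, i])
                if gini_index < min_gini:
                    min_gini = gini_index
                    best_feature = i
        return best_feature

    @staticmethod
    def _information_gain(d: np.ndarray, a: np.ndarray) -> float:
        # Ent(D) minus the size-weighted entropy of each subset D^v
        ent_d = entropy(d)
        size = d.size
        for i in np.unique(a):
            loc = np.where(a == i)
            ent_d -= d[loc].size / size * entropy(d[loc])
        return ent_d

    @staticmethod
    def _gain_ratio(d: np.ndarray, a: np.ndarray) -> float:
        # IV(a): the entropy of the partition induced by feature a
        iv_a = 0
        size = d.size
        for i in np.unique(a):
            loc = np.where(a == i)
            iv_a -= d[loc].size / size * np.log2(d[loc].size / size)
        if iv_a == 0:
            # feature takes a single value: no split, so no gain
            return 0.0
        return decesion_tree._information_gain(d, a) / iv_a

    @staticmethod
    def _gini_index(d: np.ndarray, a: np.ndarray) -> float:
        # size-weighted Gini impurity of each subset D^v
        result = 0
        size = d.size
        for i in np.unique(a):
            loc = np.where(a == i)
            result += d[loc].size / size * gini(d[loc])
        return result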

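male,family,small,c0
male,sports,medium,c0
male,sports,medium,c0
male,sports,large,c0
male,sports,extra large,c0
male,sports,extra large,c0
female,sports,small,c0
female,sports,small,c0
female,sports,medium,c0
female,luxury,large,c0
male,family,large,c1
male,family,extra large,c1
male,family,medium,c1
male,luxury,extra large,c1
female,luxury,small,c1
female,luxury,small,c1
female,luxury,medium,c1
female,luxury,medium,c1
female,luxury,medium,c1
female,luxury,large,c1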
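As a quick check of which feature a tree would split on first, the sketch below computes the information gain of each column over the 20 records above, with the values written in English. It is self-contained and independent of the class in this post. The source gives no column names, so columns are referred to by index, and the `entropy` and `information_gain` helpers here are illustrative stand-ins.

```python
import math
from collections import Counter

RAW = """\
male,family,small,c0
male,sports,medium,c0
male,sports,medium,c0
male,sports,large,c0
male,sports,extra large,c0
male,sports,extra large,c0
female,sports,small,c0
female,sports,small,c0
female,sports,medium,c0
female,luxury,large,c0
male,family,large,c1
male,family,extra large,c1
male,family,medium,c1
male,luxury,extra large,c1
female,luxury,small,c1
female,luxury,small,c1
female,luxury,medium,c1
female,luxury,medium,c1
female,luxury,medium,c1
female,luxury,large,c1
"""


def entropy(labels):
    """Shannon entropy in bits of a non-empty list of labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, values):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [lab for lab, val in zip(labels, values) if val == v]
        gain -= len(subset) / n * entropy(subset)
    return gain


rows = [line.split(",") for line in RAW.strip().splitlines()]
labels = [row[-1] for row in rows]  # last column is the class
gains = [information_gain(labels, [row[i] for row in rows]) for i in range(3)]
best = max(range(3), key=gains.__getitem__)
print([round(g, 3) for g in gains], "best column:", best)
```

On this data column 1 (family / sports / luxury) has by far the largest gain, about 0.62 bits, so a tree built with the information-gain criterion would split on it at the root.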
