Machine Learning: Decision Trees

This article contains my study notes on the decision tree chapter of "Machine Learning in Action".

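A decision tree can work with an unfamiliar data set and extract a series of rules from it. The process in which the machine builds these rules from the data set is the machine learning.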

Drawback: prone to overfitting.

Applicable data types: numeric and nominal values.

1 Shannon entropy

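# -*- coding: utf-8 -*-
'''
Created on 2023-02-21
@author: dzm
'''
from math import log


def createdataset():
    '''
    Sample data set.
    Column 1: can it survive without coming to the surface? (1 = yes, 0 = no)
    Column 2: does it have flippers? (1 = yes, 0 = no)
    Column 3: is it a fish? (class label)
    :return: the data set and the list of feature labels
    '''
    dataset = [[1, 1, 'yes'],
               [1, 1, 'yes'],
               [1, 0, 'no'],
               [0, 1, 'no'],
               [0, 1, 'no']]
    labels = ['no surfacing', 'flippers']
    # change to discrete values
    return dataset, labels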
def calcshannonent(dataset):
    '''
    Compute the Shannon entropy of the class labels in a data set.
    :param dataset: list of samples; the last element of each sample is its class label
    :return: the Shannon entropy, in bits
    '''
    numentries = len(dataset)
    labelcounts = {}
    # count how many times each class label occurs
    for featvec in dataset:
        currentlabel = featvec[-1]
        if currentlabel not in labelcounts:
            labelcounts[currentlabel] = 0
        labelcounts[currentlabel] += 1
    shannonent = 0.0
    for key in labelcounts:
        # probability of this class
        prob = float(labelcounts[key]) / numentries
        # accumulate the Shannon entropy
        shannonent -= prob * log(prob, 2)  # log base 2
    return shannonent


if __name__ == '__main__':
    mydat, labels = createdataset()
    print(calcshannonent(mydat))
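calcshannonent implements the Shannon entropy formula below, where p_k is the fraction of samples whose class label is k. Worked out by hand for the five-sample data set (two 'yes', three 'no'), it comes to about 0.971 bits, so the script should print a value close to that:

```latex
\begin{aligned}
H(D) &= -\sum_{k=1}^{K} p_k \log_2 p_k \\
H(D) &= -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5}
      \approx 0.4 \times 1.3219 + 0.6 \times 0.7370 \approx 0.971
\end{aligned}
```

The more mixed the class labels are, the higher the entropy; a data set in which every sample has the same label has entropy 0.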

2 Splitting the data set

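def splitdataset(dataset, axis, value):
    '''
    Split the data set on a given feature.
    :param dataset: the data set to split
    :param axis: index of the feature to split on
    :param value: the feature value to keep
    :return: the samples whose feature `axis` equals `value`, with that feature removed
    '''
    retdataset = []
    for featvec in dataset:
        if featvec[axis] == value:
            # chop out the feature used for splitting
            reducedfeatvec = featvec[:axis]
            reducedfeatvec.extend(featvec[axis+1:])
            retdataset.append(reducedfeatvec)
    return retdataset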
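To illustrate what splitdataset returns, here is a compact list-comprehension equivalent (an illustrative rewrite, not the book's version) applied to the five-sample data set. Splitting on feature 0 ('no surfacing') separates the three animals that can survive without surfacing from the two that cannot, and drops that column from each row:

```python
# A compact equivalent of splitdataset using a list comprehension
# (illustrative rewrite, not the book's code).
def splitdataset(dataset, axis, value):
    # keep rows whose feature `axis` equals `value`, dropping that column;
    # slicing and + build new lists, so the input rows are not modified
    return [row[:axis] + row[axis + 1:] for row in dataset if row[axis] == value]


dataset = [[1, 1, 'yes'],
           [1, 1, 'yes'],
           [1, 0, 'no'],
           [0, 1, 'no'],
           [0, 1, 'no']]

print(splitdataset(dataset, 0, 1))  # [[1, 'yes'], [1, 'yes'], [0, 'no']]
print(splitdataset(dataset, 0, 0))  # [[1, 'no'], [1, 'no']]
```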
def choosebestfeaturetosplit(dataset):
    '''
    Choose the best feature to split the data set on.
    :param dataset: list of samples; the last column holds the class labels
    :return: index of the feature with the highest information gain
    '''
    numfeatures = len(dataset[0]) - 1  # the last column is used for the labels
    # Shannon entropy of the whole data set before splitting
    baseentropy = calcshannonent(dataset)
    bestinfogain = 0.0
    bestfeature = -1
    for i in range(numfeatures):  # iterate over all the features
        # all values of feature i, and the set of its distinct values
        featlist = [example[i] for example in dataset]
        uniquevals = set(featlist)
        newentropy = 0.0
        for value in uniquevals:
            subdataset = splitdataset(dataset, i, value)
            prob = len(subdataset) / float(len(dataset))
            newentropy += prob * calcshannonent(subdataset)
        # information gain, i.e. the reduction in entropy
        infogain = baseentropy - newentropy
        if infogain > bestinfogain:
            # keep the feature whose split leaves the lowest entropy,
            # i.e. the one with the highest information gain
            bestinfogain = infogain
            bestfeature = i
    return bestfeature  # returns an integer
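Information gain is the reduction in entropy: Gain(D, a) = H(D) - Σ_v (|D_v| / |D|) H(D_v), where D_v is the subset of samples whose feature a takes value v. The sketch below recomputes it for both features of the sample data set; entropy and info_gain are hypothetical helper names, not taken from the book. By this calculation 'no surfacing' gives the larger gain (about 0.420 vs 0.171 bits), so choosebestfeaturetosplit should return 0 for this data:

```python
from collections import Counter
from math import log2


def entropy(rows):
    # Shannon entropy (bits) of the class labels in the last column
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())


def info_gain(rows, axis):
    # Gain(D, a) = H(D) - sum over values v of |D_v|/|D| * H(D_v)
    total = len(rows)
    remainder = 0.0
    for value in set(row[axis] for row in rows):
        subset = [row for row in rows if row[axis] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder


dataset = [[1, 1, 'yes'],
           [1, 1, 'yes'],
           [1, 0, 'no'],
           [0, 1, 'no'],
           [0, 1, 'no']]
labels = ['no surfacing', 'flippers']

gains = [info_gain(dataset, i) for i in range(len(labels))]
for name, gain in zip(labels, gains):
    print(f"{name}: {gain:.3f}")
# no surfacing: 0.420
# flippers: 0.171
best = max(range(len(gains)), key=gains.__getitem__)
print("best feature:", labels[best])  # best feature: no surfacing
```

Like the `>` comparison in choosebestfeaturetosplit, max keeps the first feature when two gains tie.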

3 Building the decision tree

import operator


def majoritycnt(classlist):
    '''
    Return the class label that occurs most often in classlist.
    :param classlist: list of class labels
    :return: the majority class label
    '''
    classcount = {}
    for vote in classlist:
        if vote not in classcount:
            classcount[vote] = 0
        classcount[vote] += 1
    sortedclasscount = sorted(classcount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedclasscount[0][0]
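Python's collections.Counter can express the same majority vote more compactly. This is an alternative, not the book's code; it assumes Python 3.7 or later, where most_common orders equal counts by first occurrence, which matches the tie-breaking of the stable sorted(..., reverse=True) call above:

```python
from collections import Counter


def majoritycnt(classlist):
    # most_common(1) returns a list holding one (label, count) pair;
    # on a tie, the label encountered first wins
    return Counter(classlist).most_common(1)[0][0]


print(majoritycnt(['yes', 'no', 'no']))  # no
print(majoritycnt(['a', 'b']))  # a
```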
def createtree(dataset, labels):
    '''
    Build a decision tree recursively.
    :param dataset: the data set
    :param labels: list of labels for all features in the data set
    :return: a nested dict representing the tree, or a class label for a leaf
    '''
    classlist = [example[-1] for example in dataset]
    # stop splitting when all of the classes are equal
    if classlist.count(classlist[0]) == len(classlist):
        return classlist[0]
    # stop splitting when there are no more features in the data set
    if len(dataset[0]) == 1:
        return majoritycnt(classlist)
    bestfeat = choosebestfeaturetosplit(dataset)
    bestfeatlabel = labels[bestfeat]
    # the tree is stored as nested dicts: {feature label: {feature value: subtree}}
    mytree = {bestfeatlabel: {}}
    # note: this also removes the label from the caller's list
    del labels[bestfeat]
    featvalues = [example[bestfeat] for example in dataset]
    uniquevals = set(featvalues)
    for value in uniquevals:
        # copy all of labels, so trees don't mess up existing labels
        sublabels = labels[:]
        mytree[bestfeatlabel][value] = createtree(splitdataset(dataset, bestfeat, value), sublabels)
    return mytree
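Putting the pieces together: the sketch below is a compact, self-contained re-implementation (the names create_tree, best_feature, split, and classify are hypothetical; this is not the book's code) that builds the tree for the sample data set and then walks it to classify new samples, which shows how the nested dict acts as a set of if-then rules. Unlike createtree above, it builds a new label list at each level instead of deleting from the caller's list, and it always returns a valid feature index via max:

```python
from collections import Counter
from math import log2


def entropy(rows):
    # Shannon entropy (bits) of the class labels in the last column
    counts = Counter(row[-1] for row in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())


def split(rows, axis, value):
    # rows whose feature `axis` equals `value`, with that column removed
    return [row[:axis] + row[axis + 1:] for row in rows if row[axis] == value]


def best_feature(rows):
    # index of the feature with the highest information gain (first one on ties)
    def gain(axis):
        values = set(row[axis] for row in rows)
        remainder = sum(len(sub) / len(rows) * entropy(sub)
                        for sub in (split(rows, axis, v) for v in values))
        return entropy(rows) - remainder
    return max(range(len(rows[0]) - 1), key=gain)


def create_tree(rows, labels):
    classes = [row[-1] for row in rows]
    if classes.count(classes[0]) == len(classes):
        return classes[0]  # all samples share one class: leaf
    if len(rows[0]) == 1:
        return Counter(classes).most_common(1)[0][0]  # no features left: majority vote
    best = best_feature(rows)
    sublabels = labels[:best] + labels[best + 1:]  # new list; caller's labels untouched
    return {labels[best]: {v: create_tree(split(rows, best, v), sublabels)
                           for v in set(row[best] for row in rows)}}


def classify(tree, featlabels, sample):
    # follow the branch matching the sample's feature value until a leaf is reached
    while isinstance(tree, dict):
        featlabel = next(iter(tree))
        tree = tree[featlabel][sample[featlabels.index(featlabel)]]
    return tree


dataset = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
labels = ['no surfacing', 'flippers']

tree = create_tree(dataset, labels)
print(tree)  # {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
print(classify(tree, labels, [1, 0]))  # no
print(classify(tree, labels, [1, 1]))  # yes
```

classify raises KeyError if a sample has a feature value that never appeared in the training data; handling that case is left out of this sketch.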
