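Logistic regression model for binary classification
The following code snippet is the program for Exercise 3.3 of Zhou Zhihua's "Machine Learning" (a logistic regression model for binary classification).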
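For reference, the objective that the program builds symbolically sample by sample can be written in closed form. The notation here is an assumption of this note rather than something the code defines: $\hat{x}_i$ is the $i$-th sample with a constant 1 appended, $y_i \in \{0, 1\}$ is its label, and $\beta$ is the parameter vector whose components the code calls `x1`, `x2`, `x3`.

```latex
\ell(\beta) = \sum_{i=1}^{m} \left( -y_i \beta^{\mathrm{T}} \hat{x}_i
              + \ln\left( 1 + e^{\beta^{\mathrm{T}} \hat{x}_i} \right) \right)

% Newton's method, as used by the program, minimizes \ell(\beta) with the update
\beta^{t+1} = \beta^{t}
  - \left( \frac{\partial^2 \ell(\beta)}{\partial \beta \, \partial \beta^{\mathrm{T}}} \right)^{-1}
    \frac{\partial \ell(\beta)}{\partial \beta}
```

The program obtains both derivatives by symbolic differentiation and stops once the norm of the first derivative falls below the tolerance.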
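# Zhou Zhihua, Machine Learning, Exercise 3.3, logistic regression classification
# Import libraries and the author's own helper function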
from functionsmyself import newton
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
# Store the training set (after the transpose, each column is one sample: density, sugar content, constant 1)
attrset = np.matrix([[0.697,0.460,1],[0.774,0.376,1],[0.634,0.264,1],[0.608,0.318,1],
                     [0.556,0.215,1],[0.403,0.237,1],[0.481,0.149,1],[0.437,0.211,1],
                     [0.666,0.091,1],[0.243,0.267,1],[0.245,0.057,1],[0.343,0.099,1],
                     [0.639,0.161,1],[0.657,0.198,1],[0.360,0.370,1],[0.593,0.042,1],[0.719,0.103,1]]).T
# Labels: the first 8 samples are positive (1), the remaining 9 are negative (0)
flagset = np.matrix(np.concatenate((np.ones(8),np.zeros(9)))).T
numsam = flagset.shape[0]
# Build the objective function of logistic regression
x1,x2,x3,y = sp.symbols('x1 x2 x3 y')
beta,y = np.matrix([[x1],[x2],[x3]]),0*x1
for m in range(numsam):
    # mid is the 1x1 matrix beta^T x for sample m
    mid = np.dot(beta.T,attrset[:,m])
    y = y - flagset[m,0]*mid[0,0] + sp.log(1+sp.exp(mid[0,0]))
# Minimize the objective function of logistic regression
fucarray = np.matrix([[x1],[x2],[x3],[y]])
errset = 1e-14
timesset = 1e2
xcurr = np.matrix([[np.random.random() for m in range(1)] for n in range(fucarray.shape[0]-1)])
betacal = newton(fucarray,errset,timesset,xcurr)
# Check the accuracy of the learned model
plt.close('all')
plt.figure(1)
# Split the sample indices by their true labels
indexgood,indexbad = [],[]
for m in range(numsam):
    if flagset[m,0] == 1:
        indexgood.append(m)
    else:
        indexbad.append(m)
plt.scatter(np.array(attrset[0,indexgood]).reshape(len(indexgood),order='C'),np.array(attrset[1,indexgood]).reshape(len(indexgood),order='C'),marker='o',color='k',label='esgood')
plt.scatter(np.array(attrset[0,indexbad]).reshape(len(indexbad),order='C'),np.array(attrset[1,indexbad]).reshape(len(indexbad),order='C'),marker='o',color='r',label='esbad')
plt.xlabel('density')
plt.ylabel('sugar')
plt.legend(loc='upper left')
plt.title('exercise set')
plt.figure(2)
# Split the sample indices by the model's prediction: positive when beta^T x > 0
indexgood,indexbad = [],[]
for m in range(numsam):
    if np.dot(betacal.T,attrset[:,m])[0,0] > 0:
        indexgood.append(m)
    else:
        indexbad.append(m)
plt.scatter(np.array(attrset[0,indexgood]).reshape(len(indexgood),order='C'),np.array(attrset[1,indexgood]).reshape(len(indexgood),order='C'),marker='o',color='k',label='esgood')
plt.scatter(np.array(attrset[0,indexbad]).reshape(len(indexbad),order='C'),np.array(attrset[1,indexbad]).reshape(len(indexbad),order='C'),marker='o',color='r',label='esbad')
plt.xlabel('density')
plt.ylabel('sugar')
plt.legend(loc='upper left')
plt.title('es result')
plt.show()
# Newton's method function
def newton(fucarray,errset,timesset,xcurr):
    # fucarray: (numx+1)*1 symbolic matrix holding the independent variables followed by
    #           the dependent variable as its last element; numx is the number of independent variables
    # errset: tolerance on the norm of the first derivative
    # timesset: maximum number of Newton iterations
    # xcurr: starting point of Newton's method, a numx*1 matrix
    import sympy
    import numpy
    numx = fucarray.shape[0]-1
    # Symbolic gradient and Hessian of the dependent variable
    diff1 = numpy.matrix([[sympy.diff(fucarray[numx,0],fucarray[n,0],1)] for n in range(numx)])
    diff2 = numpy.matrix([[sympy.diff(diff1[n,0],fucarray[m,0],1) for m in range(numx)] for n in range(numx)])
    numdiff1 = numpy.matrix([[0.0 for m in range(1)] for n in range(numx)])
    numdiff2 = numpy.matrix([[0.0 for m in range(numx)] for n in range(numx)])
    times = 0
    while True:
        # Evaluate the gradient and Hessian numerically at the current point
        for n in range(numx):
            numdiff1[n,0] = diff1[n,0].subs([(fucarray[nn,0],xcurr[nn,0]) for nn in range(numx)])
            for m in range(numx):
                numdiff2[n,m] = diff2[n,m].subs([(fucarray[nn,0],xcurr[nn,0]) for nn in range(numx)])
        # Stop when the gradient is small enough or the iteration limit is exceeded
        if numpy.linalg.norm(numdiff1) < errset or times > timesset:
            break
        times = times + 1
        xcurr = xcurr - numpy.dot(numdiff2.I,numdiff1)
    print('times = ',times)
    print('xcurr = ',xcurr)
    return xcurr
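The symbolic approach above re-substitutes every derivative expression at each iteration, which becomes slow as the sample count grows. As a cross-check, here is a minimal numerical sketch of the same Newton iteration using the closed-form gradient and Hessian of the logistic regression objective. It is not the author's method: the function name `fit_logistic_newton`, its `tol` and `max_iter` parameters, and the all-zeros starting point are assumptions made for illustration.

```python
import numpy as np

# Training data copied from the post: density, sugar content, constant 1 (one row per sample).
X = np.array([[0.697, 0.460, 1], [0.774, 0.376, 1], [0.634, 0.264, 1], [0.608, 0.318, 1],
              [0.556, 0.215, 1], [0.403, 0.237, 1], [0.481, 0.149, 1], [0.437, 0.211, 1],
              [0.666, 0.091, 1], [0.243, 0.267, 1], [0.245, 0.057, 1], [0.343, 0.099, 1],
              [0.639, 0.161, 1], [0.657, 0.198, 1], [0.360, 0.370, 1], [0.593, 0.042, 1],
              [0.719, 0.103, 1]])
# First 8 samples are positive, the remaining 9 negative, as in the post.
y = np.concatenate((np.ones(8), np.zeros(9)))


def fit_logistic_newton(X, y, tol=1e-10, max_iter=100):
    """Hypothetical helper: minimize sum(-y_i * b^T x_i + ln(1 + exp(b^T x_i))) by Newton's method."""
    beta = np.zeros(X.shape[1])  # assumed starting point (the post uses a random one)
    n_iter = 0
    for _ in range(max_iter):
        p1 = 1.0 / (1.0 + np.exp(-X @ beta))          # predicted probability of class 1
        grad = -X.T @ (y - p1)                        # gradient of the objective
        if np.linalg.norm(grad) < tol:
            break
        hess = (X * (p1 * (1.0 - p1))[:, None]).T @ X  # Hessian: sum of x x^T p1 (1 - p1)
        beta = beta - np.linalg.solve(hess, grad)      # Newton step without forming the inverse
        n_iter += 1
    return beta, n_iter


beta, n_iter = fit_logistic_newton(X, y)
pred = (X @ beta > 0).astype(int)  # predict positive when beta^T x > 0
print('iterations =', n_iter)
print('beta =', beta)
print('training accuracy =', np.mean(pred == y))
```

Solving the linear system with `np.linalg.solve` avoids computing the Hessian inverse explicitly, which is usually the more stable choice numerically. Up to the convergence tolerance, the result should agree with the `betacal` returned by `newton`.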