Andrew Ng Machine Learning: Regularized Logistic Regression Exercise

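In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA).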

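Suppose you are the product manager of the factory and you have the results of two different tests for some microchips. From these two tests, you want to decide whether each microchip should be accepted or rejected. To help you make that decision, you have a dataset of test results from past microchips, from which you can build a logistic regression model.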
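For reference, this is the regularized logistic regression cost and gradient in the form usually given for this exercise. Here m is the number of samples, n the number of mapped features excluding the bias, λ corresponds to the parameter `l` in the code, and θ₀ (the bias) is not penalized:

```latex
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}

J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}

\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_0^{(i)}

\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} + \frac{\lambda}{m}\theta_j, \qquad j \ge 1
```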

Code:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report  # evaluation report
df = pd.read_csv('ex2data2.txt', names=['test1', 'test2', 'accepted'])
# print(df.head())
# sns.set(context='notebook', style='ticks', font_scale=1.5)
# sns.lmplot(x='test1', y='test2', hue='accepted', data=df,
#            height=6,
#            fit_reg=False)
# plt.title('regularized logistic regression')
# plt.show()
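# Feature mapping: map the raw features to new polynomial feature combinations so that a more complex hypothesis can be fit
def feature_mapping(x, y, power, as_ndarray=False):
    # Equivalent loop version:
    # feature_data = {}
    # for i in np.arange(power + 1):
    #     for p in np.arange(i + 1):
    #         feature_data["f{}{}".format(i - p, p)] = np.power(x, i - p) * np.power(y, p)
    # return pd.DataFrame(feature_data)
    # Column "f{a}{b}" holds x**a * y**b for every a + b <= power (28 columns when power=6)
    data = {"f{}{}".format(i - p, p): np.power(x, i - p) * np.power(y, p)
            for i in range(power + 1)
            for p in range(i + 1)
            }
    if as_ndarray:
        # return pd.DataFrame(data).as_matrix()  # as_matrix() no longer exists in pandas >= 1.0
        return pd.DataFrame(data).values
        # return np.array(pd.DataFrame(data))
    else:
        return pd.DataFrame(data)
# x1 = df['test1'].values
# x2 = df['test2'].values
x1 = np.array(df.test1)
x2 = np.array(df.test2)
data = feature_mapping(x1, x2, power=6)
# print(data.shape)
# print(data.head())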
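# Regularized cost function; data has shape (118, 28)
theta = np.zeros(data.shape[1])
X = feature_mapping(x1, x2, power=6, as_ndarray=True)
# print(X.shape)
def get_y(df):
    # df.iloc[:, -1] is the last column of df (the "accepted" label)
    return np.array(df.iloc[:, -1])
    # return df.iloc[:, -1]
y = get_y(df)
# print(y.shape)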
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
def cost(theta, X, y):
    # Unregularized cross-entropy cost, averaged over the m samples
    return np.mean(-y * np.log(sigmoid(X.dot(theta))) - (1 - y) * np.log(1 - sigmoid(X.dot(theta))))
    # return np.mean(-y * np.log(sigmoid(X @ theta)) - (1 - y) * np.log(1 - sigmoid(X @ theta)))
def regularized_cost(theta, X, y, l=1):
    # Penalize theta_1..theta_n only; the bias theta_0 is not regularized
    theta_j1_to_n = theta[1:]
    regularized_term = (l / (2 * len(X))) * np.power(theta_j1_to_n, 2).sum()
    return cost(theta, X, y) + regularized_term
# print(regularized_cost(theta, X, y))
def gradient(theta, X, y):
    return (1 / len(X)) * np.dot(X.T, (sigmoid(X.dot(theta)) - y))
def regularized_gradient(theta, X, y, l=1):
    theta_j1_to_n = theta[1:]
    regularized_theta = (l / len(X)) * theta_j1_to_n
    # Prepend 0 so that theta_0 receives no regularization
    regularized_term = np.concatenate([np.array([0]), regularized_theta])
    return gradient(theta, X, y) + regularized_term
# print(regularized_gradient(theta, X, y))
import scipy.optimize as opt
# print('init cost = {}'.format(regularized_cost(theta, X, y)))
theta = np.zeros(X.shape[1])
res = opt.minimize(fun=regularized_cost,
                   x0=theta,
                   args=(X, y),
                   # method='Newton-CG',
                   method='TNC',
                   jac=regularized_gradient)
final_theta = res.x
print(res)
# Prediction
def predict(X, theta):
    # Predict 1 when the estimated probability is at least 0.5
    prob = sigmoid(X.dot(theta))
    return (prob >= 0.5).astype(int)
y_pred = predict(X, final_theta)
print(classification_report(y, y_pred))
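A regularization factor written as `l / 2 * m` parses as (λ/2)·m rather than λ/(2m), and a slip like that makes the cost and the gradient silently disagree, which can mislead the optimizer. A numerical gradient check catches this kind of mismatch. The following is a minimal standalone sketch under stated assumptions: it uses synthetic random data instead of ex2data2.txt, re-declares vectorized versions of the cost and gradient, and `numerical_gradient` plus the `buggy` flag are hypothetical helpers added for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def regularized_cost(theta, X, y, l=1.0, buggy=False):
    m = len(X)
    h = sigmoid(X @ theta)
    data_term = np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))
    # Correct factor is lambda / (2m); buggy=True reproduces (lambda / 2) * m
    factor = l / 2 * m if buggy else l / (2 * m)
    return data_term + factor * np.sum(theta[1:] ** 2)

def regularized_gradient(theta, X, y, l=1.0):
    m = len(X)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (l / m) * theta[1:]  # theta_0 is not regularized
    return grad

def numerical_gradient(f, theta, eps=1e-5):
    # Central differences, one coordinate at a time (hypothetical helper)
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# Synthetic data (an assumption, not ex2data2.txt): bias column plus 4 random features
rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 4))])
y = (rng.random(20) > 0.5).astype(float)
theta = rng.normal(size=5)

analytic = regularized_gradient(theta, X, y)
numeric = numerical_gradient(lambda t: regularized_cost(t, X, y), theta)
numeric_buggy = numerical_gradient(lambda t: regularized_cost(t, X, y, buggy=True), theta)

max_diff = np.max(np.abs(analytic - numeric))          # tiny: cost and gradient agree
buggy_diff = np.max(np.abs(analytic - numeric_buggy))  # large: the (lambda / 2) * m slip
print(max_diff, buggy_diff)
```

The same check can be pointed at the real `regularized_cost` and `regularized_gradient` from the script with the mapped `X` and `y`; a difference far above roughly 1e-6 suggests the two functions do not match.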
""" 決策邊界未解決
# 畫出決策邊界
def draw_boundary(power, l):
density = 1000
threshhold = 2 * 10**-3
x, y = find_decision_boundary(density, power, final_theta, threshhold)
df = pd.read_csv('ex2data2.txt', names=['test1', 'test2', 'accepted'])
sns.lmplot('test1', 'test2', hue='accepted', data=df, size=6, fit_reg=false, scatter_kws=)
# plt.scatter(x, y, corlor='r', s=10)
plt.scatter(x, y, s=10)
plt.title('decision boundary')
plt.show()
df = pd.read_csv('ex2data2.txt', names=['test1', 'test2', 'accepted'])
x1 = np.array(df.test1)
x2 = np.array(df.test2)
y = get_y(df)
theta = np.zeros(x.shape[1])
res = opt.minimize(fun=regularized_cost,
x0=theta,
args=(x, y, l),
method='tnc',
jac=regularized_gradient)
final_theta = res.x
return final_theta
def find_decision_boundary(density, power, theta, threshhold):
t1 = np.linspace(-1, 1.5, density)  #1000個樣本
t2 = np.linspace(-1, 1.5, density)
cordinates = [(x, y) for x in t1 for y in t2]
x_cord, y_cord = zip(*cordinates)
return decision.f10, decision.f01
draw_boundary(6,1)
"""