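Gradient Descent, Stochastic Gradient Descent, and Batch Gradient Descent

Comparing gradient descent, stochastic gradient descent, and batch gradient descent

What I had read on this topic was fairly scattered, with no systematic explanation. After going through analyses by several bloggers online, I wrote up my own understanding.

The example here is adapted from another blogger's example, with some modifications. First, gradient descent: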

# -*- coding: utf-8 -*-
# A sample that simulates the function y = theta1*x1 + theta2*x2
# (the data below satisfy y = 3*x1 + 4*x2 exactly).
input_x = [[1, 4], [2, 5], [5, 1], [4, 2]]
y = [19, 26, 19, 20]
theta = [1, 1]
loss = 10
step_size = 0.001
eps = 0.0001
max_iters = 10000
error = 0
iter_count = 0
while loss > eps and iter_count < max_iters:
    loss = 0
    # Each pass sweeps over the samples when updating the weights.
    # Note: range(3) visits only the first three of the four samples
    # (use range(len(input_x)) to cover all of them), and theta is updated
    # after each sample in turn rather than from a summed gradient.
    for i in range(3):
        pred_y = theta[0] * input_x[i][0] + theta[1] * input_x[i][1]
        theta[0] = theta[0] - step_size * (pred_y - y[i]) * input_x[i][0]
        theta[1] = theta[1] - step_size * (pred_y - y[i]) * input_x[i][1]
    # Recompute the squared-error loss over the same three samples.
    for i in range(3):
        pred_y = theta[0] * input_x[i][0] + theta[1] * input_x[i][1]
        error = 0.5 * (pred_y - y[i]) ** 2
        loss = loss + error
    iter_count += 1
    print('iters_count', iter_count)
print('theta: ', theta)
print('final loss: ', loss)
print('iters: ', iter_count)

iters_count 219

iters_count 220

iters_count 221

iters_count 222

iters_count 223

iters_count 224

iters_count 225

theta: [3.0027765778748003, 3.997918297015663]

final loss: 9.68238055213e-05

iters: 225

[finished in 0.2s]
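
The listing above updates theta after every individual sample. For comparison, here is a minimal sketch of full-batch gradient descent, which sums the gradient over all four samples before making a single update. The function name `batch_gradient_descent` and its defaults are assumptions made for illustration; the data and hyperparameters are copied from the example.

```python
# Full-batch gradient descent on the same toy data (y = 3*x1 + 4*x2).
# batch_gradient_descent is a hypothetical helper, not part of the original post.
input_x = [[1, 4], [2, 5], [5, 1], [4, 2]]
y = [19, 26, 19, 20]


def batch_gradient_descent(step_size=0.001, eps=0.0001, max_iters=10000):
    theta = [1.0, 1.0]
    loss = float("inf")
    iter_count = 0
    while loss > eps and iter_count < max_iters:
        # Accumulate the gradient over every sample before touching theta.
        grad = [0.0, 0.0]
        for x_i, y_i in zip(input_x, y):
            residual = theta[0] * x_i[0] + theta[1] * x_i[1] - y_i
            grad[0] += residual * x_i[0]
            grad[1] += residual * x_i[1]
        # One simultaneous update per pass over the data.
        theta[0] -= step_size * grad[0]
        theta[1] -= step_size * grad[1]
        # Squared-error loss over all samples, using the updated theta.
        loss = sum(
            0.5 * (theta[0] * x_i[0] + theta[1] * x_i[1] - y_i) ** 2
            for x_i, y_i in zip(input_x, y)
        )
        iter_count += 1
    return theta, loss, iter_count


if __name__ == "__main__":
    theta, loss, iters = batch_gradient_descent()
    print("theta:", theta)
    print("final loss:", loss)
    print("iters:", iters)
```

Because this variant uses all four samples and a summed gradient, its iteration count should not be expected to match the run shown above.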

Stochastic gradient descent

Each iteration picks one sample at random and uses that single point to update θ.
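
For reference, the per-sample update used in the code can be written out as follows. The notation is an assumption made here: h_θ corresponds to `pred_y`, α to `step_size`, and the superscript (i) to the sample index `i`.

```latex
\begin{aligned}
h_\theta(x) &= \theta_1 x_1 + \theta_2 x_2 \\
J^{(i)}(\theta) &= \tfrac{1}{2}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 \\
\frac{\partial J^{(i)}}{\partial \theta_j} &= \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x^{(i)}_j \\
\theta_j &\leftarrow \theta_j - \alpha\,\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x^{(i)}_j
\end{aligned}
```

Both components of θ are updated from the same prediction, which is why the code computes `pred_y` once before changing `theta[0]` and `theta[1]`.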

# -*- coding: utf-8 -*-
import random
# A sample that simulates the function y = theta1*x1 + theta2*x2
input_x = [[1, 4], [2, 5], [5, 1], [4, 2]]
y = [19, 26, 19, 20]
theta = [1, 1]
loss = 10
step_size = 0.001
eps = 0.0001
max_iters = 10000
error = 0
iter_count = 0
while loss > eps and iter_count < max_iters:
    loss = 0
    # Each iteration picks one random sample and updates the weights with it.
    # random.randint(0, 3) is inclusive, so all four samples can be drawn.
    i = random.randint(0, 3)
    pred_y = theta[0] * input_x[i][0] + theta[1] * input_x[i][1]
    theta[0] = theta[0] - step_size * (pred_y - y[i]) * input_x[i][0]
    theta[1] = theta[1] - step_size * (pred_y - y[i]) * input_x[i][1]
    # The loss is evaluated on the first three samples only (range(3)).
    for i in range(3):
        pred_y = theta[0] * input_x[i][0] + theta[1] * input_x[i][1]
        error = 0.5 * (pred_y - y[i]) ** 2
        loss = loss + error
    iter_count += 1
    print('iters_count', iter_count)
print('theta: ', theta)
print('final loss: ', loss)
print('iters: ', iter_count)

The output is:

iters_count 1226

iters_count 1227

iters_count 1228

iters_count 1229

iters_count 1230

iters_count 1231

iters_count 1232

theta: [3.002441488688225, 3.9975844154600226]

final loss: 9.989420302e-05

iters: 1232

[finished in 0.3s]

Batch (mini-batch) stochastic gradient descent

Here each update uses 2 sample points.

# -*- coding: utf-8 -*-
import random
# A sample that simulates the function y = theta1*x1 + theta2*x2
input_x = [[1, 4], [2, 5], [5, 1], [4, 2]]
y = [19, 26, 19, 20]
theta = [1, 1]
loss = 10
step_size = 0.001
eps = 0.0001
max_iters = 10000
error = 0
iter_count = 0
while loss > eps and iter_count < max_iters:
    loss = 0
    # Note: each batch here uses 2 samples for the update. The first is a
    # random point; the second is its neighbor at index i+1 (wrapping around).
    i = random.randint(0, 3)
    j = (i + 1) % 4
    # Update with sample i, then with sample j using the already-updated theta.
    pred_y = theta[0] * input_x[i][0] + theta[1] * input_x[i][1]
    theta[0] = theta[0] - step_size * (pred_y - y[i]) * input_x[i][0]
    theta[1] = theta[1] - step_size * (pred_y - y[i]) * input_x[i][1]
    pred_y = theta[0] * input_x[j][0] + theta[1] * input_x[j][1]
    theta[0] = theta[0] - step_size * (pred_y - y[j]) * input_x[j][0]
    theta[1] = theta[1] - step_size * (pred_y - y[j]) * input_x[j][1]
    # The loss is evaluated on the first three samples only (range(3)).
    for i in range(3):
        pred_y = theta[0] * input_x[i][0] + theta[1] * input_x[i][1]
        error = 0.5 * (pred_y - y[i]) ** 2
        loss = loss + error
    iter_count += 1
    print('iters_count', iter_count)
print('theta: ', theta)
print('final loss: ', loss)
print('iters: ', iter_count)

The final output is:

.....

iters_count 543

iters_count 544

iters_count 545

iters_count 546

iters_count 547

iters_count 548

iters_count 549

theta: [3.0023012574840764, 3.997553282857357]

final loss: 9.81717138358e-05

iters: 549

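Comparing the results: I ran each example several times, and the iteration counts stayed roughly at the levels shown. Gradient descent needed the fewest iterations, but it is only fast here because there are so few sample points. With more data, say tens of thousands of samples, the time for a single pass over all of them becomes painful. Stochastic gradient descent uses only one sample per update, so it takes more iterations to converge. The batch version here uses 2 sample points per update and lands in between: roughly 1,200 iterations for stochastic versus about 550 for batch.

I did not really understand these concepts at first. In Caffe, when you train a network, the batch setting you are asked to choose is exactly this. Choosing a suitable batch value can help the network converge faster.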
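
To make the comparison easier to repeat, the three variants can be folded into one function with a batch-size parameter. This is a rough sketch under stated assumptions, not the code that produced the numbers above: `minibatch_gd` and `total_loss` are hypothetical names, the batch is drawn at random without replacement (instead of the neighbor index used earlier), the gradient is summed over the batch before a single update, and the loss is measured on all four samples.

```python
import random

input_x = [[1, 4], [2, 5], [5, 1], [4, 2]]
y = [19, 26, 19, 20]


def total_loss(theta):
    # Squared-error loss over all samples (hypothetical helper).
    return sum(
        0.5 * (theta[0] * x_i[0] + theta[1] * x_i[1] - y_i) ** 2
        for x_i, y_i in zip(input_x, y)
    )


def minibatch_gd(batch_size, seed=0, step_size=0.001, eps=0.0001, max_iters=10000):
    # batch_size=1 gives stochastic updates; batch_size=len(input_x) gives full batch.
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    theta = [1.0, 1.0]
    iter_count = 0
    while total_loss(theta) > eps and iter_count < max_iters:
        # Draw batch_size distinct sample indices for this update.
        batch = rng.sample(range(len(input_x)), batch_size)
        grad = [0.0, 0.0]
        for i in batch:
            residual = theta[0] * input_x[i][0] + theta[1] * input_x[i][1] - y[i]
            grad[0] += residual * input_x[i][0]
            grad[1] += residual * input_x[i][1]
        theta[0] -= step_size * grad[0]
        theta[1] -= step_size * grad[1]
        iter_count += 1
    return theta, iter_count


if __name__ == "__main__":
    for batch_size in (1, 2, 4):
        theta, iters = minibatch_gd(batch_size)
        print("batch_size", batch_size, "iters", iters, "theta", theta)
```

Changing `seed` gives a sense of how much the iteration count varies from run to run for the smaller batch sizes.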
