Changing the Loss Function and Learning Rate to Observe Convergence

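This example changes the loss function and the learning rate of a linear regression model to observe how convergence changes.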

# Linear Regression: L1 vs L2
# Change the loss function and learning rate to observe how convergence changes
#----------------------------------
# This script shows how to use TensorFlow (1.x graph API) to
# solve linear regression by gradient descent with an L1 or an L2 loss.
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn import datasets
from tensorflow.python.framework import ops
ops.reset_default_graph()
# create graph
sess = tf.Session()
# load the data
# iris.data = [(sepal length, sepal width, petal length, petal width)]
iris = datasets.load_iris()
x_vals = np.array([x[3] for x in iris.data])
y_vals = np.array([y[0] for y in iris.data])
# declare batch size and number of iterations
batch_size = 25
learning_rate = 0.05  # with a learning rate of 0.4 the L2 loss fails to converge
iterations = 50
# initialize placeholders
x_data = tf.placeholder(shape=[None, 1], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
# create variables for linear regression
a = tf.Variable(tf.random_normal(shape=[1,1]))
b = tf.Variable(tf.random_normal(shape=[1,1]))
# declare model operations
model_output = tf.add(tf.matmul(x_data, a), b)
# use the L1 loss (mean absolute error)
loss_l1 = tf.reduce_mean(tf.abs(y_target - model_output))
# declare optimizers
my_opt_l1 = tf.train.GradientDescentOptimizer(learning_rate)
train_step_l1 = my_opt_l1.minimize(loss_l1)
# initialize variables
init = tf.global_variables_initializer()
sess.run(init)
# training loop
loss_vec_l1 = []
for i in range(iterations):
    rand_index = np.random.choice(len(x_vals), size=batch_size)
    rand_x = np.transpose([x_vals[rand_index]])
    rand_y = np.transpose([y_vals[rand_index]])
    # run one gradient step on the sampled batch, then record its loss
    sess.run(train_step_l1, feed_dict={x_data: rand_x, y_target: rand_y})
    temp_loss_l1 = sess.run(loss_l1, feed_dict={x_data: rand_x, y_target: rand_y})
    loss_vec_l1.append(temp_loss_l1)
    if (i+1)%25==0:
        print('step #' + str(i+1) + ' a = ' + str(sess.run(a)) + ' b = ' + str(sess.run(b)))
# l2 loss
# reinitialize graph
ops.reset_default_graph()
# create graph
sess = tf.Session()
# initialize placeholders
x_data = tf.placeholder(shape=[None, 1], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
# create variables for linear regression
a = tf.Variable(tf.random_normal(shape=[1,1]))
b = tf.Variable(tf.random_normal(shape=[1,1]))
# declare model operations
model_output = tf.add(tf.matmul(x_data, a), b)
# use the L2 loss (mean squared error)
loss_l2 = tf.reduce_mean(tf.square(y_target - model_output))
# declare optimizers
my_opt_l2 = tf.train.GradientDescentOptimizer(learning_rate)
train_step_l2 = my_opt_l2.minimize(loss_l2)
# initialize variables
init = tf.global_variables_initializer()
sess.run(init)
loss_vec_l2 = []
for i in range(iterations):
    rand_index = np.random.choice(len(x_vals), size=batch_size)
    rand_x = np.transpose([x_vals[rand_index]])
    rand_y = np.transpose([y_vals[rand_index]])
    # run one gradient step on the sampled batch, then record its loss
    sess.run(train_step_l2, feed_dict={x_data: rand_x, y_target: rand_y})
    temp_loss_l2 = sess.run(loss_l2, feed_dict={x_data: rand_x, y_target: rand_y})
    loss_vec_l2.append(temp_loss_l2)
    if (i+1)%25==0:
        print('step #' + str(i+1) + ' a = ' + str(sess.run(a)) + ' b = ' + str(sess.run(b)))
# plot loss over time
plt.plot(loss_vec_l1, 'k-', label='L1 Loss')
plt.plot(loss_vec_l2, 'r--', label='L2 Loss')
plt.title('L1 and L2 Loss per Generation')
plt.xlabel('Generation')
plt.ylabel('Loss')  # both curves share this axis, so it is not L1-specific
plt.legend(loc='upper right')
plt.show()

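If the learning rate is too small, the algorithm takes longer to converge. If it is too large, the algorithm may fail to converge at all. The figure below plots the L1 and L2 losses for the linear regression problem on the iris data, with a learning rate of 0.05.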
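The trade-off between a small and a large learning rate can be seen on a toy problem. The sketch below is an illustration, not part of the original script: it minimizes the hypothetical one-dimensional function f(w) = w² by plain gradient descent, and the function name `gradient_descent` and the three rates are assumptions chosen for this example. The thresholds at which this toy function converges or diverges are specific to it and differ from those of the iris regression.

```python
def gradient_descent(learning_rate, steps=20, w0=1.0):
    """Minimize f(w) = w**2 by gradient descent; return the final distance |w| from the minimum."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                 # derivative of w**2
        w -= learning_rate * grad      # each step multiplies w by (1 - 2 * learning_rate)
    return abs(w)

small = gradient_descent(0.01)     # shrinks by 2% per step: still far from 0 after 20 steps
moderate = gradient_descent(0.4)   # shrinks by 80% per step: essentially at the minimum
large = gradient_descent(1.1)      # overshoots by more than it corrects: distance grows

print("lr=0.01 ->", small)
print("lr=0.4  ->", moderate)
print("lr=1.1  ->", large)
```

For this function each update multiplies w by (1 - 2 × learning rate), so the iterate shrinks only when that factor has magnitude below 1. A rate that is too small makes the factor close to 1 (slow progress), and a rate above 1 makes its magnitude exceed 1 (divergence).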

Figure: L1 and L2 loss for linear regression on the iris data, with a learning rate of 0.05.

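The figure above shows that with a learning rate of 0.05 the L2 loss performs better: it reaches a lower loss value. Raising the learning rate to 0.4 gives the loss curves in the next figure.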

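Figure: L1 and L2 loss for linear regression on the iris data, with a learning rate of 0.4. The L1 loss is not visible because the scale of the y-axis is so large: the high learning rate makes the L2 loss blow up, while the L1 loss still converges.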
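One way to see why a high learning rate hurts the L2 loss more than the L1 loss is to look at the size of the gradient. The L2 gradient grows with the residual, so an overshoot produces an even larger step, whereas the L1 gradient depends only on the sign of the residual and stays bounded. The sketch below is a simplified, hypothetical setup and not the iris experiment: it fits a single slope `a` to three made-up points on the line y = 3x using full-batch gradient descent. The helper names `sign` and `fit_slope` are assumptions introduced for this illustration.

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def fit_slope(loss, learning_rate, steps=50):
    """Fit y = a * x by full-batch gradient descent; return |a - 3| after `steps` updates."""
    xs = [0.5, 1.5, 2.5]               # made-up inputs
    ys = [3.0 * x for x in xs]         # targets generated with a true slope of 3
    a = 0.0
    for _ in range(steps):
        residuals = [a * x - y for x, y in zip(xs, ys)]
        if loss == "l1":
            # d|r|/da = x * sign(r): magnitude never exceeds |x|
            grads = [x * sign(r) for x, r in zip(xs, residuals)]
        else:
            # d(r**2)/da = 2 * x * r: grows with the residual
            grads = [2.0 * x * r for x, r in zip(xs, residuals)]
        a -= learning_rate * sum(grads) / len(grads)
    return abs(a - 3.0)

for lr in (0.05, 0.4):
    print("lr =", lr, " L1 error:", fit_slope("l1", lr), " L2 error:", fit_slope("l2", lr))
```

In this toy setting the L2 fit is the more accurate one at 0.05, while at 0.4 its error grows without bound and the L1 fit merely oscillates within one step size of the true slope. This matches the qualitative behavior described above, although the exact numbers depend on the made-up data.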

Learning rate 0.1.

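This clearly shows how large and small learning rates affect the L1 and L2 loss functions. The figure below visualizes the one-dimensional case of the L1 and L2 loss functions.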
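Since the one-dimensional picture is easy to tabulate, here is a small sketch of it in plain Python. It evaluates both losses and their slopes at a few residual values. The function names and the chosen residuals are assumptions made for this illustration.

```python
def l1_loss(residual):
    """Absolute error for a single residual."""
    return abs(residual)

def l2_loss(residual):
    """Squared error for a single residual."""
    return residual ** 2

def l1_slope(residual):
    """Slope of |r|: -1, 0 or 1 (0 is used at the kink r = 0)."""
    return (residual > 0) - (residual < 0)

def l2_slope(residual):
    """Slope of r**2: proportional to the residual."""
    return 2.0 * residual

for r in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(f"r={r:+.1f}  L1={l1_loss(r):.2f} (slope {l1_slope(r):+d})  "
          f"L2={l2_loss(r):.2f} (slope {l2_slope(r):+.1f})")
```

The L2 curve is flatter than the L1 curve near zero and steeper far from it, which is why its gradient steps are gentle close to the minimum but can become very large after an overshoot.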
