DM: Standardization, Normalization, and Regularization

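Data standardization (normalization) scales data proportionally so that it falls within a small, specific interval.

The most typical case is min-max normalization, which maps all of the data onto the interval [0, 1].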

For details, see: Why do feature normalization/standardization?

Data preprocessing

Preprocessing converts dimensional quantities (values that carry units) into dimensionless form.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

np.set_printoptions(precision=3, suppress=True)  # print three decimals
a = np.array([[1, 2, 3],
              [2, 0, 0],
              [-1, 0, 1]])
scaler = MinMaxScaler()              # scales each column to [0, 1]
a_scaler = scaler.fit_transform(a)
print(a_scaler)
# [[0.667 1.    1.   ]
#  [1.    0.    0.   ]
#  [0.    0.    0.333]]
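
The scaling above can also be written out by hand, which makes the formula x' = (x - min) / (max - min) explicit. This is a minimal sketch, not scikit-learn's implementation: the helper name `min_max_scale` and its `feature_range` parameter are hypothetical, and constant columns are assumed to map to the lower bound.

```python
import numpy as np

def min_max_scale(x, feature_range=(0.0, 1.0)):
    """Column-wise min-max scaling (hypothetical helper, not a library function)."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    span = col_max - col_min
    span[span == 0] = 1.0  # assumption: constant columns map to the lower bound
    lo, hi = feature_range
    return lo + (x - col_min) / span * (hi - lo)

a = np.array([[1, 2, 3],
              [2, 0, 0],
              [-1, 0, 1]])
a_scaled = min_max_scale(a)
print(np.round(a_scaled, 3))
```

Each column is scaled independently, so the first column (1, 2, -1) becomes (2/3, 1, 0), the same values that MinMaxScaler produced.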

Arccotangent function normalization

Z-score normalization (zero-mean normalization)

### Method 1
import numpy as np
from sklearn.preprocessing import scale

a = np.array([[1, 2, 3],
              [2, 0, 0],
              [-1, 0, 1]])
a_scaled = scale(a)                  # column-wise zero mean, unit variance
print(a_scaled)
b = a_scaled.mean(axis=0)
np.set_printoptions(precision=3, suppress=True)
print(a_scaled.mean(axis=0))         # column means after scaling
print(a_scaled.std(axis=0))          # column standard deviations after scaling
# [[ 0.26726124  1.41421356  1.33630621]
#  [ 1.06904497 -0.70710678 -1.06904497]
#  [-1.33630621 -0.70710678 -0.26726124]]
# [0. 0. 0.]
# [1. 1. 1.]
### Method 2
import numpy as np
from sklearn.preprocessing import StandardScaler

np.set_printoptions(precision=3, suppress=True)
a = np.array([[1, 2, 3],
              [2, 0, 0],
              [-1, 0, 1]])
scaler = StandardScaler().fit(a)     # learns the column means and standard deviations
print(scaler)
print(scaler.mean_)
print(scaler.transform(a))
# StandardScaler(copy=True, with_mean=True, with_std=True)
# [0.667 0.667 1.333]
# [[ 0.267  1.414  1.336]
#  [ 1.069 -0.707 -1.069]
#  [-1.336 -0.707 -0.267]]
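
Both methods compute z = (x - mean) / std per column. A minimal NumPy sketch of the same calculation follows. It uses the population standard deviation (`ddof=0`), which is what `scale` and `StandardScaler` use. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 0, 0],
              [-1, 0, 1]], dtype=float)

mean = a.mean(axis=0)      # per-column mean
std = a.std(axis=0)        # per-column population standard deviation (ddof=0)
z = (a - mean) / std       # z-score of every entry
print(np.round(z, 3))
```

After the transform every column has mean 0 and standard deviation 1, matching the output of the two scikit-learn methods.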

Decimal scaling normalization
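
The text gives no formula for this method. As usually defined, decimal scaling divides every value by 10^j, where j is the smallest integer that brings the largest absolute value below 1. The sketch below follows that definition under one added assumption: data already below 1 in magnitude is left unchanged (j = 0). The helper name `decimal_scale` is hypothetical.

```python
import numpy as np

def decimal_scale(x):
    """Divide by 10**j so that max(|x|) < 1 (hypothetical helper).

    Assumption: j is restricted to non-negative integers, so data that is
    already below 1 in magnitude is returned unchanged with j = 0.
    """
    x = np.asarray(x, dtype=float)
    max_abs = np.abs(x).max()
    j = 0
    while max_abs / 10.0 ** j >= 1.0:
        j += 1
    return x / 10.0 ** j, j

values = np.array([-986.0, 217.0, 45.0])
scaled, j = decimal_scale(values)
print(j, scaled)
```

For these values the largest magnitude is 986, so j = 3 and every value is divided by 1000.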

Fuzzy quantization normalization

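Regularization, as the term is used here, scales each sample to unit norm so that every sample has norm 1. This is useful if you later use a quadratic form (such as the dot product) or another kernel method to compute the similarity between two samples.

The main idea of normalization is to compute the p-norm of each sample and then divide every element of that sample by the norm, so that each processed sample has a p-norm (l1-norm or l2-norm) equal to 1.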

For details, see: Linear Regression and Regularization

$\|x\|_p = (|x_1|^p + |x_2|^p + \dots + |x_n|^p)^{1/p}$

l0-norm: $\|\vec{x}\|_0 = \#(i)$ with $x_i \neq 0$;

l1-norm: $\|\vec{x}\|_1 = \sum_{i=1}^{d} |x_i|$;

l2-norm: $\|\vec{x}\|_2 = \left(\sum_{i=1}^{d} x_i^2\right)^{1/2}$;

lp-norm: $\|\vec{x}\|_p = \left(\sum_{i=1}^{d} |x_i|^p\right)^{1/p}$;

l∞-norm: $\|\vec{x}\|_\infty = \lim_{p \to +\infty} \left(\sum_{i=1}^{d} |x_i|^p\right)^{1/p} = \max_i |x_i|$.
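
The norms above can be checked numerically. The following sketch uses `np.linalg.norm` for the l0, l1, l2, and l∞ cases, plus a small hypothetical helper `lp_norm` for general p, to illustrate that the lp-norm approaches the largest absolute entry as p grows. The example vector is arbitrary.

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0])

l0 = np.linalg.norm(x, ord=0)         # number of non-zero entries
l1 = np.linalg.norm(x, ord=1)         # sum of absolute values
l2 = np.linalg.norm(x, ord=2)         # Euclidean length
linf = np.linalg.norm(x, ord=np.inf)  # largest absolute value

def lp_norm(v, p):
    """General lp-norm written directly from the definition (hypothetical helper)."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(l0, l1, l2, linf)
for p in (1, 2, 8, 32):
    print(p, lp_norm(x, p))  # tends toward linf as p increases
```

Strictly speaking the l0 "norm" is not a true norm; it simply counts the non-zero entries.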

import numpy as np
from sklearn.preprocessing import normalize

x = [[1., -1., 2.],
     [2., 0., 0.],
     [0., 1., -1.]]
x_normalized = normalize(x, norm='l2')  # scale each row to unit l2-norm
print(x_normalized)
# [[ 0.40824829 -0.40824829  0.81649658]
#  [ 1.          0.          0.        ]
#  [ 0.          0.70710678 -0.70710678]]
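
The same per-sample scaling can be sketched directly in NumPy: compute each row's p-norm and divide the row by it. The helper name `normalize_rows` is hypothetical, and leaving all-zero rows unchanged is an assumption made here to avoid division by zero.

```python
import numpy as np

def normalize_rows(x, p=2):
    """Scale every row to unit p-norm (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, ord=p, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # assumption: all-zero rows are left unchanged
    return x / norms

x = [[1., -1., 2.],
     [2., 0., 0.],
     [0., 1., -1.]]
x_l2 = normalize_rows(x, p=2)
x_l1 = normalize_rows(x, p=1)
print(x_l2)
print(x_l1)
```

With p=2 every row has Euclidean length 1; with p=1 the absolute values in each row sum to 1.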

Wikipedia: Feature scaling

scikit-learn: Importance of Feature Scaling

Why do feature normalization/standardization?

Normalization and standardization
