We know that the smaller the relative entropy (KL divergence) between two distributions p and q, the closer the two distributions are, and the less information is lost when q is used to approximate p. NVIDIA's INT8 quantization is based on exactly this principle; the figure shows the pseudocode of NVIDIA's INT8 quantization algorithm.
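As a quick numerical check of this principle (not part of the original post's code; the helper kl below simply evaluates the standard formula D(p||q) = sum_i p_i * log(p_i / q_i)), a q that is close to p gives a small divergence, while a very different q gives a large one:

    import numpy as np

    def kl(p, q):
        # D(p || q) = sum_i p_i * log(p_i / q_i); bins with p_i == 0 contribute nothing
        p, q = np.asarray(p, dtype=np.float64), np.asarray(q, dtype=np.float64)
        mask = p > 0
        return np.sum(p[mask] * np.log(p[mask] / q[mask]))

    p = np.array([0.1, 0.4, 0.5])
    print(kl(p, [0.12, 0.38, 0.50]))   # ~0.002, q approximates p well
    print(kl(p, [0.50, 0.40, 0.10]))   # ~0.64,  q approximates p poorly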
Below is the code that selects the optimal threshold based on relative entropy.
import numpy as np
import copy


def compute_kl_divergence(p, q):
    # Discrete KL divergence D(p || q); bins where p is zero contribute nothing,
    # and a fixed penalty of 1 is added whenever q is zero but p is not.
    length = len(p)
    kl_sum = 0.0
    for i in range(length):
        if p[i] != 0:
            if q[i] == 0:
                kl_sum += 1
            else:
                kl_sum += p[i] * np.log(p[i] / q[i])
    return kl_sum


def threshold_distribution(distribution, target_bin):
    target_threshold = target_bin
    min_kl_divergence = 10000000000000
    length = len(distribution)

    for threshold in range(target_bin, length):
        # Clip the histogram at `threshold` and fold the outliers into the last kept bin.
        # t_distribution = np.empty((threshold,))
        t_distribution = copy.deepcopy(distribution[0:threshold])
        t_distribution[threshold - 1] += np.sum(distribution[threshold:])

        # get p: merge the first `threshold` fine bins into `target_bin` coarse bins
        num_per_bin = threshold / target_bin
        quantize_distribution = np.zeros((target_bin,))
        for i in range(target_bin):
            start = i * num_per_bin
            end = start + num_per_bin

            left_upper = int(np.ceil(start))
            if left_upper > start:
                left_scale = left_upper - start
                quantize_distribution[i] += left_scale * distribution[left_upper - 1]

            right_lower = int(np.floor(end))
            if right_lower < end:
                right_scale = end - right_lower
                quantize_distribution[i] += right_scale * distribution[right_lower]

            for j in range(left_upper, right_lower):
                quantize_distribution[i] += distribution[j]

        # get q: expand the coarse bins back to `threshold` fine bins,
        # spreading each coarse bin's mass evenly over its non-empty fine bins
        expand_distribution = np.zeros_like(t_distribution)
        for i in range(target_bin):
            start = i * num_per_bin
            end = start + num_per_bin

            count = 0

            left_upper = int(np.ceil(start))
            left_scale = 0
            if left_upper > start:
                left_scale = left_upper - start
                if t_distribution[left_upper - 1] != 0:
                    count += left_scale

            right_lower = int(np.floor(end))
            right_scale = 0
            if right_lower < end:
                right_scale = end - right_lower
                if t_distribution[right_lower] != 0:
                    count += right_scale

            for j in range(left_upper, right_lower):
                if t_distribution[j] != 0:
                    count += 1

            if count == 0:
                # every fine bin covered by this coarse bin is empty, nothing to spread
                continue
            expand_value = quantize_distribution[i] / count

            if left_upper > start:
                if t_distribution[left_upper - 1] != 0:
                    expand_distribution[left_upper - 1] += expand_value * left_scale
            if right_lower < end:
                if t_distribution[right_lower] != 0:
                    expand_distribution[right_lower] += expand_value * right_scale
            for j in range(left_upper, right_lower):
                if t_distribution[j] != 0:
                    expand_distribution[j] += expand_value

        kl_divergence = compute_kl_divergence(t_distribution, expand_distribution)
        # print(threshold, kl_divergence)
        if kl_divergence < min_kl_divergence:
            min_kl_divergence = kl_divergence
            target_threshold = threshold

    return target_threshold


if __name__ == '__main__':
    # Toy example: a ramp-shaped 2048-bin histogram, normalized to sum to 1.
    distribution = np.empty((2048,))
    for i in range(len(distribution)):
        distribution[i] = i
    distribution /= np.sum(distribution)
    target_threshold = threshold_distribution(distribution, 128)
    print(target_threshold)
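The demo run prints the index of the best clipping bin. In a real calibration pipeline that bin index still has to be mapped back to an FP32 saturation threshold and a scale factor using the histogram's bin width, which the code above does not do. Continuing from the script, a rough sketch of that last step could look like the following; max_abs and the (index + 0.5) * bin_width / 127 mapping are assumptions for illustration, not taken from the original post:

    # Illustrative only: `max_abs` is a made-up calibration statistic, and this
    # mapping is one common convention, not necessarily NVIDIA's exact one.
    max_abs = 6.0                                        # largest |activation| seen during calibration
    bin_width = max_abs / 2048
    clip_value = (target_threshold + 0.5) * bin_width    # centre of the chosen bin
    int8_scale = clip_value / 127.0                      # FP32 value of one INT8 step
    print(clip_value, int8_scale)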