KL Divergence (Relative Entropy) in INT8 Quantization

2021-10-10 08:30:07

We know that the smaller the relative entropy (KL divergence) between two distributions p and q, the closer q is to p, and the less information is lost when q is used to approximate p. NVIDIA's INT8 quantization is built on exactly this principle; the figure shows the pseudocode of NVIDIA's INT8 calibration algorithm.
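
For reference, the relative entropy between a reference distribution P and its approximation Q over the same discrete bins is

D_{KL}(P \| Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}

The compute_kl_divergence function below implements this sum directly, adding a fixed penalty of 1 for any bin where P(i) != 0 but Q(i) = 0.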

Below is the code that selects the optimal clipping threshold based on relative entropy.

import copy

import numpy as np


def compute_kl_divergence(p, q):
    """KL divergence D(p || q) between two discrete distributions of equal length."""
    kl_sum = 0.0
    for i in range(len(p)):
        if p[i] != 0:
            if q[i] == 0:
                # q cannot represent this bin at all; add a fixed penalty
                kl_sum += 1
            else:
                kl_sum += p[i] * np.log(p[i] / q[i])
    return kl_sum


def threshold_distribution(distribution, target_bin):
    """Search for the clipping threshold (in bins) that minimizes the KL divergence
    between the clipped histogram P and its target_bin-level quantized version Q."""
    target_threshold = target_bin
    min_kl_divergence = float('inf')
    length = len(distribution)

    for threshold in range(target_bin, length):
        # P: clip the histogram at `threshold` and fold the tail into the last bin
        t_distribution = copy.deepcopy(distribution[0:threshold])
        t_distribution[threshold - 1] += np.sum(distribution[threshold:])

        # quantize the clipped histogram into target_bin bins
        num_per_bin = threshold / target_bin
        quantize_distribution = np.zeros((target_bin,))
        for i in range(target_bin):
            start = i * num_per_bin
            end = start + num_per_bin
            left_upper = int(np.ceil(start))
            if left_upper > start:
                left_scale = left_upper - start
                quantize_distribution[i] += left_scale * distribution[left_upper - 1]
            right_lower = int(np.floor(end))
            if right_lower < end:
                right_scale = end - right_lower
                quantize_distribution[i] += right_scale * distribution[right_lower]
            for j in range(left_upper, right_lower):
                quantize_distribution[i] += distribution[j]

        # Q: expand the quantized histogram back to `threshold` bins,
        # spreading each quantized value evenly over its non-empty source bins
        expand_distribution = np.zeros_like(t_distribution)
        for i in range(target_bin):
            start = i * num_per_bin
            end = start + num_per_bin
            count = 0
            left_upper = int(np.ceil(start))
            left_scale = 0
            if left_upper > start:
                left_scale = left_upper - start
                if t_distribution[left_upper - 1] != 0:
                    count += left_scale
            right_lower = int(np.floor(end))
            right_scale = 0
            if right_lower < end:
                right_scale = end - right_lower
                if t_distribution[right_lower] != 0:
                    count += right_scale
            for j in range(left_upper, right_lower):
                if t_distribution[j] != 0:
                    count += 1
            if count == 0:
                # every source bin is empty, nothing to spread
                continue
            expand_value = quantize_distribution[i] / count
            if left_upper > start:
                if t_distribution[left_upper - 1] != 0:
                    expand_distribution[left_upper - 1] += expand_value * left_scale
            if right_lower < end:
                if t_distribution[right_lower] != 0:
                    expand_distribution[right_lower] += expand_value * right_scale
            for j in range(left_upper, right_lower):
                if t_distribution[j] != 0:
                    expand_distribution[j] += expand_value

        kl_divergence = compute_kl_divergence(t_distribution, expand_distribution)
        # print(threshold, kl_divergence)
        if kl_divergence < min_kl_divergence:
            min_kl_divergence = kl_divergence
            target_threshold = threshold

    return target_threshold


if __name__ == '__main__':
    # toy histogram: 2048 bins with linearly growing counts, normalized to sum to 1
    distribution = np.empty((2048,))
    for i in range(len(distribution)):
        distribution[i] = i
    distribution /= np.sum(distribution)
    target_threshold = threshold_distribution(distribution, 128)
    print(target_threshold)
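
The function returns a bin index, not a scale. As a minimal sketch of the usual follow-up step (not part of the code above), assuming the calibration histogram covers absolute activation values in [0, max_abs] with 2048 bins, one common convention is to take the center of the chosen bin as the clipping value and divide by 127 to get the symmetric INT8 scale. The names num_bins, max_abs and threshold_bin below are illustrative, not taken from the original post.

import numpy as np

num_bins = 2048
max_abs = 6.0                        # illustrative: largest |activation| seen during calibration
bin_width = max_abs / num_bins

threshold_bin = 1536                 # illustrative: index returned by threshold_distribution
clip_value = (threshold_bin + 0.5) * bin_width   # clipping threshold in activation units
scale = clip_value / 127.0           # symmetric int8 scale

x = np.random.randn(8).astype(np.float32)        # some float activations
x_int8 = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
x_dequant = x_int8.astype(np.float32) * scale    # dequantized approximation of x
print(x_int8, x_dequant)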
