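Voice Activity Detection (VAD)

Every audio-processing module in WebRTC is worth studying.

Setting aside the AEC, which I personally consider the most impressive, the VAD alone is very good.

The basic idea is to split the signal into 6 frequency bands and, in each sub-band, decide between noise and speech using Gaussian-model features.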
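The per-band decision can be sketched as a log-likelihood-ratio test. This is a simplified floating-point illustration, not WebRTC's fixed-point code: it assumes a single Gaussian per class per band and an unweighted sum over bands, and the names `Gauss`, `BandModel`, `vad_decide`, `local_thr` and `global_thr` are hypothetical.

```c
#define NUM_BANDS 6

/* One Gaussian, stored in a form that needs no log() at run time. */
typedef struct {
    double mean;
    double inv_std;      /* 1 / sigma */
    double log_inv_std;  /* log(1 / sigma), assumed precomputed offline */
} Gauss;

/* Hypothetical per-band model: one Gaussian for noise, one for speech. */
typedef struct {
    Gauss noise;
    Gauss speech;
} BandModel;

/* Log density up to the constant -0.5*log(2*pi), which cancels in the ratio. */
static double log_gauss(double x, const Gauss *g) {
    double z = (x - g->mean) * g->inv_std;
    return g->log_inv_std - 0.5 * z * z;
}

/* Returns 1 (speech) if any single band's log-likelihood ratio exceeds
 * local_thr, or if the sum over all bands exceeds global_thr; else 0. */
static int vad_decide(const double feat[NUM_BANDS],
                      const BandModel model[NUM_BANDS],
                      double local_thr, double global_thr) {
    double total = 0.0;
    int speech = 0;
    for (int b = 0; b < NUM_BANDS; ++b) {
        double llr = log_gauss(feat[b], &model[b].speech)
                   - log_gauss(feat[b], &model[b].noise);
        if (llr > local_thr) {
            speech = 1;  /* one band alone is convincing enough */
        }
        total += llr;
    }
    return speech || total > global_thr;
}
```

The two thresholds play different roles: the local one lets a single strongly speech-like band trigger detection, while the global one catches frames where every band is only mildly speech-like but the evidence adds up.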

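Input at any supported sampling rate is downsampled to 8 kHz; internally the code handles the rate conversion for 16, 24, 32 and 48 kHz.

To run detection on a signal at any other sampling rate, you have to resample it to 8 kHz yourself.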
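As a rough illustration of bringing a signal down to 8 kHz yourself, the sketch below halves the sample rate by averaging adjacent samples. This is a deliberately naive stand-in, not the filter WebRTC uses; a real resampler needs a proper anti-aliasing low-pass. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Halves the sample rate: averages each pair of input samples (a crude
 * 2-tap low-pass) and keeps one output per pair. Returns the output length. */
static size_t downsample_by_2(const int16_t *in, size_t in_len, int16_t *out) {
    size_t out_len = in_len / 2;
    for (size_t i = 0; i < out_len; ++i) {
        int32_t sum = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
        out[i] = (int16_t)(sum / 2);
    }
    return out_len;
}

/* 32 kHz -> 8 kHz by halving twice. tmp must hold at least in_len / 2
 * samples and out at least in_len / 4 samples. */
static size_t downsample_32k_to_8k(const int16_t *in, size_t in_len,
                                   int16_t *tmp, int16_t *out) {
    size_t mid_len = downsample_by_2(in, in_len, tmp);
    return downsample_by_2(tmp, mid_len, out);
}
```

For example, a 10 ms frame is 160 samples at 16 kHz and 80 samples after one call to `downsample_by_2`.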

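All of the decision parameters can be tuned.

I added one parameter set of my own, aimed at speech signals that are clearly distinguishable:

The custom set is registered as mode 4.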

// mode 0, quality.
static const int16_t koverhangmax1q[3] = ;
static const int16_t koverhangmax2q[3] = ;
static const int16_t klocalthresholdq[3] = ;
static const int16_t kglobalthresholdq[3] = ;
// mode 1, low bitrate.
static const int16_t koverhangmax1lbr[3] = ;
static const int16_t koverhangmax2lbr[3] = ;
static const int16_t klocalthresholdlbr[3] = ;
static const int16_t kglobalthresholdlbr[3] = ;
// mode 2, aggressive.
static const int16_t koverhangmax1agg[3] = ;
static const int16_t koverhangmax2agg[3] = ;
static const int16_t klocalthresholdagg[3] = ;
static const int16_t kglobalthresholdagg[3] = ;
// mode 3, very aggressive.
static const int16_t koverhangmax1vag[3] = ;
static const int16_t koverhangmax2vag[3] = ;
static const int16_t klocalthresholdvag[3] = ;
static const int16_t kglobalthresholdvag[3] = ;
// mode 4, custom.
static const int16_t koverhangmax1cus[3] = ;
static const int16_t koverhangmax2cus[3] = ;
static const int16_t klocalthresholdcus[3] = ;
static const int16_t kglobalthresholdcus[3] = ;
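In upstream WebRTC, the three entries of each array correspond to 10 ms, 20 ms and 30 ms frames. The sketch below shows how such a table might be indexed and how the two overhang limits could smooth the raw per-frame decision. It is modeled loosely on my reading of the upstream logic, not copied from it; `ModeParams`, `HangoverState`, `frame_index`, `apply_hangover` and `MAX_SPEECH_RUN` are hypothetical names, and no parameter values from the arrays above are assumed.

```c
#include <stdint.h>

/* Overhang limits for one mode, indexed by frame length (10/20/30 ms). */
typedef struct {
    int16_t overhang_max_1[3];  /* hangover after a short speech burst */
    int16_t overhang_max_2[3];  /* hangover after sustained speech */
} ModeParams;

typedef struct {
    int overhang;    /* hangover frames still to be reported as speech */
    int speech_run;  /* consecutive frames judged as speech so far */
} HangoverState;

enum { MAX_SPEECH_RUN = 6 };  /* assumed run length that counts as sustained */

/* Maps a frame length in ms to a table index; -1 if unsupported. */
static int frame_index(int frame_ms) {
    switch (frame_ms) {
        case 10: return 0;
        case 20: return 1;
        case 30: return 2;
        default: return -1;
    }
}

/* raw is 1 if the threshold test judged this frame as speech.
 * Returns the smoothed decision (1 = speech, 0 = silence). */
static int apply_hangover(HangoverState *st, const ModeParams *p,
                          int idx, int raw) {
    if (raw) {
        if (++st->speech_run > MAX_SPEECH_RUN) {
            st->speech_run = MAX_SPEECH_RUN;
            st->overhang = p->overhang_max_2[idx];  /* long utterance */
        } else {
            st->overhang = p->overhang_max_1[idx];  /* short burst */
        }
        return 1;
    }
    st->speech_run = 0;
    if (st->overhang > 0) {
        st->overhang--;  /* keep reporting speech during the hangover */
        return 1;
    }
    return 0;
}
```

Smaller overhang values cut off trailing silence sooner, which is presumably why the more aggressive modes pair higher thresholds with shorter hangovers.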

Source code of the VAD module, extracted as a standalone component:
