【语音识别】基于matlab MFCC+IPC特征+SVM中英语种识别【含Matlab源码 612期】

举报
海神之光 发表于 2022/05/29 03:49:31 2022/05/29
【摘要】 一、语种识别音频处理简介 1 基本原理 语种识别,根据一段音频判断该音频是英语、中语还是法语,即判断音频的语种。语种识别项目的整体思想就是把语音数据转换成相应的语谱图或者MFCC特征,再对特征进行分析,...

一、语种识别音频处理简介

1 基本原理
语种识别,根据一段音频判断该音频是英语、中语还是法语,即判断音频的语种。语种识别项目的整体思想就是把语音数据转换成相应的语谱图或者MFCC特征,再对特征进行分析,从而判断出该语音数据的语种类别。

2 公开数据集
Topcoder 竞赛 数据(44.1khz 的 mp3 录音,每条 10 秒,176 种语言合计 66176(176*376)条数据,诸多小语种)。

3 基本音频处理流程
语音输入,然后音频信号特征提取,然后进行特征分析处理,最终得到结果,其中音频特征提取多半采用频谱图或者MFCC特征。

4 详解
4.1 语音输入
wav(波形音频文件)mp3 文件或是麦克风中输入的音频信号输入音频

4.2 音频信号特证提取
语音信号处理的目的是弄清语音中各个频率成分的分布。常用的数学工具是傅里叶变换,而傅里叶变换要求输入信号是平稳的,需要对语音信号进行分帧处理,截取出来的一小段信号(通常 20-30ms)就叫一帧。【微观里断定输入信号是平稳的】
语音分帧→每一帧分别 FFT( 离散傅立叶变换) →求取 FFT 之后的幅度/能量,这些数值都是正值,类似图像的像素点,显示出来就是语谱图。
其中语谱图的 x 是时间,y 轴是频率。利用语谱图可以查看指定频率端的能量分布。

二、部分源代码

clc;
clear;
load traindata Myfeature
A1=zeros(1,30);
A2=ones(1,30);
Group=[A1,A2];
TrainData=Myfeature;
SVMStruct = svmtrain(TrainData,Group); 

N=5.3;
Tw = 25;           % analysis frame duration (ms)
Ts = 10;           % analysis frame shift (ms)
alpha = 0.97;      % preemphasis coefficient
R = [ 300 3700 ];  % frequency range to consider
M = 20;            % number of filterbank channels
C = 13;            % number of cepstral coefficients
L = 22;            % cepstral sine lifter parameter
fs = 16000;
hamming = @(N)(0.54-0.46*cos(2*pi*[0:N-1].'/(N-1)));


[filename, pathname] = uigetfile({'*.*';'*.flac'; '*.wav'; '*.mp3'; }, '选择语音');
% %没有图像
if filename == 0    
    return;
end
[speech,fs] = audioread([pathname, filename]);
[voice,fs]=extractvoice_simple(speech,-30, -20,0.2);
voicex=voice(1:N*16000);
[ mfccs, FBEs, frames ] = ...
    mfcc( voicex, fs, Tw, Ts, alpha, hamming, R, M, C, L );
 ceps_mfccx=mfccs(:); 
 [cep,ER]=lpces(voicex,17,256,256); ceps_lpc=cep(2:17,:);%LPC

            %[lpc,ER]=lpces(voice,12,256,256);
            %ceps_lpcc=lpc2lpcc(cep);%LPCC
            ceps_lpcx=ceps_lpc(:);
            ceps=[ceps_mfccx(1000:2000);ceps_lpcx(1:2000)];
            TestData = ceps';
languagex=svmclassify(SVMStruct,TestData);
if languagex == 1
    language='Chinese'
else
    language='English'
end
% t=[1:2000];
% figure
% scatter(t,ceps_lpcx(1:2000),50,'r');
% xlabel('sample point');
% ylabel('LPC');
% title('LPC features');
% hold on
% [filename, pathname] = uigetfile({'*.*';'*.flac'; '*.wav'; '*.mp3'; }, '选择语音');
% % %没有图像
% if filename == 0    
%     return;
% end
% [speech,fs] = audioread([pathname, filename]);
% [voice,fs]=extractvoice_simple(speech,-30, -20,0.2);
% voicex=voice(1:N*16000);
% [ mfccs, FBEs, frames ] = ...
%     mfcc( voicex, fs, Tw, Ts, alpha, hamming, R, M, C, L );
%  ceps_mfccx=mfccs(:); 
%  [cep,ER]=lpces(voicex,17,256,256); ceps_lpc=cep(2:17,:);%LPC
% 
function [ H, f, c ] = trifbank( M, K, R, fs, h2w, w2h )
% TRIFBANK Triangular filterbank.
%
%   [H,F,C]=TRIFBANK(M,K,R,FS,H2W,W2H) returns matrix of M triangular filters 
%   (one per row), each K coefficients long along with a K coefficient long 
%   frequency vector F and M+2 coefficient long cutoff frequency vector C. 
%   The triangular filters are between limits given in R (Hz) and are 
%   uniformly spaced on a warped scale defined by forward (H2W) and backward 
%   (W2H) warping functions.
%
%   Inputs
%           M is the number of filters, i.e., number of rows of H
%
%           K is the length of frequency response of each filter 
%             i.e., number of columns of H
%
%           R is a two element vector that specifies frequency limits (Hz), 
%             i.e., R = [ low_frequency high_frequency ];
%
%           FS is the sampling frequency (Hz)
%
%           H2W is a Hertz scale to warped scale function handle
%
%           W2H is a wared scale to Hertz scale function handle
%
%   Outputs
%           H is a M by K triangular filterbank matrix (one filter per row)
%
%           F is a frequency vector (Hz) of 1xK dimension
%
%           C is a vector of filter cutoff frequencies (Hz), 
%             note that C(2:end) also represents filter center frequencies,
%             and the dimension of C is 1x(M+2)
%
%   Example
%           fs = 16000;               % sampling frequency (Hz)
%           nfft = 2^12;              % fft size (number of frequency bins)
%           K = nfft/2+1;             % length of each filter
%           M = 23;                   % number of filters
%
%           hz2mel = @(hz)(1127*log(1+hz/700)); % Hertz to mel warping function
%           mel2hz = @(mel)(700*exp(mel/1127)-700); % mel to Hertz warping function
%
%           % Design mel filterbank of M filters each K coefficients long,
%           % filters are uniformly spaced on the mel scale between 0 and Fs/2 Hz
%           [ H1, freq ] = trifbank( M, K, [0 fs/2], fs, hz2mel, mel2hz );
%
%           % Design mel filterbank of M filters each K coefficients long,
%           % filters are uniformly spaced on the mel scale between 300 and 3750 Hz
%           [ H2, freq ] = trifbank( M, K, [300 3750], fs, hz2mel, mel2hz );
%
%           % Design mel filterbank of 18 filters each K coefficients long, 
%           % filters are uniformly spaced on the Hertz scale between 4 and 6 kHz
%           [ H3, freq ] = trifbank( 18, K, [4 6]*1E3, fs, @(h)(h), @(h)(h) );
%
%            hfig = figure('Position', [25 100 800 600], 'PaperPositionMode', ...
%                              'auto', 'Visible', 'on', 'color', 'w'); hold on; 
%           subplot( 3,1,1 ); 
%           plot( freq, H1 );
%           xlabel( 'Frequency (Hz)' ); ylabel( 'Weight' ); set( gca, 'box', 'off' ); 
%       
%           subplot( 3,1,2 );
%           plot( freq, H2 );
%           xlabel( 'Frequency (Hz)' ); ylabel( 'Weight' ); set( gca, 'box', 'off' ); 
%       
%           subplot( 3,1,3 ); 
%           plot( freq, H3 );
%           xlabel( 'Frequency (Hz)' ); ylabel( 'Weight' ); set( gca, 'box', 'off' ); 
%
%   Reference
%           [1] Huang, X., Acero, A., Hon, H., 2001. Spoken Language Processing: 
%               A guide to theory, algorithm, and system development. 
%               Prentice Hall, Upper Saddle River, NJ, USA (pp. 314-315).

%   Author  Kamil Wojcicki, UTD, June 2011


    if( nargin~= 6 ), help trifbank; return; end; % very lite input validation

    f_min = 0;          % filter coefficients start at this frequency (Hz)
    f_low = R(1);       % lower cutoff frequency (Hz) for the filterbank 
    f_high = R(2);      % upper cutoff frequency (Hz) for the filterbank 
    f_max = 0.5*fs;     % filter coefficients end at this frequency (Hz)
    f = linspace( f_min, f_max, K ); % frequency range (Hz), size 1xK
    fw = h2w( f );

    % filter cutoff frequencies (Hz) for all filters, size 1x(M+2)
    c = w2h( h2w(f_low)+[0:M+1]*((h2w(f_high)-h2w(f_low))/(M+1)) );
    cw = h2w( c );

    H = zeros( M, K );                  % zero otherwise
    for m = 1:M 

        % implements Eq. (6.140) on page 314 of [1] 
        % k = f>=c(m)&f<=c(m+1); % up-slope
        % H(m,k) = 2*(f(k)-c(m)) / ((c(m+2)-c(m))*(c(m+1)-c(m)));
        % k = f>=c(m+1)&f<=c(m+2); % down-slope
        % H(m,k) = 2*(c(m+2)-f(k)) / ((c(m+2)-c(m))*(c(m+2)-c(m+1)));

        % implements Eq. (6.141) on page 315 of [1]
        k = f>=c(m)&f<=c(m+1); % up-slope
        H(m,k) = (f(k)-c(m))/(c(m+1)-c(m));
        k = f>=c(m+1)&f<=c(m+2); % down-slope
        H(m,k) = (c(m+2)-f(k))/(c(m+2)-c(m+1));
       
   end

三、运行结果

在这里插入图片描述
在这里插入图片描述

四、matlab版本及参考文献

1 matlab版本
2014a

2 参考文献
[1]韩纪庆,张磊,郑铁然.语音信号处理(第3版)[M].清华大学出版社,2019.
[2]柳若边.深度学习:语音识别技术实践[M].清华大学出版社,2019.

文章来源: qq912100926.blog.csdn.net,作者:海神之光,版权归原作者所有,如需转载,请联系作者。

原文链接:qq912100926.blog.csdn.net/article/details/115139610

【版权声明】本文为华为云社区用户转载文章,如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。