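[Speech Recognition] Song recognition in MATLAB with speech framing, endpoint detection, pitch extraction, and the DTW algorithm [MATLAB source code included, issue 1057]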

海神之光, posted 2022/05/29 03:12:22

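I. Introduction to DTW

Dynamic Time Warping (DTW) has been around for a long time (it was proposed by the Japanese researcher Itakura). Its purpose is simple: it measures the similarity between two time series of different lengths. It is widely used, mainly in template matching, for example isolated-word speech recognition (deciding whether two utterances are the same word), gesture recognition, data mining, and information retrieval.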

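1 Overview
Time series are a common way to represent data in most disciplines, and a routine task in time-series processing is comparing two sequences for similarity.
The two sequences to be compared may differ in length; in speech recognition this shows up as different speaking rates. Speech is highly variable: even the same speaker uttering the same sound at different times will not produce exactly the same duration. The phonemes within a single word are also spoken at different speeds; one speaker may draw out an "A" while another clips an "i". In these cases the ordinary Euclidean distance, which compares the two sequences point by point at the same index, cannot give a meaningful distance (or similarity) between them.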

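2 How DTW works
As noted in the overview, the two sequences may differ in overall length and in the local speed of individual phonemes. In addition, two sequences may differ only by a shift along the time axis, so that they coincide once the shift is undone. Euclidean distance cannot measure the distance (or similarity) effectively in any of these cases.
DTW computes the similarity between two time series by stretching and compressing them along the time axis: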

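[Figure 1: two time series aligned by DTW]
In the figure, the upper and lower solid curves are the two time series, and the dashed lines between them connect their matching points. DTW measures the similarity of the two series by the sum of the distances between all these matched points, called the warp path distance.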

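3 Computing DTW
Let X and Y be the two time series to compare, with lengths |X| and |Y|.
Warp path
A warp path has the form W=w1,w2,…,wK, where max(|X|,|Y|)<=K<|X|+|Y|.
Each wk has the form (i,j), where i is an index into X and j is an index into Y.
The warp path W must start at w1=(1,1) and end at wK=(|X|,|Y|), so that every index of X and of Y appears in W.
In addition, the i and j of successive elements of W must be monotonically non-decreasing, so that the dashed lines in Figure 1 never cross. Concretely, for wk=(i,j) and wk+1=(i',j'): i<=i'<=i+1 and j<=j'<=j+1.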

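[Figure: cost matrix D]
The figure shows the cost matrix D, where D(i,j) is the warp path distance between the first i points of X and the first j points of Y.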
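
The cost matrix can be filled in with dynamic programming. The following is a minimal sketch, assuming the common recurrence D(i,j) = Dist(i,j) + min(D(i-1,j), D(i,j-1), D(i-1,j-1)) with Dist(i,j) = |X(i) - Y(j)|. The function name dtwSketch is hypothetical and is not part of the post's source code. Save it as dtwSketch.m:

```matlab
function [dist, D] = dtwSketch(x, y)
% dtwSketch: DTW distance between two 1-D sequences (illustrative sketch only).
%   dist: warp path distance between x and y
%   D:    cost matrix, D(i,j) = warp path distance between x(1:i) and y(1:j)
x = x(:);
y = y(:);
n = numel(x);
m = numel(y);

% Pad with an extra first row and column of Inf so the recurrence needs
% no special cases on the border: P(i+1, j+1) holds D(i, j).
P = inf(n + 1, m + 1);
P(1, 1) = 0;

for i = 1:n
    for j = 1:m
        localCost = abs(x(i) - y(j));            % assumed local distance
        best = min([P(i, j + 1), ...             % step from (i-1, j)
                    P(i + 1, j), ...             % step from (i, j-1)
                    P(i, j)]);                   % step from (i-1, j-1)
        P(i + 1, j + 1) = localCost + best;
    end
end

D = P(2:end, 2:end);   % drop the padding
dist = D(n, m);        % the path must end at (|X|, |Y|)
end
```

For example, dtwSketch([1 2 3], [1 2 2 3]) returns 0, because repeating the 2 only stretches the first sequence in time. The three candidates inside min correspond to the three moves allowed by the monotonicity constraint.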

II. Partial Source Code

clc;
clear;
close all;
% Read the waveform --- endpoint detection --- framing
% Available songs: 同桌的你, 女儿情, 回梦游仙, 滴答, 彩虹
waveFile = '同桌的你.wav';
pivFile = '同桌的你.piv';
pivFile = ['mfcc' pivFile];         % pitch-vector file name (not used in this excerpt)
[y, fs] = audioread(waveFile);      % read the source audio
figure
subplot(221)
plot(y);
title('Original waveform');

frame = PointDetect(waveFile);      % endpoint detection
subplot(222)
plot(frame);
title('Endpoint detection');

subplot(223)
pitch = wave2pitch(frame, fs);      % compute pitch
plot(pitch);
title('Pitch');
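
PointDetect and wave2pitch are not listed in this post. As a rough illustration of what a framing and endpoint-detection step might look like, the sketch below cuts the signal into frames and flags the frames whose short-time energy exceeds a fraction of the maximum. The function name energyEndpointSketch, its parameters, and the energy-threshold rule are all assumptions for illustration, not the author's implementation. Save it as energyEndpointSketch.m:

```matlab
function [frameMat, isVoiced] = energyEndpointSketch(y, frameSize, overlap, ratio)
% energyEndpointSketch: framing plus energy-based endpoint detection (sketch).
%   y:         input signal (vector)
%   frameSize: samples per frame
%   overlap:   samples shared by neighbouring frames (overlap < frameSize)
%   ratio:     a frame counts as sound if its energy > ratio * max energy
y = y(:);
y = y - mean(y);                          % remove DC offset
step = frameSize - overlap;
frameCount = floor((numel(y) - overlap) / step);

if frameCount < 1                         % signal shorter than one frame
    frameMat = zeros(frameSize, 0);
    isVoiced = false(1, 0);
    return;
end

% One frame per column
frameMat = zeros(frameSize, frameCount);
for k = 1:frameCount
    startIdx = (k - 1) * step + 1;
    frameMat(:, k) = y(startIdx:startIdx + frameSize - 1);
end

energy = sum(frameMat .^ 2, 1);           % short-time energy of each frame
isVoiced = energy > ratio * max(energy);  % assumed threshold rule
end
```

The first and last true entries of isVoiced give the start and end frames of the sound. A practical detector would usually also use the zero-crossing rate or a noise-floor estimate.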
function [pitch, pdf, frameEstimated, excitation]=frame2pitch(frame, opt, showPlot)
% frame2pitch: Pitch of a given frame, computed from its PDF (periodicity detection function)
%
%	Usage:
%		[pitch, pdf, frameEstimated, excitation]=frame2pitch(frame, opt, showPlot);
%			frame: Given frame
%			opt: Options for PDF computation
%				opt.pdf: PDF function to be used
%					'acf' for ACF
%					'amdf' for AMDF
%					'nsdf' for NSDF
%					'acfOverAmdf' for ACF divided by AMDF
%					'hps' for harmonics product sum
%					'ceps' for cepstrum
%				opt.maxShift: no. of shift operations, which is equal to the length of the output vector
%				opt.method: 1 for using the whole frame for shifting
%					2 for using the whole frame for shifting, but normalize the sum by its overlap area
%					3 for using frame(1:frameSize-maxShift) for shifting
%				opt.siftOrder: order of SIFT (0 for not using SIFT)
%			showPlot: 0 for no plot, 1 for plotting the frame and PDF output
%			pitch: the estimated pitch (0 if no valid peak is found)
%			pdf: the returned PDF vector
%
%	Example:
%		waveFile='soo.wav';
%		au=myAudioRead(waveFile);
%		frameSize=256;
%		frameMat=enframe(au.signal, frameSize);
%		frame=frameMat(:, 292);
%		opt=ptOptSet(au.fs, au.nbits, 1);
%		opt.alpha=0;
%		pitch=frame2pitch(frame, opt, 1);
%
%	See also frame2acf, frame2amdf, frame2nsdf.

%	Roger Jang 20020404, 20041013, 20060313

if nargin<1, selfdemo; return; end
if nargin<2||isempty(opt), opt=ptOptSet(8000, 16, 1); end
if nargin<3, showPlot=0; end

%% ====== Preprocessing
%save frame frame
frame=frameZeroMean(frame, opt.zeroMeanPolyOrder);
%frame=frameZeroMean(frame, 0);

frameEstimated=[];
excitation=[];
if opt.siftOrder>0
	[frameEstimated, excitation, coef]=sift(frame, opt.siftOrder);	% Simple inverse filtering tracking
	frame=excitation;
end
frameSize=length(frame);
maxShift=min(frameSize, opt.maxShift);

switch lower(opt.pdf)
	case 'acf'
	%	pdf=frame2acf(frame, maxShift, opt.method);
		pdf=frame2acfMex(frame, maxShift, opt.method);
	%	if opt.method==1
	%		pdfWeight=1+linspace(0, opt.alpha, length(pdf))';
	%		pdf=pdf.*pdfWeight;	% To avoid double pitch error (esp for violin). 20110416
	%	end
	%	if opt.method==2
	%		pdfWeight=1-linspace(0, opt.alpha, length(pdf))';	% alpha is less than 1.
	%		pdf=pdf.*pdfWeight;	% To avoid double pitch error (esp for violin). 20110416
	%	end
		pdfLen=length(pdf);
		pdfWeight=opt.alpha+pdfLen*(1-opt.alpha)./(pdfLen-(0:pdfLen-1)');
		pdf=pdf.*pdfWeight;	% alpha=0==>normalized ACF, alpha=1==>tapering ACF
	case 'amdf'
	%	amdf=frame2amdf(frame, maxShift, opt.method);
		amdf=frame2amdfMex(frame, maxShift, opt.method);
		pdf=max(amdf)*(1-linspace(0,1,length(amdf))')-amdf;
	case 'nsdf'
	%	pdf=frame2nsdf(frame, maxShift, opt.method);
		pdf=frame2nsdfMex(frame, maxShift, opt.method);
	case 'acfoveramdf'
		opt.pdf='acf';
		[acfPitch, acf] =feval(mfilename, frame, opt);
		opt.pdf='amdf';
		[amdfPitch, amdf]=feval(mfilename, frame, opt);
		pdf=0*acf;
		pdf(2:end)=acf(2:end)./amdf(2:end);
	case 'hps'
		[pdf, freq]=frame2hps(frame, opt.fs, opt.zeroPaddedFactor);
	case 'ceps'
		pdf=frame2ceps(frame, opt.fs, opt.zeroPaddedFactor);
	otherwise
		error('Unknown PDF=%s!', opt.pdf);
end

switch lower(opt.pdf)
	case {'acf', 'amdf', 'nsdf', 'amdf4pt', 'acfoveramdf', 'ceps'}
		n1=floor(opt.fs/opt.freqRange(2));		% pdf(1:n1) will not be used
		n2= ceil(opt.fs/opt.freqRange(1));		% pdf(n2:end) will not be used
		if n2>length(pdf), n2=length(pdf); end
		% Update n1 such that pdf(n1)<=pdf(n1+1)
		while n1<n2 && pdf(n1)>pdf(n1+1), n1=n1+1; end
		% Update n2 such that pdf(n2)<=pdf(n2-1)
		while n2>n1 && pdf(n2)>pdf(n2-1), n2=n2-1; end
		pdf2=pdf;
		pdf2(1:n1)=-inf;
		pdf2(n2:end)=-inf;
		[maxValue, maxIndex]=max(pdf2);
		if isinf(maxValue) || maxIndex==n1+1 || maxIndex==n2-1
			pitch=0; maxIndex=nan; maxValue=nan;
		elseif opt.useParabolicFit
			deviation=optimViaParabolicFit(pdf(maxIndex-1:maxIndex+1));
			maxIndex=maxIndex+deviation;
			pitch=freq2pitch(opt.fs/(maxIndex-1));
		else
			pitch=freq2pitch(opt.fs/(maxIndex-1));
		end
	case {'hps'}
		pdf2=pdf;
		pdf2(freq<opt.freqRange(1)|freq>opt.freqRange(2))=-inf;
		[maxValue, maxIndex]=max(pdf2);
	%	if opt.useParabolicFit
	%		deviation=optimViaParabolicFit(pdf(maxIndex-1:maxIndex+1));
	%		maxIndex=maxIndex+deviation;
	%	end
		pitch=freq2pitch(freq(maxIndex));
	otherwise
		error('Unknown PDF=%s!', opt.pdf);
end

if showPlot
	subplot(2,1,1);
	plot(frame, '.-');
	set(gca, 'xlim', [-inf inf]);
	title('Input frame');
	subplot(2,1,2);
	plot(1:length(pdf), pdf, '.-', 1:length(pdf2), pdf2, '.r');
	line(maxIndex, maxValue, 'marker', '^', 'color', 'k');
	set(gca, 'xlim', [-inf inf]);
	title(sprintf('%s vector (opt.method = %d)', opt.pdf, opt.method));
end

% ====== Self demo
function selfdemo
mObj=mFileParse(which(mfilename));
strEval(mObj.example);
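
frame2pitch depends on helpers that are not listed here (frame2acfMex, freq2pitch, optimViaParabolicFit, and others). To illustrate only the core idea of the 'acf' branch, the plain-MATLAB sketch below computes the autocorrelation of one frame, picks the strongest lag inside the allowed frequency range, and converts it to a pitch. The function name acfPitchSketch is hypothetical. It omits SIFT, the alpha weighting, and the parabolic fit, and it assumes the usual MIDI semitone conversion 69+12*log2(f/440) in place of freq2pitch. Save it as acfPitchSketch.m:

```matlab
function [pitch, freq] = acfPitchSketch(frame, fs, freqRange)
% acfPitchSketch: pitch of one frame from its autocorrelation (sketch only).
%   frame:     samples of one frame
%   fs:        sampling rate in Hz
%   freqRange: [lowest highest] admissible fundamental frequency in Hz
%   pitch:     pitch in semitones (assumed MIDI convention, A4 = 440 Hz = 69)
%   freq:      estimated fundamental frequency in Hz
frame = frame(:) - mean(frame);               % zero-mean column vector
n = numel(frame);

minLag = max(1, floor(fs / freqRange(2)));    % shortest admissible period
maxLag = min(ceil(fs / freqRange(1)), n - 1); % longest admissible period

% acf(lag + 1) = sum over k of frame(k) * frame(k + lag)
acf = zeros(maxLag + 1, 1);
for lag = 0:maxLag
    acf(lag + 1) = sum(frame(1:n - lag) .* frame(1 + lag:n));
end

% Strongest peak among the admissible lags
[~, idx] = max(acf(minLag + 1:maxLag + 1));
bestLag = minLag + idx - 1;

freq = fs / bestLag;
pitch = 69 + 12 * log2(freq / 440);           % assumed semitone conversion
end
```

For a 512-sample frame of a 200 Hz sine sampled at 8000 Hz with freqRange = [50 1000], the strongest admissible lag is 40 samples, so freq is 200. The unnormalized autocorrelation shrinks as the lag grows, which is what keeps this sketch from choosing a multiple of the true period in that example.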
  
 

III. Results

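IV. MATLAB Version and References

1 MATLAB version
2014a

2 References
[1] Han Jiqing, Zhang Lei, Zheng Tieran. Speech Signal Processing (3rd ed.) [M]. Tsinghua University Press, 2019.
[2] Liu Ruobian. Deep Learning: Speech Recognition Technology in Practice [M]. Tsinghua University Press, 2019.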

Source: qq912100926.blog.csdn.net. Author: 海神之光. Copyright belongs to the original author; please contact the author before reposting.

Original link: qq912100926.blog.csdn.net/article/details/118229908
