Cocktail party algorithm SVD implementation... in just one line of code?

Problem description

In a slide within the introductory lecture on machine learning by Stanford's Andrew Ng at Coursera, he gives the following one line Octave solution to the cocktail party problem given the audio sources are recorded by two spatially separated microphones:

[W,s,v]=svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');

At the bottom of the slide is "source: Sam Roweis, Yair Weiss, Eero Simoncelli" and at the bottom of an earlier slide is "Audio clips courtesy of Te-Won Lee". In the video, Professor Ng says,

The separated audio results played in the video lecture are not perfect but, in my opinion, amazing. Does anyone have any insight on how that one line of code performs so well? In particular, does anyone know of a reference that explains the work of Te-Won Lee, Sam Roweis, Yair Weiss, and Eero Simoncelli with respect to that one line of code?
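
For what it's worth, the matrix inside the svd(...) call, sum_t ||x_t||^2 x_t x_t', can be read as the empirical fourth-moment ("quadricovariance") matrix that Cardoso's FOBI algorithm diagonalizes: after whitening, its eigenvectors recover the independent sources whenever their kurtoses differ. Below is a minimal NumPy sketch of that reading (my own reconstruction, not Ng's code; the Laplace/uniform sources and the mixing matrix are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Two independent sources with different kurtoses (FOBI needs this):
s = np.vstack([rng.laplace(size=n),      # super-Gaussian source
               rng.uniform(-1, 1, n)])   # sub-Gaussian source
A = np.array([[2.0, 1.0], [1.0, 1.0]])   # arbitrary invertible mixing matrix
x = A @ s                                # 2 x n mixed "microphone" signals

# Whiten: zero mean, identity covariance.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(x @ x.T / n)
z = E @ np.diag(d ** -0.5) @ E.T @ x

# Quadricovariance sum_t ||z_t||^2 z_t z_t^T -- the expression inside
# Ng's svd(...) call, applied to whitened data.
M = (z * (z ** 2).sum(axis=0)) @ z.T / n
U, _, _ = np.linalg.svd(M)
y = U.T @ z        # recovered sources, up to order, sign, and scale

# Each recovered row correlates almost perfectly with one original source.
C = np.abs(np.corrcoef(np.vstack([y, s]))[:2, 2:])
```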

Update

To demonstrate the algorithm's sensitivity to microphone separation distance, the following simulation (in Octave) separates the tones from two spatially separated tone generators.

% define model
f1 = 1100;              % frequency of tone generator 1; unit: Hz
f2 = 2900;              % frequency of tone generator 2; unit: Hz
Ts = 1/(40*max(f1,f2)); % sampling period; unit: s
dMic = 1;               % distance between microphones centered about origin; unit: m
dSrc = 10;              % distance between tone generators centered about origin; unit: m
c = 340.29;             % speed of sound; unit: m / s

% generate tones
figure(1);
t = [0:Ts:0.025];
tone1 = sin(2*pi*f1*t);
tone2 = sin(2*pi*f2*t);
plot(t,tone1);
hold on;
plot(t,tone2,'r'); xlabel('time'); ylabel('amplitude'); axis([0 0.005 -1 1]); legend('tone 1', 'tone 2');
hold off;

% mix tones at microphones
% assume inverse square attenuation of sound intensity (i.e., inverse linear attenuation of sound amplitude)
figure(2);
dNear = (dSrc - dMic)/2;
dFar = (dSrc + dMic)/2;
mic1 = 1/dNear*sin(2*pi*f1*(t-dNear/c)) + \
       1/dFar*sin(2*pi*f2*(t-dFar/c));
mic2 = 1/dNear*sin(2*pi*f2*(t-dNear/c)) + \
       1/dFar*sin(2*pi*f1*(t-dFar/c));
plot(t,mic1);
hold on;
plot(t,mic2,'r'); xlabel('time'); ylabel('amplitude'); axis([0 0.005 -1 1]); legend('mic 1', 'mic 2');
hold off;

% use svd to isolate sound sources
figure(3);
x = [mic1' mic2'];
[W,s,v]=svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
plot(t,v(:,1));
hold on;
maxAmp = max(v(:,1));
plot(t,v(:,2),'r'); xlabel('time'); ylabel('amplitude'); axis([0 0.005 -maxAmp maxAmp]); legend('isolated tone 1', 'isolated tone 2');
hold off;

After about 10 minutes of execution on my laptop computer, the simulation generates the following three figures illustrating the two isolated tones have the correct frequencies.

However, setting the microphone separation distance to zero (i.e., dMic = 0) causes the simulation to instead generate the following three figures illustrating the simulation could not isolate a second tone (confirmed by the single significant diagonal term returned in svd's s matrix).
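
The rank-1 collapse at dMic = 0 is easy to check directly. A small NumPy sketch (note it stores the signals channels-by-samples, so the matrix inside svd is 2x2; the Octave script above builds x samples-by-2, which makes that product n-by-n and is likely why the simulation takes minutes):

```python
import numpy as np

# With dMic = 0 both microphones record exactly the same mixture, so the
# data matrix has rank 1 and the second singular value collapses to zero.
fs = 40 * 2900                            # sampling rate from the model above
t = np.arange(0.0, 0.025, 1.0 / fs)
mic = np.sin(2 * np.pi * 1100 * t) + np.sin(2 * np.pi * 2900 * t)
x = np.vstack([mic, mic])                 # two identical recordings (2 x n)
M = (x * (x ** 2).sum(axis=0)) @ x.T      # the matrix from the one-liner
sv = np.linalg.svd(M, compute_uv=False)
ratio = sv[1] / sv[0]                     # effectively zero: one significant term
```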

I was hoping the microphone separation distance on a smartphone would be large enough to produce good results but setting the microphone separation distance to 5.25 inches (i.e., dMic = 0.1333 meters) causes the simulation to generate the following, less than encouraging, figures illustrating higher frequency components in the first isolated tone.

Answer

I was trying to figure this out as well, 2 years later. But I got my answers; hopefully it'll help someone.

You need 2 audio recordings. You can get audio examples from http://research.ics.aalto.fi/ica/cocktail/cocktail_en.cgi.

The implementation reference is http://www.cs.nyu.edu/~roweis/kica.html

OK, here is the code -

[x1, Fs1] = audioread('mix1.wav');
[x2, Fs2] = audioread('mix2.wav');
xx = [x1, x2]';
yy = sqrtm(inv(cov(xx')))*(xx-repmat(mean(xx,2),1,size(xx,2)));
[W,s,v] = svd((repmat(sum(yy.*yy,1),size(yy,1),1).*yy)*yy');

a = W*xx; %W is unmixing matrix
subplot(2,2,1); plot(x1); title('mixed audio - mic 1');
subplot(2,2,2); plot(x2); title('mixed audio - mic 2');
subplot(2,2,3); plot(a(1,:), 'g'); title('unmixed wave 1');
subplot(2,2,4); plot(a(2,:),'r'); title('unmixed wave 2');

audiowrite('unmixed1.wav', a(1,:), Fs1);
audiowrite('unmixed2.wav', a(2,:), Fs1);
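
The sqrtm(inv(cov(xx'))) factor is a whitening transform: it centers and rescales the recordings so their covariance becomes the identity before the SVD step. A NumPy equivalent using an eigendecomposition in place of sqrtm, on synthetic data, just to verify that property (the channel construction is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
xx = rng.standard_normal((2, 5000)) * np.array([[3.0], [0.5]])
xx[1] += 0.8 * xx[0]                       # make the channels correlated

# Equivalent of sqrtm(inv(cov(xx'))) * (xx - repmat(mean(xx,2),...)):
xc = xx - xx.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(xc))
W_whiten = E @ np.diag(d ** -0.5) @ E.T    # symmetric inverse square root
yy = W_whiten @ xc                         # whitened data: cov(yy) ~ identity
```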
