I am making an Android-to-Android VoIP (loudspeaker) app using Android's AudioRecord and AudioTrack classes, along with Speex via the NDK for echo cancellation. I can successfully pass data into and retrieve data from Speex's speex_echo_cancellation() function, but the echo remains.
public MyThread(DatagramSocket socket, int frameSize, int filterLength){
    this.socket = socket;
    nativeMethod_initEchoState(frameSize, filterLength);
}

public void run(){
    short[] audioShorts, recvShorts, recordedShorts, filteredShorts;
    byte[] audioBytes, recvBytes;
    int shortsRead;
    DatagramPacket packet;

    //initialize recorder and player
    int samplingRate = 8000;
    int managerBufferSize = 2000;
    AudioTrack player = new AudioTrack(AudioManager.STREAM_MUSIC, samplingRate, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT, managerBufferSize, AudioTrack.MODE_STREAM);
    recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, samplingRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, managerBufferSize);
    recorder.startRecording();
    player.play();

    try {
        //record first packet
        audioShorts = new short[1000];
        shortsRead = recorder.read(audioShorts, 0, audioShorts.length);

        //convert shorts to bytes to send
        audioBytes = new byte[shortsRead*2];
        ByteBuffer.wrap(audioBytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(audioShorts, 0, shortsRead);

        //send bytes
        packet = new DatagramPacket(audioBytes, audioBytes.length);
        socket.send(packet);

        while (!this.isInterrupted()){
            //receive packet/bytes (received audio data should have echo cancelled already)
            recvBytes = new byte[2000];
            packet = new DatagramPacket(recvBytes, recvBytes.length);
            socket.receive(packet);

            //convert bytes to shorts
            recvShorts = new short[packet.getLength()/2];
            ByteBuffer.wrap(packet.getData(), 0, packet.getLength()).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(recvShorts);

            //play shorts
            player.write(recvShorts, 0, recvShorts.length);

            //record shorts
            recordedShorts = new short[1000];
            shortsRead = recorder.read(recordedShorts, 0, recordedShorts.length);

            //send played and recorded shorts into speex,
            //returning audio data with the echo removed
            filteredShorts = nativeMethod_speexEchoCancel(recordedShorts, recvShorts);

            //convert filtered shorts to bytes
            audioBytes = new byte[shortsRead*2];
            ByteBuffer.wrap(audioBytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(filteredShorts, 0, shortsRead);

            //send off bytes
            packet = new DatagramPacket(audioBytes, audioBytes.length);
            socket.send(packet);
        }//end of while loop
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Here is the relevant NDK / JNI code:
#include <jni.h>
#include <speex/speex_echo.h>

static SpeexEchoState *echo_state;

void nativeMethod_initEchoState(JNIEnv *env, jobject jobj, jint frameSize, jint filterLength){
    echo_state = speex_echo_state_init(frameSize, filterLength);
}

jshortArray nativeMethod_speexEchoCancel(JNIEnv *env, jobject jObj, jshortArray input_frame, jshortArray echo_frame){
    //create native shorts from java shorts
    jshort *native_input_frame = (*env)->GetShortArrayElements(env, input_frame, NULL);
    jshort *native_echo_frame = (*env)->GetShortArrayElements(env, echo_frame, NULL);

    //allocate the java output array and get a native view of it
    jint length = (*env)->GetArrayLength(env, input_frame);
    jshortArray output_shorts = (*env)->NewShortArray(env, length);
    jshort *native_output_frame = (*env)->GetShortArrayElements(env, output_shorts, NULL);

    //call echo cancellation
    speex_echo_cancellation(echo_state, native_input_frame, native_echo_frame, native_output_frame);

    //cleanup and return: the inputs were only read, so JNI_ABORT skips the copy-back;
    //mode 0 commits the native output into output_shorts
    (*env)->ReleaseShortArrayElements(env, input_frame, native_input_frame, JNI_ABORT);
    (*env)->ReleaseShortArrayElements(env, echo_frame, native_echo_frame, JNI_ABORT);
    (*env)->ReleaseShortArrayElements(env, output_shorts, native_output_frame, 0);
    return output_shorts;
}
This code runs fine, and audio data is definitely being sent, received, processed, and played from Android to Android. Given an audio sample rate of 8000 Hz and a packet size of 2000 bytes (1000 shorts), I've found that a frameSize of 1000 is needed for the played audio to be smooth. Most values of filterLength (the "tail length" in the Speex docs) will run, but they seem to have no effect on the echo removal.
Does anyone understand AEC well enough to give me some pointers on implementing or configuring Speex? Thanks for reading.
Are you properly aligning the far-end signal (what you call recv) with the near-end signal (what you call record)? There is always some playback/record latency that needs to be accounted for. This generally requires buffering the far-end signal in a ring buffer for a specified period of time. On PCs this is usually about 50 to 120 ms. On Android I suspect it's much higher, probably in the range of 150 to 400 ms. I would recommend using a 100 ms tail length with Speex and adjusting the size of your far-end buffer until the AEC converges. These changes should allow the AEC to converge whether or not the preprocessor is included; it is not required here.
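To make the far-end buffering concrete, here is a minimal sketch of a fixed delay line built on a ring buffer. The class name FarEndDelayLine and its methods are hypothetical helpers, not part of Speex or Android, and the delay value is something you would have to tune by experiment on each device. It only shows the buffering idea: every sample handed to the player comes back out a fixed number of samples later, so it can be passed to the canceller as the echo reference.

```java
/**
 * Hypothetical helper (not a Speex or Android API): delays the far-end
 * (played) signal by a fixed number of samples so it lines up with the
 * moment the microphone actually picks it up.
 */
final class FarEndDelayLine {
    private final short[] ring; // holds the last delaySamples far-end samples
    private int pos = 0;        // slot holding the oldest sample; overwritten next

    FarEndDelayLine(int delaySamples) {
        if (delaySamples < 0) {
            throw new IllegalArgumentException("delaySamples must be >= 0");
        }
        ring = new short[delaySamples]; // starts as silence (all zeros)
    }

    /** Converts a duration in milliseconds to a sample count at the given rate. */
    static int msToSamples(int millis, int sampleRateHz) {
        return (int) ((long) millis * sampleRateHz / 1000);
    }

    /**
     * Takes the frame that was just written to the player and returns a frame
     * of the same length containing the signal from delaySamples samples ago.
     */
    short[] delay(short[] played) {
        short[] out = new short[played.length];
        if (ring.length == 0) { // zero delay: pass the frame through unchanged
            System.arraycopy(played, 0, out, 0, played.length);
            return out;
        }
        for (int i = 0; i < played.length; i++) {
            out[i] = ring[pos];     // oldest buffered sample
            ring[pos] = played[i];  // store the newest sample in its place
            pos = (pos + 1) % ring.length;
        }
        return out;
    }
}
```

In the loop from the question, the idea would be to create one instance up front, for example new FarEndDelayLine(FarEndDelayLine.msToSamples(200, 8000)) as a starting guess, and call nativeMethod_speexEchoCancel(recordedShorts, delayLine.delay(recvShorts)) instead of passing recvShorts directly. At 8000 Hz, the suggested 100 ms tail length corresponds to msToSamples(100, 8000) = 800 samples for filterLength.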