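I am using this library, https://code.google.com/p/libmfcc/, to generate MFCC coefficients from a magnitude-squared power spectrum.
However, as far as I know, the first coefficient should represent the total energy. My results do not look like that, which makes me doubt the whole feature set.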
F0:-3.77,F1:-2.78,F2:2.13,F3:4.47,F4:2.76,F5:-0.00,F6:-0.58,F7:0.76,F8:1.49,F9:0.62,F10:-0.44,F11:-0.26,F12:0.58
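These are the raw MFCC features, before any liftering is applied.
The only reason I want this coefficient is to help rule out the features as the source of the problem in my project. I pass in a magnitude power spectrum of 256 real values (originally a 512-point FFT), with a sampling frequency of 16000 Hz. I am fairly sure the FFT is correct, because I have run tests to check the frequencies it produces.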
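To make the expectation about the first coefficient concrete, here is a minimal sketch of the final MFCC step, assuming a plain unnormalized DCT-II over log mel filterbank energies. The function name `dct_ii` and the energy values are made up for illustration, and libmfcc's own scaling may differ.

```python
import math

def dct_ii(log_energies, num_coeffs):
    """Plain (unnormalized) DCT-II of the log filterbank energies."""
    n = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (m + 0.5) / n)
            for m, e in enumerate(log_energies))
        for k in range(num_coeffs)
    ]

# Hypothetical mel filterbank energies for one frame (made-up values).
log_energies = [math.log(x) for x in (0.5, 0.2, 0.1, 0.05, 0.8, 0.3)]
coeffs = dct_ii(log_energies, 3)

# For k = 0 every cosine term is 1, so coefficient 0 is just the sum
# of the log energies.
print(coeffs[0], sum(log_energies))
```

Under this assumption the first coefficient tracks the sum of the log energies rather than the linear energy, so it comes out negative whenever most filterbank energies are below 1.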
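I am trying to use these features for speaker recognition, but at the moment I keep getting false positives. I have tried combining the generated features with neural networks, vector quantization, and simple brute-force Euclidean and Spearman comparisons. Nothing I do seems to show that the coefficients differ between voices; every attempt ends in false positives.
I have been stuck on this for months, and I have a feeling the fault lies in my features.
Any help would be greatly appreciated!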

Best answer

Your FFT values look unusual. For comparison, here is a sample of FFT values from a speech frame:
12406.376 317135.746 995981.334 62626224.382 200626224.382 2005596.535353535353518142.702 1183111.796 1183111.796 1866254.816 1866258.816 352858.721 340289.386 676789.386 676767139.24310894041.353 5132132132132132132132132132132151321321515.387 3216815151515151515.387 321747473731.584.584 22194241.584 229444141414.07241.072 3673880.07313131313131313131313131313131313131313131313131313131313131743.522 3417229.415 2512512.261 546054.633 2096752.63712437009.121 70430.472 165724247224.6191288489.1288489.91915992.292929292929292929292929292929292929292929292929292928282887.811 576691.811 576691.932 4629292929292929292929292929292929292929292929292929292929292929292929292929292929292929292828282887.49494949494949494949494949494949494949494949494949492929292929292929292929292929292929292929292929292929292929292929292929818181819191919191919199.756 8493450.137 8647922.201 1814417.128 652202.156934195.600 7234344.850 59959552.325525252781.731 94066.862 94066.862 24987.524 24987.524 300704.365 14786.379 38961.829 38961.829 2525.752 457.993 16805.918 21014.001 25724.770 64765.894 31916.899 31916.339 5772.0557 26097.1991497.997.984 15884.304 5949494949494949494959599552.32559955252525252.7378781.731.731.731 949494061 94061 94061.94066.866.866.866.862 657.454 5423.333 6252.982 26137.014 8101.993 23840.536 96350.180155396.746.746 111640640.103 67379.170 191046.191191191046.213 53822.422.423 199623.933.939 199623.939.939.939 521401.332.332 240488.616 26096.616 26096.585.586.585 279.739 56939.739.739 56934.077 33565.473 17344.580584.597.597 2779.577.2778.5774.7274.464.464 61239.311 13451.726 5151511.726 5151.936 5192.935.539.5151515157.676767379.6767399.014.134 42059.955 11662.442 534.955 13736.420 13481.058 48308.51033231.743 12317.196 12317.196 48160.791 115668.115668.828 211469.849.841 1633739.245 35339.914 47919.914.914 47145.795 37257.3357.3357.3357.7565.769 9065.769 759.759.579 8339.8439.91919 359.919 8419.709 1819 1815.682 1017.977 64.2177 64.21517 17711.483 2531.483 2531 
15.887 2431.31313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131501 5620.484 18436.224 27086.375 31720.334 42472.198 143007.3061358588.920 8743433.057 108255.923 101891.40173553.40173553.860 73585858588.920858588.92085858588.920.080843434343433.053131125.666675.4141414 75971.49499 75971.499 23780.864 780.864.4949494949494949499 78413.973 68413.973 240216.066 148102.903 19623.293 8194.448 2725.753 3217.438 2725.753 3213.461 32131.461.461 60279.461 60279.038 16768.038 2168.0368.0368.906 3 42455.636 20263.426 973.230 2763.689 1136.641 5300.404 3128.7632635.018 15487.226 15487.226 16915.816 5757.816 5770.127 5770.127 4770.1274770.27116645.31313957.323 13957.322 13957.322 13929.323 13908.576.576.576 2281.975 63947.522 50887.522 50889.739.733 13957.118 18690 18690.955 12249.632 1006.608 12672.938 12672.938 4463.555 4463.555 4663.559 4693.099 2049.099 2048 2048.688 1488 14887.160 16965.22665.3665.033 16915 16915 16915.816.816.816.815.038 976.106 8155.822 26873.908 44851.560 30956.4657607.291 4517.811 25642.189 22606.560 12422.574 44612.224 74799.536 25034.774 197.800 2410.775 237.717 3106.175 7980.360 3960.008 8073.620 31488.422 8950.003 3459.935 666.708 7.372
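Also, I am concerned that you write "FFT of each utterance". Speech has to be analyzed window by window, not over the whole utterance at once. You need to split the signal into windows first.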
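A minimal sketch of that windowing step, assuming 512-sample frames (32 ms at 16000 Hz, matching the 512-point FFT in the question), a 256-sample hop, and a Hamming window. The hop size, the window choice, and the helper name `frame_signal` are assumptions for illustration, not something taken from libmfcc.

```python
import math

def frame_signal(samples, frame_len=512, hop=256):
    """Split samples into overlapping frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    # Trailing samples that do not fill a whole frame are dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

# One second of a synthetic 440 Hz tone at 16000 Hz (stand-in for speech).
sample_rate = 16000
samples = [math.sin(2 * math.pi * 440 * t / sample_rate)
           for t in range(sample_rate)]
frames = frame_signal(samples)
print(len(frames), len(frames[0]))
```

Each frame would then go through its own FFT and its own MFCC computation, giving one feature vector per frame instead of one per utterance.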

08-05 09:20