Problem description
I am getting the following test error rates for k values from 1 to 20. What might be the reason for this?
k_values: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
Error: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0020000000000000018, 0.0020000000000000018, 0.0020000000000000018, 0.0020000000000000018, 0.0020000000000000018, 0.0020000000000000018, 0.006000000000000005, 0.0040000000000000036, 0.008000000000000007, 0.006000000000000005, 0.010000000000000009, 0.008000000000000007, 0.014000000000000012, 0.01200000000000001]
Why does the error rate increase as the value of k increases?
Recommended answer
With a higher value of k, the majority class in the dataset has a bigger say in each prediction, so points from the minority class are more likely to be outvoted and the error rate increases.
Say there are 100 data points: 80 with class label "0" and 20 with class label "1".
Now, if I choose any value of k > 40, every data point will be assigned to the majority class: at most 20 of the k neighbors can have label "1", so more than half of the votes always go to label "0".
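The 80/20 example can be checked with a small sketch. The 1-D data layout, the `knn_predict` helper, and the query point below are assumptions made up for illustration; they are not from the question. The query sits in the middle of the class "1" cluster, so it is the case most favorable to the minority class, and it still flips to class "0" once k exceeds 40.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (1-D data)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: 80 points of class 0 spread over [0, 79],
# 20 points of class 1 tightly clustered just above 100.
train = [(float(i), 0) for i in range(80)]
train += [(100.0 + 0.1 * i, 1) for i in range(20)]

query = 101.0  # lies inside the class-1 cluster

# Odd k only, so a two-class vote can never tie.
predictions = {k: knn_predict(train, query, k) for k in (1, 5, 19, 39, 41, 60)}
print(predictions)  # {1: 1, 5: 1, 19: 1, 39: 1, 41: 0, 60: 0}
```

Up to k = 39 the 20 nearby class "1" points still win the vote; from k = 41 on, at least 21 of the neighbors must be class "0", so the prediction becomes "0" regardless of where the query lies.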
In general, a large value of k leads to underfitting, while a very small value of k (though this is problem specific) leads to overfitting.
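One way to see both ends of this trade-off is to compare training and test error across several values of k. The sketch below is only an illustration under assumed conditions: the two overlapping Gaussian classes, the 80/20 split, the seed, and the helper names are all hypothetical. At k = 1 each training point is its own nearest neighbor, so the training error is zero however noisy the data is (a sign of overfitting). At very large k every point is predicted as the majority class, so both errors equal the minority fraction (underfitting).

```python
import random
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (1-D data)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def error_rate(train, points, k):
    """Fraction of `points` whose k-NN prediction differs from the true label."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in points)
    return wrong / len(points)

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample(n0, n1):
    # Assumed data: class 0 ~ N(0, 1) and class 1 ~ N(2, 1), so the classes overlap.
    return ([(rng.gauss(0.0, 1.0), 0) for _ in range(n0)]
            + [(rng.gauss(2.0, 1.0), 1) for _ in range(n1)])

train = sample(80, 20)
test = sample(80, 20)

# k = 100 uses every training point; the 80-vs-20 vote cannot tie.
for k in (1, 5, 15, 41, 100):
    print(k, error_rate(train, train, k), error_rate(train, test, k))
```

The values printed for intermediate k depend on the sampled data, so a suitable k is usually chosen by comparing held-out error across candidates rather than by a fixed rule.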