Problem description
I ran into a strange problem in R while using data.table:
Here, when I try to recode the provinces that have a count under 500 as "Other", the output replaces the names of the high-count provinces with index numbers:
df <- fact_data[,.N,Province][N >= 500]$Province
df
fact_data[,Province := ifelse(Province %in% df, fact_data$Province, "Other")]
fact_data[,.N,Province][order(-N)]
Output:
However, the same approach works fine for factor variables whose values are numeric. For example, if I use BranchNumber instead of Province, where the values look like "1", "3", I get output like this, which is what I expect:
Do you know why this happens and how to fix it?
Recommended answer
This is probably a side effect of ifelse, which has a bad habit of changing the class of its return value unpredictably. Try this instead:
fact_data[ !( Province %in% df ), Province := "Other" ]
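To see where the index numbers likely come from, here is a minimal base R sketch. The toy vector `province` and the name `keep` are made up for illustration and stand in for fact_data$Province and df from the question. ifelse() takes the shape and attributes of its result from the logical test, so the factor's levels are dropped and only the underlying integer codes are copied in.

```r
# Hypothetical toy data standing in for fact_data$Province (a factor).
province <- factor(c("ON", "ON", "QC", "BC", "ON", "QC"))
keep <- c("ON", "QC")

# ifelse() builds its result from the logical test vector, so the factor
# attributes of `province` are lost and the integer level codes leak through.
bad <- ifelse(province %in% keep, province, "Other")
print(bad)

# Converting to character before the call keeps the labels.
good <- ifelse(province %in% keep, as.character(province), "Other")
print(good)
```

This would also explain why BranchNumber looked fine: if its levels are "1", "2", "3", ... in that order, each level's integer code happens to match its label.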
Generally, I would recommend working with character vectors as data.table columns instead of factors whenever possible.
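As a rough sketch of that advice, the following uses a small made-up table in place of the real fact_data, which the question does not show, and a threshold of 2 instead of 500 so the toy data has something to recode. The name `keep` is also an assumption, used here instead of df.

```r
library(data.table)

# Hypothetical toy table; the real fact_data is not shown in the question.
fact_data <- data.table(
  Province = factor(c("ON", "ON", "ON", "QC", "QC", "BC"))
)

# Convert the factor column to character once, up front.
fact_data[, Province := as.character(Province)]

# Provinces that meet the threshold (2 here, 500 in the question).
keep <- fact_data[, .N, by = Province][N >= 2]$Province

# Update only the rows outside `keep`, by reference, without ifelse().
fact_data[!(Province %in% keep), Province := "Other"]

print(fact_data[, .N, by = Province][order(-N)])
```

Because the column is a plain character vector, assigning "Other" involves no factor levels at all, so there is nothing for the labels to be confused with.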