Problem description
I am trying to remove rows based on whether or not columns 2 and 3 contain 0s. I keep getting very strange results. I initially tried to write it without subset because I read somewhere that subset should only be used on small amounts of data because of its memory cost. However, neither attempt worked for me. Can someone explain what I did wrong?
df <- data.frame(val1=c(1,2,3), val2=c(4,0,5), val3=c(3,0,6))
subset(df,df>0,c(2,3))
data.frame(df[df[,c(2,3)]!=0])
Starting data frame:
  val1 val2 val3
1    1    4    3
2    2    0    0
3    3    5    6
Desired result:
  val1 val2 val3
1    1    4    3
3    3    5    6
Recommended answer
Using subset, we create a logical index based on the 2nd and 3rd columns:
subset(df, subset=!(val2==0|val3==0))
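To see what subset() actually receives here, the condition can be evaluated on its own with with(), which looks up val2 and val3 as columns of df. A minimal sketch, rebuilding the example data frame from the question (the name keep is just an illustrative variable):

```r
# Example data from the question
df <- data.frame(val1 = c(1, 2, 3), val2 = c(4, 0, 5), val3 = c(3, 0, 6))

# Evaluated inside df, the condition yields one logical value per row
keep <- with(df, !(val2 == 0 | val3 == 0))
print(keep)  # one value per row: TRUE FALSE TRUE

# subset() keeps the rows where that value is TRUE
result <- subset(df, keep)
print(result)
```

Because there is exactly one TRUE/FALSE per row, subset() can use it directly as a row filter.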
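This works because the subset argument is evaluated on the columns and must give one logical value per row; it does not work with a logical matrix such as df>0. We can also use [ instead of subset: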
df[!(df[,2]==0|df[,3]==0),]
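The same [ approach can be extended to any number of columns by counting zeros per row instead of chaining | conditions. A sketch, where cols is a hypothetical variable naming the columns to check:

```r
df <- data.frame(val1 = c(1, 2, 3), val2 = c(4, 0, 5), val3 = c(3, 0, 6))

# Columns to check for zeros (hypothetical; adjust to your data)
cols <- c("val2", "val3")

# df[cols] == 0 is a logical matrix; rowSums() counts the TRUEs in each row,
# so a count of 0 means the row has no zero in any of the checked columns
no_zero <- rowSums(df[cols] == 0) == 0
result <- df[no_zero, ]
print(result)
```

This keeps the same "drop the row if any checked column is 0" semantics as the | version, without writing one comparison per column.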
Regarding the second attempt in the OP's post:
df[,c(2,3)]!=0 # returns a logical matrix
# val2 val3
#[1,] TRUE TRUE
#[2,] FALSE FALSE
#[3,] TRUE TRUE
To subset rows, we need a single logical value per row, not one value per cell.
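This also explains the odd output of the second attempt: when [ gets a lone logical matrix, it indexes individual cells of the data frame (treating it as a matrix) rather than rows, and the 3 x 2 mask is recycled over all nine cells. A sketch contrasting that with collapsing the matrix to one value per row via apply(..., 1, all); the variable names are illustrative:

```r
df <- data.frame(val1 = c(1, 2, 3), val2 = c(4, 0, 5), val3 = c(3, 0, 6))

mask <- df[, c(2, 3)] != 0   # 3 x 2 logical matrix, one value per cell

# A lone logical matrix selects cells, not rows: the result is a plain
# numeric vector of cell values, not a data frame of whole rows
cells <- df[mask]
print(cells)  # 1 3 4 5 3 6: cell values, not rows

# Collapse to one value per row: TRUE only if every checked column is non-zero
row_ok <- apply(mask, 1, all)
print(df[row_ok, ])
```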
Another option is rowSums, if you want to remove only the rows where both column 2 and column 3 are 0 (assuming the values are non-negative):
df[rowSums(df[2:3])!=0,]
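The rowSums(df[2:3]) != 0 test checks whether the values sum to zero, which matches "both are 0" only when values cannot be negative. A sketch with hypothetical data containing a negative value, next to a sign-independent variant that counts zeros instead of summing values:

```r
# Hypothetical data: row 2 cancels out (-2 + 2 == 0), row 3 is 0 in both columns
df2 <- data.frame(val1 = c(1, 2, 3), val2 = c(4, -2, 0), val3 = c(3, 2, 0))

# Sum-based test: drops row 2 (sum is 0) as well as row 3
by_sum <- df2[rowSums(df2[2:3]) != 0, ]

# Count-based test: drops a row only when both columns are exactly 0
by_count <- df2[rowSums(df2[2:3] == 0) < 2, ]

print(by_sum)    # row 1 only
print(by_count)  # rows 1 and 2
```

If your data can only contain non-negative values, both forms agree; otherwise the count-based form expresses "both are 0" directly.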
For example, after changing one of the zeros with
df$val3[2] <- 2
the rowSums approach returns all three rows (row 2 is no longer 0 in both columns), while the other methods still return only rows 1 and 3.
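The comparison can be reproduced directly. A minimal sketch applying the "either column is 0" index used with [ earlier and the rowSums version to the modified data:

```r
df <- data.frame(val1 = c(1, 2, 3), val2 = c(4, 0, 5), val3 = c(3, 0, 6))
df$val3[2] <- 2  # row 2 now has val2 == 0 but val3 == 2

# Removes rows where EITHER column is 0, so row 2 is dropped
either_zero <- df[!(df[, 2] == 0 | df[, 3] == 0), ]

# Removes rows whose two columns sum to 0, so nothing is dropped here
sum_zero <- df[rowSums(df[2:3]) != 0, ]

print(rownames(either_zero))  # "1" "3"
print(rownames(sum_zero))     # "1" "2" "3"
```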
With subset, the equivalent of the rowSums option is to use & instead of |:
subset(df, !(val2==0 & val3==0))
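By De Morgan's law, !(val2 == 0 & val3 == 0) is the same condition as val2 != 0 | val3 != 0, i.e. "at least one of the two columns is non-zero", which may read more naturally. A quick check of both spellings on the data before and after the df$val3[2] <- 2 change (the names orig, modified, a and b are illustrative):

```r
orig     <- data.frame(val1 = c(1, 2, 3), val2 = c(4, 0, 5), val3 = c(3, 0, 6))
modified <- data.frame(val1 = c(1, 2, 3), val2 = c(4, 0, 5), val3 = c(3, 2, 6))

# Two spellings of "keep the row unless both columns are 0"
a <- subset(modified, !(val2 == 0 & val3 == 0))
b <- subset(modified, val2 != 0 | val3 != 0)
print(identical(a, b))  # TRUE
print(nrow(a))          # 3: no row is 0 in both columns

# On the original data, row 2 is 0 in both columns and is dropped
print(rownames(subset(orig, val2 != 0 | val3 != 0)))  # "1" "3"
```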