Problem description
This
df = data.frame(c(-2,-1,1,2), NA)
colnames(df) <- c("values", "pos_neg")
flag <- with(df, values < 0)
df$pos_neg[flag] = "negative"
df$pos_neg[!flag] = "positive"
gives me this
And it works as intended. The problem is that I'm not really sure how or why it does. What exactly happens if I put a boolean value into the brackets? Up to now I thought a data frame was an array and that I could access values only by number (df[1]) or by name if available (df["pants"]).
Thanks in advance!
Solution
It's a little easier to see if you look at the subsetting once the values are no longer all NA:
df <- data.frame(values = c(-2,-1,1,2),
pos_neg = NA)
flag <- df$values < 0
df$pos_neg[flag] <- "negative"
df$pos_neg[!flag] <- "positive"
The first important concept here is that a data frame is a list (with a class, some restrictions and lots of methods, but still a list) of variables ("columns"), not a two-dimensional array (a matrix). Thus, $ or [[ subsetting pulls out a single variable, which is a single vector, so
df$pos_neg
#> [1] "negative" "negative" "positive" "positive"
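A quick way to check the "list of columns" claim yourself is sketched below. It rebuilds the example data frame so that it runs on its own; nothing here goes beyond base R.

```r
# Rebuild the example data frame so this block runs on its own
df <- data.frame(values = c(-2, -1, 1, 2), pos_neg = NA)

is.list(df)    # a data frame is a list under the hood
length(df)     # its length is the number of columns, not rows

# $ and [[ both extract a single column as a plain vector
same_column <- identical(df$values, df[["values"]])
is.numeric(df$values)
```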
You can subset any vector with a logical vector, so logical subsetting works just like c('a', 'b')[c(FALSE, TRUE)] does:
df$pos_neg[flag]
#> [1] "negative" "negative"
df$pos_neg[!flag]
#> [1] "positive" "positive"
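The same mechanism can be seen on a plain vector with no data frame involved. In this small sketch the names x and keep are made up for illustration:

```r
# A plain character vector and a logical mask of the same length
x <- c("a", "b", "c", "d")
keep <- c(TRUE, FALSE, FALSE, TRUE)

kept <- x[keep]       # elements where the mask is TRUE
dropped <- x[!keep]   # negating the mask selects the rest

# A comparison such as `values < 0` produces exactly this kind of mask
values <- c(-2, -1, 1, 2)
mask <- values < 0
```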
Using <- to assign to those subsets works here because you are supplying a length-1 vector that is getting recycled to fit the subset.
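To make the recycling visible, here is a sketch that inspects the column after the first assignment only. The ifelse() call at the end is an alternative I'm adding for comparison, not something the question used.

```r
df <- data.frame(values = c(-2, -1, 1, 2), pos_neg = NA)
flag <- df$values < 0

# The single string is recycled across every position where flag is TRUE;
# the untouched positions stay NA (the column is coerced to character)
df$pos_neg[flag] <- "negative"
after_first <- df$pos_neg

df$pos_neg[!flag] <- "positive"

# A one-step alternative that builds the whole column at once
one_step <- ifelse(df$values < 0, "negative", "positive")
```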
Using [ subsetting with two parameters (for rows and columns) on a data frame, e.g. df[2:3, 'values'], is in some regards more complicated, even if more intuitive from the matrix analogue. In particular, the [.data.frame method defaults to drop = TRUE when a single column is selected, which can make it unclear whether it will return another data frame or a vector. Most of the time this doesn't matter, but it can cause bugs in programmatic usages.
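The difference is easiest to see side by side. This sketch uses the finished data frame from the example; passing drop = FALSE explicitly is the usual way to get a data frame back regardless of how many columns are selected.

```r
df <- data.frame(values = c(-2, -1, 1, 2),
                 pos_neg = c("negative", "negative", "positive", "positive"))

# One column selected: the result is dropped to a plain vector
as_vector <- df[2:3, "values"]

# Same selection with drop = FALSE: still a data frame (2 rows, 1 column)
one_col_df <- df[2:3, "values", drop = FALSE]

# Two columns selected: a data frame either way
two_col_df <- df[2:3, c("values", "pos_neg")]
```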
Using [ subsetting with a single parameter on a data frame, e.g. df[1], works like [ does on a list, subsetting columns by name, index, or logical mask and always returning another list of the same class (i.e. another data frame).
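As a rough illustration, the three ways of picking the first column below should all give the same one-column data frame, while [[ gives the bare vector instead:

```r
df <- data.frame(values = c(-2, -1, 1, 2), pos_neg = NA)

by_index <- df[1]                 # column position
by_name  <- df["values"]          # column name
by_mask  <- df[c(TRUE, FALSE)]    # logical mask over the columns

# All three are data frames; [[ returns the column itself as a vector
bare_vector <- df[[1]]
```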