Problem description
I want to iterate a function over different columns of a data.frame (columns whose names share a common pattern). To subset the data.frame I use this code, which works:
df[,grep("abc", colnames(df))]
but I don't know how to apply my function f(x) to all the columns that match this pattern, using either a for loop or lapply.
The function I'm using is:
library(chron)  # dates() and hours() come from the chron package
compress = function(x) {
  aggregate(df[, x, drop = FALSE],
            list(hour = with(df, paste(dates(Time),
                                       sprintf("%d:00:00", hours(Time))))),
            sum, na.rm = TRUE)
}
where df (the data frame) and Time could themselves be made arguments, but for the moment I don't need that.
Thanks,
Giulia
Solution
You've basically got it. Just use apply on your subsetted data to apply function f over its columns (the 2 passed as apply's second argument means columns; 1 would mean rows):
apply( df[,grep("abc", colnames(df))] , 2 , f )
Or, if you don't want your df coerced to a matrix (apply does this, so a single character or factor column among them would turn every value into character), you can use lapply, as you suggested, in much the same way:
lapply( df[,grep("abc", colnames(df))] , f )
lapply returns a list with one element per column. You can turn it back into a data.frame by wrapping the lapply call in data.frame(), e.g. data.frame( lapply(...) ).
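If you want to transform the matching columns in place rather than build a new object, one common idiom is to assign the lapply() result straight back into df. Below is a minimal sketch on hypothetical data (the columns abc1, abc2 and other are made up for illustration). It also shows why df[cols] without a comma is the safer subset: when only one column matches, df[, idx] drops to a plain vector, on which apply() fails and lapply() would iterate over individual values instead of columns.

```r
# This function just multiplies its argument by 2
f <- function(x) x * 2
# Hypothetical data: two columns match "abc", one does not
df <- data.frame(abc1 = 1:3, abc2 = 4:6, other = 7:9)

# Names (rather than positions) of the matching columns
cols <- grep("abc", names(df), value = TRUE)

# List-style indexing df[cols] (no comma) always returns a data.frame,
# so assigning the list from lapply() replaces just those columns
df[cols] <- lapply(df[cols], f)
df

# With a single column, the comma form silently drops to a plain vector
is.data.frame(df[, "abc1"])   # FALSE
is.data.frame(df["abc1"])     # TRUE
```

The "other" column is left untouched, so the result keeps the full shape of the original data.frame.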
Example
# This function just multiplies its argument by 2
f <- function(x) x * 2
df <- data.frame( AB = runif(5) , AC = runif(5) , BB = runif(5) )
# runif() is random, so your numbers will differ from those below
apply( df[,grep("A", colnames(df))] , 2 , f )
#            AB        AC
#[1,] 0.4130628 1.3302304
#[2,] 0.2550633 0.1896813
#[3,] 1.5066157 0.7679393
#[4,] 1.7900907 0.5487673
#[5,] 0.7489256 1.6292801
data.frame( lapply( df[,grep("A", colnames(df))] , f ) )
#         AB        AC
#1 0.4130628 1.3302304
#2 0.2550633 0.1896813
#3 1.5066157 0.7679393
#4 1.7900907 0.5487673
#5 0.7489256 1.6292801
# Note the important difference between the two methods...
class( data.frame( lapply( df[,grep("A", colnames(df))] , f ) ) )
#[1] "data.frame"
class( apply( df[,grep("A", colnames(df))] , 2 , f ) )
#[1] "matrix"
# (R >= 4.0.0 prints "matrix" "array" here)
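Since the question also asked about a for loop, here is a minimal sketch of that variant for the same AB/AC/BB setup. It uses fixed, hypothetical values instead of runif() so the result is predictable. The loop walks over the names of the matching columns and writes each transformed column into a copy of the data.

```r
# This function just multiplies its argument by 2
f <- function(x) x * 2
# Fixed (hypothetical) values instead of runif() so the output is predictable
df <- data.frame(AB = c(1, 2), AC = c(3, 4), BB = c(5, 6))

out <- df  # work on a copy so the original df is left untouched
for (col in grep("A", colnames(df), value = TRUE)) {
  out[[col]] <- f(df[[col]])  # [[ ]] extracts one column as a plain vector
}
out
#   AB AC BB
# 1  2  6  5
# 2  4  8  6
```

Unlike the apply/lapply versions above, which return only the matching columns, the loop keeps BB in the result unchanged.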
Second edit
For the function you actually want to run, it may be easier to rewrite it so that it takes the df as input along with a vector of the column names you want to operate on. In this version the function returns a list, each element of which is an aggregated data.frame:
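compress = function( df , x ) {
  # one aggregated data.frame per column name in x
  # (dates() and hours() are from the chron package)
  lapply( x , function(col) {
    aggregate(df[, col, drop = FALSE],
              list(hour = with(df, paste(dates(Time),
                                         sprintf("%d:00:00", hours(Time))))),
              sum, na.rm = TRUE)
  } )
}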
To run the function, just call it with the data.frame and a vector of column names:
compress( df , names(df)[ grep("abc", names(df) ) ] )
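compress() relies on dates() and hours() from the chron package, so Time has to be a chron object. If Time were instead stored as POSIXct (an assumption, not something the question states), a hypothetical base-R variant could build the hourly label with format() instead. The names compress_base and time_col and the sample data below are illustrative only.

```r
# Hypothetical base-R variant: assumes the time column is POSIXct, not chron
compress_base <- function(df, cols, time_col = "Time") {
  # Hourly bucket label such as "2013-05-01 10:00:00"
  hour <- format(df[[time_col]], "%Y-%m-%d %H:00:00")
  # One aggregated data.frame per requested column
  lapply(cols, function(col) {
    aggregate(df[, col, drop = FALSE], list(hour = hour), sum, na.rm = TRUE)
  })
}

# Made-up sample data
df <- data.frame(
  Time = as.POSIXct(c("2013-05-01 10:05:00", "2013-05-01 10:40:00",
                      "2013-05-01 11:15:00"), tz = "UTC"),
  abc1 = c(1, 2, NA),
  abc2 = c(10, 20, 5),
  other = c(100, 200, 300)
)

res <- compress_base(df, grep("abc", names(df), value = TRUE))
res[[1]]  # abc1 per hour; sum(NA, na.rm = TRUE) is 0 for the 11:00 bucket
res[[2]]  # abc2 per hour
```

grep(..., value = TRUE) returns the matching names directly, so it is equivalent to names(df)[grep(...)].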