Calculating "group characteristics" and adding them back to each row, without ddply and merge

Problem description

I wonder whether there is a more straightforward way to calculate a certain type of variable than the approach I normally take.

The example below probably explains it best. I have a data frame with 2 columns (the fruit, and whether that fruit is rotten). For each row, I would like to add a group-level value such as the proportion of fruit in the same category that is rotten. For example, there are 4 entries for apples and 2 of them are rotten, so each apple row should read 0.5. The target values (for illustration only) are shown in the Desired_Outcome_PercRotten column.
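For the proportion itself, base R's ave() already does "compute per group, then repeat the result on every row of that group" in a single call. This is a minimal sketch using ave() (a swapped-in base R technique, not the answer's plyr method), assuming Rotten is stored as numeric 0/1; the PercRotten column name is illustrative.

```r
# Rebuild the example data; data.frame() keeps Rotten numeric
df <- data.frame(
  Fruit  = c(rep("Apple", 4), rep("Pear", 4), rep("Cherry", 3)),
  Rotten = c(1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0)
)

# ave() splits Rotten by Fruit, applies FUN to each group, and writes the
# group result back to every row of that group, keeping the original row order
df$PercRotten <- ave(df$Rotten, df$Fruit, FUN = mean)
df
```

Because the values are written back in place, no intermediate summary table and no merge are needed.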

I have previously approached this problem by (1) using the "ddply" command on the Fruit variable (with sum/length as the function) to create a new 3x2 data frame, and (2) using the "merge" command to link these values back into the original data frame.
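The two-step pattern described here (summarise per group, then merge back) looks roughly like the sketch below. It uses base R's aggregate() in place of plyr's ddply() so that it runs without extra packages; the names summary_df, merged and PercRotten are illustrative, not taken from the original code.

```r
df <- data.frame(
  Fruit  = c(rep("Apple", 4), rep("Pear", 4), rep("Cherry", 3)),
  Rotten = c(1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0)
)

# Step 1: collapse to one row per fruit (a 3 x 2 summary table)
summary_df <- aggregate(Rotten ~ Fruit, data = df,
                        FUN = function(x) sum(x) / length(x))
names(summary_df)[2] <- "PercRotten"

# Step 2: join the per-group value back onto every original row
merged <- merge(df, summary_df, by = "Fruit")
merged
```

Besides needing two calls, merge() sorts its result by the key column by default (sort = TRUE), so the original row order is not preserved.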

This feels like a roundabout way, and I was wondering whether there is a better/faster way of doing this! Ideally a generic approach that is easily adjusted if, instead of the percentage, one needs to determine e.g. whether all fruits in a group are rotten, whether any are rotten, and so on.
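One way to get that genericity in base R is to keep the ave() call fixed and only swap the function applied to each group. A sketch under the same assumption (Rotten is numeric 0/1); the column names AllRotten, AnyRotten and NRotten are illustrative.

```r
df <- data.frame(
  Fruit  = c(rep("Apple", 4), rep("Pear", 4), rep("Cherry", 3)),
  Rotten = c(1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0)
)

# Same call, different FUN: each returns one value per group,
# which ave() repeats on every row of that group
df$AllRotten <- ave(df$Rotten, df$Fruit, FUN = function(x) all(x == 1))
df$AnyRotten <- ave(df$Rotten, df$Fruit, FUN = function(x) any(x == 1))
df$NRotten   <- ave(df$Rotten, df$Fruit, FUN = sum)

# ave() writes results back into a copy of its (numeric) first argument,
# so TRUE/FALSE come back as 1/0; convert them back to logical
df$AllRotten <- as.logical(df$AllRotten)
df$AnyRotten <- as.logical(df$AnyRotten)
df
```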

Thanks in advance,

W

    Fruit Rotten Desired_Outcome_PercRotten
1   Apple      1                        0.5
2   Apple      1                        0.5
3   Apple      0                        0.5
4   Apple      0                        0.5
5    Pear      1                       0.75
6    Pear      1                       0.75
7    Pear      1                       0.75
8    Pear      0                       0.75
9  Cherry      0                          0
10 Cherry      0                          0
11 Cherry      0                          0

# Create the example data frame; the Desired_Outcome_PercRotten column is included only to illustrate the target values
Fruit <- c(rep("Apple", 4), rep("Pear", 4), rep("Cherry", 3))
Rotten <- c(1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0)
Desired_Outcome_PercRotten <- c(0.5, 0.5, 0.5, 0.5, 0.75, 0.75, 0.75, 0.75, 0, 0, 0)
# data.frame() keeps Rotten numeric; as.data.frame(cbind(...)) would coerce every column to character
df <- data.frame(Fruit, Rotten, Desired_Outcome_PercRotten)
df

Recommended answer

You can do this with just ddply and mutate (both from the plyr package):

library(plyr)
# changed summarise to transform on joran's suggestion
# changed transform to mutate on mnel's suggestion :)
ddply(df, .(Fruit), mutate, Perc = sum(Rotten)/length(Rotten))

#     Fruit Rotten Perc
# 1   Apple      1 0.50
# 2   Apple      1 0.50
# 3   Apple      0 0.50
# 4   Apple      0 0.50
# 5  Cherry      0 0.00
# 6  Cherry      0 0.00
# 7  Cherry      0 0.00
# 8    Pear      1 0.75
# 9    Pear      1 0.75
# 10   Pear      1 0.75
# 11   Pear      0 0.75
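Note that ddply returns the rows grouped and sorted by Fruit (Cherry comes before Pear in the output above). If you want to keep the original row order and avoid the plyr dependency, the merge step can also be replaced by looking each row's group up in a per-group summary. A base R sketch using tapply(); perc_by_fruit is an illustrative name.

```r
df <- data.frame(
  Fruit  = c(rep("Apple", 4), rep("Pear", 4), rep("Cherry", 3)),
  Rotten = c(1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0)
)

# One value per fruit, named by fruit (alphabetical: Apple, Cherry, Pear)
perc_by_fruit <- tapply(df$Rotten, df$Fruit, mean)

# Index the summary by each row's fruit name instead of merging;
# as.character() makes the lookup go by name even if Fruit is a factor
df$Perc <- as.vector(perc_by_fruit[as.character(df$Fruit)])
df
```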

09-24 13:08