This post covers how to summarise character values with ddply and may be a useful reference for anyone working through the same problem.

Problem description

I have the following dataframe:

df <- structure(list(year = c(1986L, 1987L, 1991L, 1991L, 1991L, 1991L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L), knmilocatie = structure(c(4L, 16L, 10L, 12L, 9L, 20L, 12L, 12L, 25L, 9L, 30L, 26L, 22L, 18L, 15L, 24L, 13L, 31L, 27L, 5L, 3L, 19L, 21L, 23L, 20L, 26L, 26L, 31L, 35L, 25L, 11L, 28L, 8L, 29L, 36L, 34L, 7L, 28L, 17L, 14L, 33L, 1L, 11L, 6L, 32L, 27L, 29L, 20L, 20L, 2L), .Label = c("Achterdiep", "Annen", "Appingedam", "Assen", "Bedum", "De Klip", "Delfzijl", "Eenrum", "Eleveld", "Emmen", "Garsthuizen", "Geelbroek", "Haren", "Hellum", "Hoogezand", "Hooghalen", "Kolham", "Langelo", "Loppersum", "Middelstum", "Nijenklooster", "Noordbroek", "Oldenzijl", "Overschild", "Roswinkel", "Slochteren", "Stedum", "Steendam", "t-Zandt", "Ten Boer", "Ten Post", "Uithuizermeeden", "Weiwerd", "Westerbroek", "Winneweer", "Zandeweer"), class = "factor"), baglocatie = structure(c(2L, 12L, 5L, 4L, 2L, 17L, 11L, 2L, 21L, 2L, 16L, 35L, 27L, 14L, 22L, 19L, 33L, 34L, 26L, 17L, 1L, 18L, 1L, 28L, 6L, 25L, 25L, 29L, 9L, 21L, 10L, 19L, 34L, 15L, 36L, 13L, 7L, 19L, 8L, 23L, 7L, 31L, 17L, 1L, 20L, 3L, 10L, 32L, 30L, 24L), .Label = c("Appingedam", "Assen", "Bedum", "Ekehaar", "Emmen", "Eppenhuizen", "Farmsum", "Froombosch", "Garrelsweer", "Garsthuizen", "Geelbroek", "Hooghalen", "Kolham", "Langelo", "Leermens", "Loppersum", "Middelstum", "Oosterwijtwerd", "Overschild", "Roodeschool", "Roswinkel", "Sappemeer", "Schildwolde", "Schipborg", "Slochteren", "Stedum", "Steendam", "t-Zandt", "Ten Post", "Toornwerd", "Tripscompagnie", "Warffum", "Westerbroek", "Wirdum", "Woudbloem", "Zandeweer"), class = "factor"), lllocatie = structure(c(3L, 13L, 5L, 10L, 4L, 32L, 10L, 10L, 22L, 4L, 36L, 37L, 31L, 15L, 23L, 20L, 34L, 8L, 24L, 35L, 19L, 
19L, 2L, 29L, 26L, 25L, 25L, 30L, 8L, 22L, 9L, 20L, 19L, 16L, 38L, 12L, 6L, 27L, 7L, 11L, 17L, 33L, 14L, 2L, 21L, 18L, 9L, 28L, 32L, 1L), .Label = c("Annen", "Appingedam", "Assen", "Eleveld", "Emmen", "Farmsum", "Froombosch", "Garrelsweer", "Garsthuizen", "Geelbroek", "Hellum", "Hoogezand", "Hooghalen", "Huizinge", "Langelo", "Leermens", "Meedhuizen", "Onderdendam", "Oosterwijtwerd", "Overschild", "Roodeschool", "Roswinkel", "Sappemeer", "Sint Annen", "Slochteren", "Startenhuizen", "Steendam", "Stitswerd", "t-Zandt", "Ten Post", "Tjuchem", "Toornwerd", "Tripscompagnie", "Westerbroek", "Westerwijtwerd", "Winneweer", "Woudbloem", "Zandeweer"), class = "factor")), .Names = c("year", "knmilocatie", "baglocatie", "lllocatie"), class = "data.frame", row.names = c(NA, -50L))

I want to summarise it by year. For each year I need the number of instances where baglocatie != knmilocatie, baglocatie != lllocatie and lllocatie != knmilocatie.

I tried:

unequal <- ddply(df, .(year), summarise,
                 bag.knmi = nrow(df[as.character(df$baglocatie) != as.character(df$knmilocatie),]),
                 bag.ll = nrow(df[as.character(df$baglocatie) != as.character(df$lllocatie),]),
                 ll.knmi = nrow(df[as.character(df$lllocatie) != as.character(df$knmilocatie),])
                 )

However, that did not return the desired result. For each year, it gives the totals for the whole dataframe. I also tried length instead of nrow, but that didn't work either. What am I missing?

The desired result should look like:

year  bag.knmi  bag.ll  ll.knmi
1986  0         0       0
1987  0         0       0
1991  2         3       1
1992  4         3       2

Additionally, I would like to know whether this problem can be solved with dplyr as well.

Solution

You're just not using summarise correctly:

unequal <- ddply(df, .(year), summarise,
                 bag.knmi = sum(as.character(baglocatie) != as.character(knmilocatie)),
                 bag.ll = sum(as.character(baglocatie) != as.character(lllocatie)),
                 ll.knmi = sum(as.character(lllocatie) != as.character(knmilocatie))
                 )

Everything after summarise is evaluated within the context of each piece of your data. If you explicitly refer to columns in the original data frame, that's what you'll get: the whole data frame, not the pieces.
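
To make that difference concrete, here is a minimal sketch on a small made-up data frame. The names toy, g, a and b are hypothetical and not part of the question's data; the sketch assumes plyr is installed and that the code runs at the top level of a script or session.

```r
library(plyr)

# Hypothetical toy data: two groups, with mismatches in rows 2, 3 and 4.
toy <- data.frame(
  g = c(1, 1, 2, 2),
  a = c("x", "y", "z", "w"),
  b = c("x", "q", "r", "v"),
  stringsAsFactors = FALSE
)

# toy$a and toy$b always refer to the full columns, so every group
# reports the total for the whole data frame.
whole <- ddply(toy, .(g), summarise, n = sum(toy$a != toy$b))
whole$n  # 3 3

# Bare column names are looked up in the current piece,
# so each group gets its own count.
piece <- ddply(toy, .(g), summarise, n = sum(a != b))
piece$n  # 1 2
```

The first call reproduces the behaviour described in the question (the same total repeated for every group), while the second gives per-group counts.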

And yes, of course this can be done in dplyr as well:

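df %>% 
    group_by(year) %>% 
    summarise(bag.knmi = sum(as.character(baglocatie) != as.character(knmilocatie)),
              bag.ll = sum(as.character(baglocatie) != as.character(lllocatie)),
              ll.knmi = sum(as.character(lllocatie) != as.character(knmilocatie)))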


09-23 08:21