Question
I would like to merge rows that share a duplicated ID and sum their values.
For example, in the data frame below the symbol 'LOC102723897' appears twice (rows 1 and 5). I would like to merge these two rows by summing the values in each column, so that the symbol appears in a single row.
> head(y$genes)
SM01 SM02 SM03 SM04 SM05 SM06 SM07 SM08 SM09 SM10 SM11 SM12 SM13 SM14 SM15 SM16 SM17 SM18 SM19 SM20 SM21 SM22
1 32 29 23 20 27 105 80 64 83 80 94 58 122 76 78 70 34 32 45 42 138 30
2 246 568 437 343 304 291 542 457 608 433 218 329 483 376 410 296 550 533 537 473 296 382
3 30 23 30 13 20 18 23 13 31 11 15 27 36 21 23 25 26 27 37 27 31 16
4 1450 2716 2670 2919 2444 1668 2923 2318 3867 2084 1121 2175 3022 2308 2541 1613 2196 1851 2843 2078 2180 1902
5 288 366 327 334 314 267 550 410 642 475 219 414 679 420 425 308 359 406 550 398 399 268
6 34 59 62 68 42 31 49 45 62 51 40 32 30 39 41 75 54 59 83 99 37 37
SM23 SM24 SM25 SM26 SM27 SM28 SM29 SM30 Symbol
1 41 23 57 160 84 67 87 113 LOC102723897
2 423 535 624 304 568 495 584 603 LINC01128
3 31 21 49 13 33 31 14 31 LINC00115
4 2453 3041 3590 2343 3450 3725 3336 3850 NOC2L
5 403 347 468 478 502 563 611 577 LOC102723897
6 45 51 56 107 79 105 92 131 PLEKHN1
> dim(y)
[1] 12928 30
I tried using plyr to merge rows based on the 'Symbol' column, but it doesn't work:
> ddply(y$genes,"Symbol",numcolwise(sum))
> dim(y)
[1] 12928 30
> length(y$genes$Symbol)
[1] 12928
> length(unique(y$genes$Symbol))
[1] 12896
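The two length() calls imply 12928 - 12896 = 32 surplus rows that a correct merge should remove. One likely reason the ddply() attempt appears to do nothing is that ddply() returns a new data frame instead of modifying y$genes, so its result has to be stored in a variable (for example merged <- ddply(...)); dim(y) will report the original size either way. Before merging, it can help to list which symbols are duplicated. A minimal base-R sketch, assuming a hypothetical data frame genes built from the SM01, SM02 and Symbol columns of the head() output above:

```r
# Hypothetical excerpt: SM01, SM02 and Symbol copied from head(y$genes)
genes <- data.frame(
  SM01 = c(32, 246, 30, 1450, 288, 34),
  SM02 = c(29, 568, 23, 2716, 366, 59),
  Symbol = c("LOC102723897", "LINC01128", "LINC00115",
             "NOC2L", "LOC102723897", "PLEKHN1"),
  stringsAsFactors = FALSE
)

# Each symbol that occurs more than once, listed a single time
dup_symbols <- unique(genes$Symbol[duplicated(genes$Symbol)])
print(dup_symbols)  # "LOC102723897"

# All rows carrying a duplicated symbol (here rows 1 and 5)
print(genes[genes$Symbol %in% dup_symbols, ])

# Number of surplus rows that merging should remove
n_extra <- nrow(genes) - length(unique(genes$Symbol))
print(n_extra)  # 1 for this excerpt
```

Run on the full y$genes, the same n_extra calculation would give the 32 surplus rows implied by the output above.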
Recommended answer
Group by Symbol and sum all the other columns.
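Using dplyr:
library(dplyr)
# df is the data frame of counts plus the Symbol column (e.g. y$genes)
df %>% group_by(Symbol) %>% summarise_all(sum)
# summarise_all() is superseded since dplyr 1.0.0; the across() equivalent is:
# df %>% group_by(Symbol) %>% summarise(across(everything(), sum))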
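For a version without package dependencies, base R's aggregate() with the formula . ~ Symbol performs the same group-by-and-sum, where . stands for every column other than Symbol. A minimal sketch on a hypothetical two-column excerpt of the data (values copied from the head() output); note that the formula interface drops rows containing NA by default:

```r
# Hypothetical excerpt: SM01, SM02 and Symbol copied from head(y$genes)
genes <- data.frame(
  SM01 = c(32, 246, 30, 1450, 288, 34),
  SM02 = c(29, 568, 23, 2716, 366, 59),
  Symbol = c("LOC102723897", "LINC01128", "LINC00115",
             "NOC2L", "LOC102723897", "PLEKHN1"),
  stringsAsFactors = FALSE
)

# Sum every remaining column within each Symbol group
merged <- aggregate(. ~ Symbol, data = genes, FUN = sum)
print(merged)
# 5 rows, sorted by Symbol; LOC102723897 now has SM01 = 320, SM02 = 395
```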
Using data.table:
library(data.table)
# setDT() converts df to a data.table by reference; .SD holds every column except Symbol
setDT(df)[ , lapply(.SD, sum), by = "Symbol"]
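Another base-R option, not used in the answer above, is rowsum(), which adds up the rows of a numeric matrix that share a group label. It is convenient when the counts are already a matrix. A minimal sketch on the same hypothetical excerpt (the names genes, count_cols and merged_df are illustrative only):

```r
# Hypothetical excerpt: SM01, SM02 and Symbol copied from head(y$genes)
genes <- data.frame(
  SM01 = c(32, 246, 30, 1450, 288, 34),
  SM02 = c(29, 568, 23, 2716, 366, 59),
  Symbol = c("LOC102723897", "LINC01128", "LINC00115",
             "NOC2L", "LOC102723897", "PLEKHN1"),
  stringsAsFactors = FALSE
)

count_cols <- setdiff(names(genes), "Symbol")

# Sum rows sharing a Symbol; symbols become (sorted) row names of the result
merged <- rowsum(as.matrix(genes[count_cols]), group = genes$Symbol)

# Sanity check: exactly one row per unique symbol
stopifnot(nrow(merged) == length(unique(genes$Symbol)))
print(merged["LOC102723897", ])  # SM01 = 320, SM02 = 395

# Convert back to a data frame with an explicit Symbol column
merged_df <- data.frame(Symbol = rownames(merged), merged, row.names = NULL)
print(merged_df)
```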