Problem description

I have an ffdf object (23M x 4) and a character vector whose values are "TUMOR" or "NORMAL". Each value is named with a unique icgc_specimen_id, which is how I record whether a given specimen is a normal cell or a tumor cell.

> head(expresion,4)
ffdf (all open) dim=c(23939146,4), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
                               PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix PhysicalIsMatrix PhysicalElementNo
icgc_donor_id                 icgc_donor_id      integer       integer FALSE           FALSE            FALSE                 1
icgc_specimen_id           icgc_specimen_id      integer       integer FALSE           FALSE            FALSE                 2
gene_id                             gene_id      integer       integer FALSE           FALSE            FALSE                 3
normalized_read_count normalized_read_count       double        double FALSE           FALSE            FALSE                 4
                      PhysicalFirstCol PhysicalLastCol PhysicalIsOpen
icgc_donor_id                        1               1           TRUE
icgc_specimen_id                     1               1           TRUE
gene_id                              1               1           TRUE
normalized_read_count                1               1           TRUE
ffdf data
         icgc_donor_id icgc_specimen_id      gene_id normalized_read_count
1         DO3868           SP8217       SERINC1               9.276133e-05
2         DO3868           SP8217       SERINC2               1.925742e-04
3         DO3868           SP8217       SERINC3               2.531452e-05
4         DO3868           SP8217       SERINC4               4.811070e-07
5         DO3868           SP8217       SERINC5               4.402422e-07
6         DO3868           SP8217       SERP1                 7.620133e-05
7         DO3868           SP8217       SNX13                 1.088022e-05
8         DO3868           SP8217       SNX10                 5.652351e-06
:                    :                :            :                     :
23939139  DO2341           SP5052       FCRLB                 8.290500e-07
23939140  DO2341           SP5052       FDFT1                 7.108729e-05
23939141  DO2341           SP5052       FDPSL2A               7.999602e-08
23939142  DO2341           SP5052       GRIPAP1               6.532955e-05
23939143  DO2341           SP5052       GRINL1A               1.156511e-05
23939144  DO2341           SP5052       GRIP1                 2.465546e-06
23939145  DO2341           SP5052       GRIP2                 1.486814e-06
23939146  DO2341           SP5052       GRK1                  1.678295e-08
> head(specimen_type)
SP3358  SP6685 SP12716  SP8109 SP12780  SP8097 
"TUMOR" "TUMOR" "TUMOR" "TUMOR" "TUMOR" "TUMOR" 

I want to add a column called sp_type to the ffdf so that each row tells me whether I'm working with a tumor or a normal cell.

With an ordinary data frame I would do:

expresion$sp_type <- specimen_type[as.character(expresion$icgc_specimen_id)]  # as.character() so a factor id is matched by its label
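
One detail matters in both settings. The vmode table above lists icgc_specimen_id as integer although it prints as SP8217, which suggests it is stored as a factor. Indexing a named vector with a factor uses the factor's integer codes, not its labels, so the lookup can silently return the wrong type. The small base-R sketch below uses made-up data (specimen_type and df here are hypothetical stand-ins) to show the difference.

```r
# Hypothetical toy data; the names are deliberately not in alphabetical order
specimen_type <- c(SP2 = "NORMAL", SP3 = "TUMOR", SP1 = "TUMOR")
df <- data.frame(
  icgc_specimen_id = factor(c("SP3", "SP3", "SP1", "SP2")),
  gene_id = c("SERINC1", "SERINC2", "SERINC1", "SNX13")
)

# Pitfall: a factor index is used via its integer codes (levels SP1, SP2, SP3),
# so this picks elements of specimen_type by position instead of by name
wrong <- unname(specimen_type[df$icgc_specimen_id])

# Converting to character first makes R match against names(specimen_type)
df$sp_type <- unname(specimen_type[as.character(df$icgc_specimen_id)])

print(wrong)       # values: "TUMOR" "TUMOR" "NORMAL" "TUMOR"
print(df$sp_type)  # values: "TUMOR" "TUMOR" "TUMOR" "NORMAL"
```

The last two rows differ: SP1 is TUMOR and SP2 is NORMAL, which only the character lookup gets right.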

I can't find a way to do the same in an ffdf object.

Recommended answer

I would write it like this:

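library(ETLUtils)  # provides recoder()
library(ffbase)    # provides with() for ffdf objects, evaluated chunk by chunk
# Map each specimen id to "TUMOR"/"NORMAL"; as.character() turns the factor ids
# into labels so they can be matched against names(specimen_type)
expresion$sp_type <- with(expresion[c("icgc_specimen_id")],
  recoder(as.character(icgc_specimen_id), from = names(specimen_type), to = specimen_type))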
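
ffbase's with() evaluates the expression over the ffdf chunk by chunk, so the 23M rows are never pulled into RAM at once. To see that chunk-wise mapping without installing ff, here is a base-R sketch that imitates it on an in-memory factor. It is an illustration under stated assumptions, not the answer's code: the helper recode_by_chunk() and its chunk_size argument are hypothetical, plain index ranges stand in for ff's chunks, and match() does the id-to-type lookup instead of recoder() (in this helper, ids missing from specimen_type become NA).

```r
# Hypothetical helper: look up a sample type for every id, one chunk at a time,
# imitating (in memory) how ffbase's with() walks through an ffdf.
recode_by_chunk <- function(ids, lookup, chunk_size = 2L) {
  n <- length(ids)
  out <- character(n)
  if (n == 0L) return(out)
  for (start in seq(1L, n, by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1L, n)
    # as.character() so factor ids are matched by label; unknown ids give NA
    out[idx] <- unname(lookup[match(as.character(ids[idx]), names(lookup))])
  }
  out
}

specimen_type <- c(SP2 = "NORMAL", SP3 = "TUMOR", SP1 = "TUMOR")
ids <- factor(c("SP3", "SP3", "SP1", "SP2", "SP9"))  # "SP9" has no type
sp_type <- recode_by_chunk(ids, specimen_type)
print(sp_type)
```

With a real ffdf, each chunk would be read from and written back to disk-backed ff vectors; only the per-chunk lookup is the same idea.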
