Problem description
I have data files structured like this:
OTU1 PIA0 1120
OTU2 PIA1 2
OTU2 PIA3 6
OTU2 PIA4 10
OTU2 PIA5 1078
OTU2 PIN1 24
OTU2 PIN2 45
OTU2 PIN3 261
OTU2 PIN4 102
OTU3 PIA0 16
OTU3 PIA1 59
OTU3 PIA2 27
OTU3 PIA3 180
OTU3 PIA4 200
OTU3 PIA5 251
OTU3 PIN0 36
OTU3 PIN1 61
OTU3 PIN2 156
OTU3 PIN3 590
OTU3 PIN4 277
OTU4 PIA0 401
OTU4 PIN0 2
I want to create a matrix of co-occurrence counts for the values in the second column, using the first column as the grouping variable: each cell should show in how many OTUs (OTU1, OTU2, OTU3, OTU4) the two values from the second column appear together. It needs to look like this:
PIA0 PIA1 PIA2 PIA3 PIA4 PIA5 PIN0 PIN1 PIN2 PIN3 PIN4
PIA0 1 1 1 1 1 1 2 1 1 1 1
PIA1 1 0 1 2 2 2 1 2 2 2 2
PIA2 1 1 0 1 1 1 1 1 1 1 1
PIA3 1 2 1 0 2 2 1 2 2 2 2
PIA4 1 2 1 2 0 2 1 2 2 2 2
PIA5 1 2 1 2 2 0 1 2 2 2 2
PIN0 2 1 1 1 1 1 0 1 1 1 1
PIN1 1 2 1 2 2 2 1 0 2 2 2
PIN2 1 2 1 2 2 2 1 2 0 2 2
PIN3 1 2 1 2 2 2 1 2 2 0 2
PIN4 1 2 1 2 2 2 1 2 2 2 0
The diagonal (a row and a column with the same name) holds the number of OTUs in which that value appears alone.
Any ideas?
I have read about the R package 'reshape2' and its 'acast' function, but with that I can only reshape a matrix that already holds all the data, not compute the combination counts I need. I have also considered writing a Biopython script, but I think it would be too long and difficult to write with my limited programming knowledge.
The goal is to build a matrix like the one in the example so that I can run the online CIRCOS program with these data.
Recommended answer
You can use dcast to create a binary matrix indicating the presence of each PI inside each OTU, and then multiply its transpose by the matrix itself to get the counts.
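To see why this product yields the counts, write the binary matrix as $A$, with one row per OTU $k$ and one column per PI. Each entry of the product $A^{\top}A$ (`t(A) %*% A` in R) is then:

```latex
(A^{\top} A)_{ij} = \sum_{k} A_{ki} \, A_{kj}
```

The term $A_{ki} A_{kj}$ equals 1 only when OTU $k$ contains both PI $i$ and PI $j$, so the sum counts the OTUs in which the two occur together. For $i = j$ it counts every OTU that contains PI $i$.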
# Read the three whitespace-separated columns (OTU, PI, count) as V1, V2, V3
d <- read.table( text="
OTU1 PIA0 1120
OTU2 PIA1 2
OTU2 PIA3 6
OTU2 PIA4 10
OTU2 PIA5 1078
OTU2 PIN1 24
OTU2 PIN2 45
OTU2 PIN3 261
OTU2 PIN4 102
OTU3 PIA0 16
OTU3 PIA1 59
OTU3 PIA2 27
OTU3 PIA3 180
OTU3 PIA4 200
OTU3 PIA5 251
OTU3 PIN0 36
OTU3 PIN1 61
OTU3 PIN2 156
OTU3 PIN3 590
OTU3 PIN4 277
OTU4 PIA0 401
OTU4 PIN0 2", header=FALSE )
library(reshape2)
# One row per OTU, one column per PI; TRUE where the PI occurs in that OTU
A <- as.matrix( dcast( d, V1 ~ V2, fun.aggregate=length, value.var="V3" )[,-1]>0 )
# PIA0 PIA1 PIA2 PIA3 PIA4 PIA5 PIN0 PIN1 PIN2 PIN3 PIN4
# [1,] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [2,] FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
# [3,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [4,] TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
t(A) %*% A
# PIA0 PIA1 PIA2 PIA3 PIA4 PIA5 PIN0 PIN1 PIN2 PIN3 PIN4
# PIA0 3 1 1 1 1 1 2 1 1 1 1
# PIA1 1 2 1 2 2 2 1 2 2 2 2
# PIA2 1 1 1 1 1 1 1 1 1 1 1
# PIA3 1 2 1 2 2 2 1 2 2 2 2
# PIA4 1 2 1 2 2 2 1 2 2 2 2
# PIA5 1 2 1 2 2 2 1 2 2 2 2
# PIN0 2 1 1 1 1 1 2 1 1 1 1
# PIN1 1 2 1 2 2 2 1 2 2 2 2
# PIN2 1 2 1 2 2 2 1 2 2 2 2
# PIN3 1 2 1 2 2 2 1 2 2 2 2
# PIN4 1 2 1 2 2 2 1 2 2 2 2
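Note that the diagonal above counts every OTU that contains a given PI, whereas the question asks for the number of OTUs in which that PI appears alone. Here is a minimal sketch of one way to get that diagonal, assuming base R only (`table()` and `crossprod()` stand in for `dcast`). The column names `otu`, `pi_id`, `count` and the variables `M` and `alone` are illustrative choices, not part of the original answer.

```r
# Same data as above, read as whitespace-separated columns
d <- read.table(text = "
OTU1 PIA0 1120
OTU2 PIA1 2
OTU2 PIA3 6
OTU2 PIA4 10
OTU2 PIA5 1078
OTU2 PIN1 24
OTU2 PIN2 45
OTU2 PIN3 261
OTU2 PIN4 102
OTU3 PIA0 16
OTU3 PIA1 59
OTU3 PIA2 27
OTU3 PIA3 180
OTU3 PIA4 200
OTU3 PIA5 251
OTU3 PIN0 36
OTU3 PIN1 61
OTU3 PIN2 156
OTU3 PIN3 590
OTU3 PIN4 277
OTU4 PIA0 401
OTU4 PIN0 2", header = FALSE, col.names = c("otu", "pi_id", "count"))

# 0/1 presence matrix: one row per OTU, one column per PI
A <- (unclass(table(d$otu, d$pi_id)) > 0) * 1

# Co-occurrence counts: number of OTUs shared by each pair of PIs
M <- crossprod(A)

# OTUs that contain exactly one PI
alone <- rowSums(A) == 1

# Diagonal as defined in the question: OTUs where the PI appears alone
diag(M) <- colSums(A[alone, , drop = FALSE])

M
```

With the sample data only OTU1 holds a single PI (PIA0), so the diagonal is 1 for PIA0 and 0 elsewhere. To hand the result to another program, something like `write.table(M, "matrix.txt", sep = "\t", quote = FALSE, col.names = NA)` writes a tab-delimited file (the file name is hypothetical); check the CIRCOS documentation for the exact layout it expects.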