我想比较两组患者(抗性vs.敏感)的蛋白质表达值(n = 465个蛋白质)。
我有11例耐药患者和8例敏感患者。我想比较(ttest)耐药组(A res
至K res
)的蛋白1和敏感组(L sens
至S sens
),蛋白2(耐药)与蛋白2的表达值(ttest) (敏感),依此类推。作为输出,我只需要p值
我尝试这样做(请参阅下文),但是出了点问题,我无法弄清楚是什么。
X Protein.1 Protein.2 Protein.3 Protein.4 Protein.5 Protein.6
1 A res 4127 16886 1785 1636 407 135
2 B res 10039 32414 3144 1543 601 154
3 C res 527 1059 1637 317 229 107
4 D res 553 3848 7357 1168 1549 441
5 E res 2351 2272 5868 2606 517 159
6 F res 822 1767 2110 818 293 75
7 G res 673 1887 511 471 214 NA
8 H res 5769 2206 2041 517 355 298
9 I res 1660 4221 1921 629 383 104
10 J res 3281 1804 2400 225 268 52
11 K res 3383 1882 1935 185 NA NA
12 L sens 10810 20136 2350 1143 527 160
13 M sens 5941 14873 3550 943 308 NA
14 N sens 1100 2325 1359 561 542 284
15 O sens 85 587 619 364 85 52
16 P sens 2321 6335 6494 994 NA NA
17 Q sens 103810 7102 7986 1464 439 187
18 R sens 1174 2076 1423 340 186 70
19 S sens 1829 973 1343 380 453 221
data <- read.csv("ProteinDataResSens.csv", sep=";", na.strings="weak", header=TRUE)
res <- data.frame(data[1:11, ], row.names=NULL)
colnames(res) <- paste("res", 1:length(res), sep="_")
sens <- data.frame(data[12:19, ], row.names=NULL)
colnames(sens) <- paste("sens", 1:length(sens), sep="_")
com <- combn(c(colnames(res), colnames(sens)), 2)
p <- apply(com, 2, function(x) t.test(data[, x[1]], data[, x[2]])$p.val)
data.frame(comparison=paste(com[1, ], com[2, ],sep=" vs."), p.value=p)
非常感谢您的帮助!
最佳答案
如果要针对每个res
列将sens
与Protein
进行比较
grp <- sub(".* ", "", df$X)
Pvals <- mapply(function(x,y) t.test(x[grp=='res'],
x[grp=='sens'])$p.value, df[,-1], list(grp))
Pvals[Pvals < 0.05]
或使用
data.table
library(data.table)
setDT(df)[, grp:= sub('.* ', "", X)][, lapply(.SD,
function(x) t.test(x[grp=='res'], x[grp=='sens'])$p.value),
.SDcols=2:(ncol(df)-1)]
数据
df <- structure(list(X = c("A res", "B res", "C res", "D res", "E res",
"F res", "G res", "H res", "I res", "J res", "K res", "L sens",
"M sens", "N sens", "O sens", "P sens", "Q sens", "R sens", "S sens"
), Protein.1 = c(4127L, 10039L, 527L, 553L, 2351L, 822L, 673L,
5769L, 1660L, 3281L, 3383L, 10810L, 5941L, 1100L, 85L, 2321L,
103810L, 1174L, 1829L), Protein.2 = c(16886L, 32414L, 1059L,
3848L, 2272L, 1767L, 1887L, 2206L, 4221L, 1804L, 1882L, 20136L,
14873L, 2325L, 587L, 6335L, 7102L, 2076L, 973L), Protein.3 = c(1785L,
3144L, 1637L, 7357L, 5868L, 2110L, 511L, 2041L, 1921L, 2400L,
1935L, 2350L, 3550L, 1359L, 619L, 6494L, 7986L, 1423L, 1343L),
Protein.4 = c(1636L, 1543L, 317L, 1168L, 2606L, 818L, 471L,
517L, 629L, 225L, 185L, 1143L, 943L, 561L, 364L, 994L, 1464L,
340L, 380L), Protein.5 = c(407L, 601L, 229L, 1549L, 517L,
293L, 214L, 355L, 383L, 268L, NA, 527L, 308L, 542L, 85L,
NA, 439L, 186L, 453L), Protein.6 = c(135L, 154L, 107L, 441L,
159L, 75L, NA, 298L, 104L, 52L, NA, 160L, NA, 284L, 52L,
NA, 187L, 70L, 221L)), .Names = c("X", "Protein.1", "Protein.2",
"Protein.3", "Protein.4", "Protein.5", "Protein.6"), class =
"data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19"))