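I have a data.table with a column containing text like "Exp 928.6.3 (DMSO)". I would like to parse that into columns like "Exp 928" and "6.3". It seems like the mighty data.table should be able to do this quickly, but I can't figure out how to bend it to my will. Any ideas?
Thanks,
James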
> dput(head(dat))
structure(list(Experiment = c("Exp 927.1.1 (DMSO)", "Exp 927.1.2 (DMSO)",
"Exp 927.1.3 (DMSO)", "Exp 927.1.4 (DMSO)", "Exp 927.1.5 (DMSO)",
"Exp 927.1.6 (DMSO)"), Conc.1..LP9. = c("Failed", "Failed", "Failed",
"Failed", "Failed", "0.97"), Conc.2..LP11. = c("Failed", "Failed",
"Failed", "Failed", "Failed", "0.87"), Conc.3..LP13. = c("Failed",
"Failed", "Failed", "Failed", "Failed", "0.81"), Conc.4..LP15. = c("Failed",
"Failed", "Failed", "Failed", "Failed", "0.76"), Conc.5..LP17. = c("Failed",
"Failed", "Failed", "Failed", "Failed", "0.58"), Conc.1.uM..µM. = c("Failed",
"Failed", "Failed", "Failed", "Failed", "0.001"), Conc.2.uM..µM. = c("Failed",
"Failed", "Failed", "Failed", "Failed", "0.01"), Conc.3.uM..µM. = c("Failed",
"Failed", "Failed", "Failed", "Failed", "0.1"), Conc.4.uM..µM. = c("Failed",
"Failed", "Failed", "Failed", "Failed", "1"), Conc.5.uM..µM. = c("Failed",
"Failed", "Failed", "Failed", "Failed", "10"), exptNo = list(
"927", "1", "1", "927", "1", "1"), sample = c("927", "1",
"2", "927", "1", "2"), replicate = c("927", "1", "3", "927",
"1", "3")), .Names = c("Experiment", "Conc.1..LP9.", "Conc.2..LP11.",
"Conc.3..LP13.", "Conc.4..LP15.", "Conc.5..LP17.", "Conc.1.uM..µM.",
"Conc.2.uM..µM.", "Conc.3.uM..µM.", "Conc.4.uM..µM.", "Conc.5.uM..µM.",
"exptNo", "sample", "replicate"), sorted = "Experiment", class = c("data.table",
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x0000000000130788>)
Best answer
I think there is a simpler solution:
dat[,do.call(rbind,
strsplit(gsub( "(.*?)[.](.*) .*","\\1|\\2",Experiment),'[|]'))]
     [,1]      [,2]
[1,] "Exp 927" "1.1"
[2,] "Exp 927" "1.2"
[3,] "Exp 927" "1.3"
[4,] "Exp 927" "1.4"
[5,] "Exp 927" "1.5"
[6,] "Exp 927" "1.6"
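The call above returns a character matrix rather than adding columns to the table. If the goal is to keep the parsed pieces in dat, one possible sketch is to pair the same regex with data.table's tstrsplit() and := so the columns are added by reference. The toy data and the column names exptName, sampleRep, sample and replicate are assumptions made for illustration, not part of the original answer.

```r
library(data.table)

# Toy data mirroring the Experiment column from the question
dat <- data.table(Experiment = c("Exp 927.1.1 (DMSO)",
                                 "Exp 927.1.2 (DMSO)",
                                 "Exp 927.1.6 (DMSO)"))

# Same regex as the answer: rewrite each value as "Exp 927|1.1", split on
# the literal "|", and assign both pieces as new columns by reference.
# exptName and sampleRep are hypothetical column names.
dat[, c("exptName", "sampleRep") := tstrsplit(
  gsub("(.*?)[.](.*) .*", "\\1|\\2", Experiment), "|", fixed = TRUE)]

# Optional second step (an assumption about what "1.1" means): split it on
# the literal "." into integer sample and replicate columns.
dat[, c("sample", "replicate") := tstrsplit(
  sampleRep, ".", fixed = TRUE, type.convert = TRUE)]

print(dat)
```

Using fixed = TRUE treats "|" and "." as literal separators, so no regex escaping is needed in the split step.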
A similar question about parsing text with R's data.table can be found on Stack Overflow: https://stackoverflow.com/questions/21465752/