I want to split character strings. My real data frame is large, but the small example below shows what needs to be done.

  mydf <- data.frame (name = c("L1", "L2", "L3"),
    M1 = c("AC", "AT", NA), M2 = c("CC", "--", "TC"), M3 = c("AT", "TT", "AG"))

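I want to split the characters of variables M1 to M3 into separate columns (the real dataset has more than 6000 variables), so that the result looks like this: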
  name  M1a M1b   M2a M2b  M3a  M3b
   L1   A    C    C    C    A     T
   L2   A    T    -    -    T     T
   L3   NA   NA   T     C    A     G

I tried the following code:
func<- function(x) {sapply( strsplit(x, ""),
                     match, table= c("A","C","T","G", "--", NA))}

odataframe <- data.frame(apply(mydf, 1, func) )
colnames(odataframe) <-  paste(rep(names(mydf), each = 2), c("a", "b"), sep = "")
odataframe

Best answer

Here you go:

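# Split a column of two-character strings into a two-column character matrix.
# Assumes every non-NA value has exactly two characters.
splitCol <- function(x){
  x <- as.character(x)
  x[is.na(x)] <- "$$"    # placeholder so that NA also yields two pieces
  z <- matrix(unlist(strsplit(x, split="")), ncol=2, byrow=TRUE)
  z[z=="$"] <- NA        # turn the placeholder back into NA
  z
}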


newdf <- as.data.frame(do.call(cbind, lapply(mydf[, -1], splitCol)))
names(newdf) <- paste(rep(names(mydf[, -1]), each=2), c("a", "b"), sep="")
newdf <- data.frame(mydf[, 1, drop=FALSE], newdf)

newdf
  name  M1a  M1b M2a M2b M3a M3b
1   L1    A    C   C   C   A   T
2   L2    A    T   -   -   T   T
3   L3 <NA> <NA>   T   C   A   G
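
As a possible variation on the answer above, the placeholder step can be avoided by using `substr()`, which is vectorized and returns `NA` for `NA` input. This is only a minimal sketch: the helper name `splitColSubstr` and the result name `newdf2` are made up for illustration, and it assumes, like the original, that every non-missing value has exactly two characters.

```r
# Same example data as in the question.
mydf <- data.frame(name = c("L1", "L2", "L3"),
                   M1 = c("AC", "AT", NA),
                   M2 = c("CC", "--", "TC"),
                   M3 = c("AT", "TT", "AG"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: split one two-character column into a two-column matrix.
# substr() propagates NA, so no "$$" placeholder is needed.
splitColSubstr <- function(x) {
  x <- as.character(x)
  cbind(substr(x, 1, 1), substr(x, 2, 2))
}

markers <- names(mydf)[-1]                      # every column except "name"
parts <- do.call(cbind, lapply(mydf[markers], splitColSubstr))
colnames(parts) <- paste0(rep(markers, each = 2), c("a", "b"))

# Put the name column back in front of the split columns.
newdf2 <- data.frame(mydf["name"], parts, stringsAsFactors = FALSE)
newdf2
```

Because `substr()` works on the whole column at once, this version also avoids building the intermediate list that `strsplit()` returns, which may matter with several thousand columns.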

Regarding "r - split characters in columns and names", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7972771/
