Problem description
I have a data frame in which each 'Sample' is associated with a long character string:
Sample Data
1 000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N
2 000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N
I would like a simple way to break each string into 5 pieces in the following format:
Sample X
CCT6 - Characters 1-33
GAT1 - Characters 34-68
IMD3 - Characters 69-99
PDR3 - Characters 100-130
RIM15 - Characters 131-168
giving output that looks like this for each sample:
Sample 1
CCT6 - 000000000000000000000000000N01000
GAT1 - 000000000N0N000000000N00N0000NN00N0
IMD3 - N000000100000N00N0N0000000NNNN0
PDR3 - 1111111111111111111111111111111
RIM15 - 0000000000000000000N000000N0000000000N
I've been able to use the substr function to break the long string into individual pieces, but I'd like to automate it so that I get all 5 pieces in one output. Ideally this output would also be a data frame.
Recommended answer
This is what ?read.fwf is for.
First, some data that looks like the data in your question:
x <- data.frame(Sample = c(1, 2), Data = c("000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N",
"000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N"),
stringsAsFactors = FALSE)
Now use read.fwf, specifying the widths of the fields and their names, and that all columns should be of mode character. We wrap the text column of the example data in textConnection so that it can be treated as a connection, which the read.* functions (among others) understand.
(strs <- read.fwf(textConnection(x$Data), widths = c(33, 35, 31, 31, 38), colClasses = "character", col.names = c("CCT6", "GAT1", "IMD3", "PDR3", "RIM15")))
CCT6 GAT1 IMD3 PDR3 RIM15
1 000000000000000000000000000N01000 000000000N0N000000000N00N0000NN00N0 N000000100000N00N0N0000000NNNN0 1111111111111111111111111111111 0000000000000000000N000000N0000000000N
2 000000000000000000000000000N01000 000000000N0N000000000N00N0000NN00N0 N000000100000N00N0N0000000NNNN0 1111111111111111111111111111111 0000000000000000000N000000N0000000000N
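Since the question asks for a data frame, it may help to put the sample identifiers back next to the pieces. The following is a minimal sketch on toy data: the short strings, the widths c(3, 2, 4), and the field names f1, f2, f3 are hypothetical stand-ins for the real 168-character strings and the five field names.

```r
# Toy data standing in for the real strings (hypothetical values)
x <- data.frame(Sample = c(1, 2),
                Data = c("AAABBCCCC", "DDDEEFFFF"),
                stringsAsFactors = FALSE)

# Split each string into fixed-width fields, as in the answer
strs <- read.fwf(textConnection(x$Data), widths = c(3, 2, 4),
                 colClasses = "character",
                 col.names = c("f1", "f2", "f3"))

# Put the sample identifiers back in front of the pieces
res <- cbind(Sample = x$Sample, strs)
res
```

With the real data, the same cbind call on the strs object above should give one row per sample, with Sample followed by the five named columns.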
Now loop over the rows and print each one as in your example:
for (i in seq_len(nrow(strs))) {
  # header line for this sample
  writeLines(paste("Sample", i))
  # one "NAME - value" line per field
  writeLines(paste(names(strs), unlist(strs[i, ]), sep = " - "))
}
giving, for example:
Sample 2
CCT6 - 000000000000000000000000000N01000
GAT1 - 000000000N0N000000000N00N0000NN00N0
IMD3 - N000000100000N00N0N0000000NNNN0
PDR3 - 1111111111111111111111111111111
RIM15 - 0000000000000000000N000000N0000000000N
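As an alternative to read.fwf, and closer to the substr approach mentioned in the question, the start and end positions can be derived from the widths and passed to substring in one go. This is only a sketch on toy data; the strings, widths, and field names below are hypothetical. For the real data, widths of c(33, 35, 31, 31, 38) give the start positions 1, 34, 69, 100, 131 listed in the question.

```r
# Toy data standing in for the real strings (hypothetical values)
x <- data.frame(Sample = c(1, 2),
                Data = c("AAABBCCCC", "DDDEEFFFF"),
                stringsAsFactors = FALSE)

widths <- c(3, 2, 4)            # hypothetical field widths
fields <- c("f1", "f2", "f3")   # hypothetical field names

# Derive the end and start position of every field from the widths
ends   <- cumsum(widths)
starts <- ends - widths + 1

# substring() is vectorised over first/last, so each string yields all pieces;
# sapply() returns one column per string, hence the transpose
pieces <- t(sapply(x$Data, substring, first = starts, last = ends,
                   USE.NAMES = FALSE))
colnames(pieces) <- fields

res <- data.frame(Sample = x$Sample, pieces, stringsAsFactors = FALSE)
res
```

This avoids the text connection, at the cost of computing the positions by hand.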