This post covers how to split one column into multiple columns in R.

Problem description

I have a table whose first column looks like this:

chr10:100002872-100002872
chr10:100003981-100003981
chr10:100004774-100004774
chr10:100005285-100005285
chr10:100007123-100007123

I want to convert it to 3 separate columns, but I couldn't work out how to specify both ":" and "-" as delimiters for the strsplit command. What should I do?

Recommended answer

Here is one way:

library(data.table)
DF[, paste0("V1.",1:3) ] <- tstrsplit(DF$V1, ":|-")

#                          V1  V1.1      V1.2      V1.3
# 1 chr10:100002872-100002872 chr10 100002872 100002872
# 2 chr10:100003981-100003981 chr10 100003981 100003981
# 3 chr10:100004774-100004774 chr10 100004774 100004774
# 4 chr10:100005285-100005285 chr10 100005285 100005285
# 5 chr10:100007123-100007123 chr10 100007123 100007123

strsplit accepts regular expressions involving the "or" operator, |, as @AnandaMahto said. tstrsplit is just a convenience function added by the data.table package.
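To see what the ":|-" pattern does before it is wrapped in tstrsplit, it can help to call strsplit on its own. The following is a minimal base R sketch; the object names `x` and `parts` are only for illustration and do not appear in the original answer.

```r
# Two sample values from the question; `x` is an illustrative name.
x <- c("chr10:100002872-100002872",
       "chr10:100003981-100003981")

# ":|-" is a regular expression meaning "a colon OR a hyphen".
parts <- strsplit(x, ":|-")

# strsplit returns a list with one character vector per input string.
str(parts)

# A character class is an equivalent way to write the same pattern.
identical(parts, strsplit(x, "[-:]"))
```

Because the result is a list of per-row vectors, it still has to be transposed into columns, which is the step tstrsplit takes care of.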

If you convert your data.frame to a data.table (which has many advantages and no disadvantages except a slight learning curve), you would do:

setDT(DF)[, paste0("V1.",1:3) := tstrsplit(V1, ":|-")]

#                           V1  V1.1      V1.2      V1.3
# 1: chr10:100002872-100002872 chr10 100002872 100002872
# 2: chr10:100003981-100003981 chr10 100003981 100003981
# 3: chr10:100004774-100004774 chr10 100004774 100004774
# 4: chr10:100005285-100005285 chr10 100005285 100005285
# 5: chr10:100007123-100007123 chr10 100007123 100007123
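The columns created above are all character. If the positions are needed as numbers, one possible variation is to pass type.convert = TRUE to tstrsplit. This is a sketch that assumes a reasonably recent data.table; the column names chr, start and end are illustrative and not part of the original answer.

```r
library(data.table)

# Rebuild a small version of the sample table; the column name V1 follows the question.
DF <- data.frame(V1 = c("chr10:100002872-100002872",
                        "chr10:100003981-100003981"),
                 stringsAsFactors = FALSE)

# chr/start/end are illustrative names chosen for this sketch.
# type.convert = TRUE asks tstrsplit to convert each piece to a suitable type,
# so the two position columns come back as integers instead of strings.
setDT(DF)[, c("chr", "start", "end") := tstrsplit(V1, ":|-", type.convert = TRUE)]

str(DF)
```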


Alternatives. There are (cumbersome) ways to get the same thing in base R, like

DF[, paste0("V1.",1:3) ] <- do.call(rbind, strsplit(DF$V1, ":|-"))
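The rbind approach produces character columns. As a different base R technique (not the one used in the answer), utils::strcapture, available since R 3.4.0, can split on capture groups and set column types in one step. A minimal sketch, with the names pattern, proto, pieces, chr, start and end chosen here purely for illustration:

```r
# Rebuild a small version of the sample table; the column name V1 follows the question.
DF <- data.frame(V1 = c("chr10:100002872-100002872",
                        "chr10:100003981-100003981"),
                 stringsAsFactors = FALSE)

# Three capture groups: text before ":", digits before "-", digits up to the end.
pattern <- "^([^:]+):([0-9]+)-([0-9]+)$"

# proto fixes the names and types of the result columns (names are illustrative).
proto <- data.frame(chr = character(), start = integer(), end = integer(),
                    stringsAsFactors = FALSE)

# One row per input string, one column per capture group.
pieces <- utils::strcapture(pattern, DF$V1, proto)

# Attach the new columns next to the original one.
DF <- cbind(DF, pieces)
str(DF)
```

Unlike splitting on delimiters, this form requires each value to match the whole pattern; rows that do not match come back as NA.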

And @AnandaMahto's splitstackshape package also has a convenience function for this:

library(splitstackshape)
# cSplit() treats sep as a fixed string by default, so fixed = FALSE is needed for the regex ":|-".
# Run on a DF that holds only the original V1 column; V1 itself is dropped by default.
cSplit(DF, "V1", ":|-", fixed = FALSE)
#     V1_1      V1_2      V1_3
# 1: chr10 100002872 100002872
# 2: chr10 100003981 100003981
# 3: chr10 100004774 100004774
# 4: chr10 100005285 100005285
# 5: chr10 100007123 100007123

