Problem description

I have a data frame with a dot-separated character column:

> set.seed(310366)
> tst = data.frame(x=1:10,y=paste(sample(c("FOO","BAR","BAZ"),10,TRUE),".",sample(c("foo","bar","baz"),10,TRUE),sep=""))
> tst
    x       y
1   1 BAR.baz
2   2 FOO.foo
3   3 BAZ.baz
4   4 BAZ.foo
5   5 BAZ.bar
6   6 FOO.baz
7   7 BAR.bar
8   8 BAZ.baz
9   9 FOO.bar
10 10 BAR.foo

and I want to split that column into two new columns containing the parts on either side of the dot. str_split_fixed from the stringr package does the job nicely. All my values are definitely two parts separated by a dot, so I can do:

> require(stringr)
> str_split_fixed(tst$y,"\\.",2)
      [,1]  [,2]
 [1,] "BAR" "baz"
 [2,] "FOO" "foo"
 [3,] "BAZ" "baz"
 [4,] "BAZ" "foo"
 [5,] "BAZ" "bar"
 [6,] "FOO" "baz"
 [7,] "BAR" "bar"
 [8,] "BAZ" "baz"
 [9,] "FOO" "bar"
[10,] "BAR" "foo"
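The pattern argument of str_split_fixed() is a regular expression, so the dot has to be escaped, and in an R string literal the escape itself needs a doubled backslash: the pattern is written "\\.". The sketch below compares that with stringr's fixed(), which matches the dot literally. It uses a hard-coded copy of the first four values of y (an assumption for illustration) instead of the random sample, so the result does not depend on the random number generator.

```r
library(stringr)

# Hard-coded copy of the first four values of tst$y shown above.
y <- c("BAR.baz", "FOO.foo", "BAZ.baz", "BAZ.foo")

# "\\." in R source is the regex \. which matches a literal dot.
# An unescaped "." would be a regex wildcard matching any character.
by_regex <- str_split_fixed(y, "\\.", 2)

# fixed() makes stringr treat the pattern as a literal string instead.
by_fixed <- str_split_fixed(y, fixed("."), 2)

identical(by_regex, by_fixed)  # TRUE
by_regex[, 1]                  # "BAR" "FOO" "BAZ" "BAZ"
```

Either form works here; fixed() avoids the escaping question entirely when the separator is a plain character.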

Now I could just cbind that to my data frame, but I thought I'd figure out how to do it in a dplyr pipeline. First I thought mutate could do it in one call:

> tst %.% mutate(parts=str_split_fixed(y,"\\.",2))
Error: wrong result size (20), expected 10 or 1
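The error follows from the shape of what str_split_fixed() returns: a character matrix with one row per input value and one column per part. The early dplyr release used in the question (whose chaining operator was %.%, later replaced by %>%) expected mutate() to get one value per row or a single value ("expected 10 or 1"), and a 10 x 2 matrix holds 20 values. A quick check, using the ten values of y from the question hard-coded for illustration:

```r
library(stringr)

# Hard-coded stand-in for tst$y (the ten values from the question).
y <- c("BAR.baz", "FOO.foo", "BAZ.baz", "BAZ.foo", "BAZ.bar",
       "FOO.baz", "BAR.bar", "BAZ.baz", "FOO.bar", "BAR.foo")

parts <- str_split_fixed(y, "\\.", 2)

# A matrix is one vector of strings with a dim attribute, so its
# length counts every cell, not every row.
is.matrix(parts)  # TRUE
dim(parts)        # 10 2
length(parts)     # 20
```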

I can get mutate to do it with two expressions:

> tst %.% mutate(part1=str_split_fixed(y,"\\.",2)[,1], part2=str_split_fixed(y,"\\.",2)[,2])
    x       y part1 part2
1   1 BAR.baz   BAR   baz
2   2 FOO.foo   FOO   foo
3   3 BAZ.baz   BAZ   baz
4   4 BAZ.foo   BAZ   foo
5   5 BAZ.bar   BAZ   bar
6   6 FOO.baz   FOO   baz
7   7 BAR.bar   BAR   bar
8   8 BAZ.baz   BAZ   baz
9   9 FOO.bar   FOO   bar
10 10 BAR.foo   BAR   foo

but that's running the string split twice.
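One way to split only once is to return both halves from a single expression. The sketch below assumes dplyr 1.0.0 or later, where mutate() unpacks an unnamed data-frame result into separate columns. split_on_dot() is a hypothetical helper (not part of dplyr or stringr), and the four-row tst is a hard-coded stand-in for the question's data.

```r
library(dplyr)
library(stringr)

# Hard-coded stand-in for the question's tst.
tst <- data.frame(
  x = 1:4,
  y = c("BAR.baz", "FOO.foo", "BAZ.baz", "BAZ.foo"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split once, return both halves as a data frame.
split_on_dot <- function(v) {
  m <- str_split_fixed(v, "\\.", 2)
  data.frame(part1 = m[, 1], part2 = m[, 2], stringsAsFactors = FALSE)
}

# Unnamed data-frame result: its columns are added to tst (dplyr >= 1.0.0).
res <- tst %>% mutate(split_on_dot(y))
res
```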

The "best" I can do so far in a dplyr way is this (which I only discovered while writing this question):

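> tst %.% do(cbind(.,data.frame(parts=str_split_fixed(.$y,"\\.",2))))
    x       y parts.1 parts.2
1   1 BAR.baz     BAR     baz
2   2 FOO.foo     FOO     foo
3   3 BAZ.baz     BAZ     baz
4   4 BAZ.foo     BAZ     foo
5   5 BAZ.bar     BAZ     bar
6   6 FOO.baz     FOO     baz
7   7 BAR.bar     BAR     bar
8   8 BAZ.baz     BAZ     baz
9   9 FOO.bar     FOO     bar
10 10 BAR.foo     BAR     foo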

which isn't bad, but it loses a lot of the readability of piped code in R. Is there a simple approach using mutate that I've missed?
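The do() wrapper is not strictly needed for this pattern. With the magrittr pipe %>% (which dplyr re-exports), a "." that appears only inside a nested call, such as .$y, still leaves the left-hand side as the first argument, so the same cbind can sit directly in the pipeline. do() has since been marked superseded in dplyr. A minimal sketch, with a hard-coded four-row tst standing in for the question's data:

```r
library(dplyr)  # re-exports the magrittr pipe %>%
library(stringr)

# Hard-coded stand-in for the question's tst.
tst <- data.frame(
  x = 1:4,
  y = c("BAR.baz", "FOO.foo", "BAZ.baz", "BAZ.foo"),
  stringsAsFactors = FALSE
)

# "." is used only inside nested calls, so magrittr still inserts tst as
# the first argument: cbind(tst, data.frame(parts = str_split_fixed(tst$y, ...))).
res <- tst %>%
  cbind(data.frame(parts = str_split_fixed(.$y, "\\.", 2),
                   stringsAsFactors = FALSE))
names(res)  # "x" "y" "parts.1" "parts.2"
```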

Recommended answer

You can use separate() from tidyr in combination with dplyr:

library(dplyr)
library(tidyr)
tst %>% separate(y, c("y1", "y2"), sep = "\\.", remove=FALSE)

    x       y  y1  y2
1   1 BAR.baz BAR baz
2   2 FOO.foo FOO foo
3   3 BAZ.baz BAZ baz
4   4 BAZ.foo BAZ foo
5   5 BAZ.bar BAZ bar
6   6 FOO.baz FOO baz
7   7 BAR.bar BAR bar
8   8 BAZ.baz BAZ baz
9   9 FOO.bar FOO bar
10 10 BAR.foo BAR foo

Setting remove=TRUE (the default) drops the original column y.
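Since tidyr 1.3.0, separate() is superseded by more specific functions. For a single delimiter, separate_wider_delim() is the closest match: its delim argument is treated as a fixed string by default, so the dot needs no escaping, and cols_remove = FALSE plays the role of remove = FALSE. A sketch assuming tidyr 1.3.0 or later, with a hard-coded four-row tst standing in for the question's data:

```r
library(dplyr)
library(tidyr)  # separate_wider_delim() requires tidyr >= 1.3.0

# Hard-coded stand-in for the question's tst.
tst <- data.frame(
  x = 1:4,
  y = c("BAR.baz", "FOO.foo", "BAZ.baz", "BAZ.foo"),
  stringsAsFactors = FALSE
)

# delim is matched literally; names gives the new column names;
# cols_remove = FALSE keeps the original y column.
res <- tst %>%
  separate_wider_delim(y, delim = ".", names = c("y1", "y2"),
                       cols_remove = FALSE)
res
```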

08-31 09:10