dplyr masks GGally and breaks ggparcoord

Problem description

In a fresh session, executing a small ggparcoord(.) example taken from the function's documentation

library(GGally)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))

results in the following plot:

Starting again in a fresh session and executing the same script with dplyr loaded

library(GGally)
library(dplyr)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))

results in:

Note that the order of the library(.) statements does not matter.

Questions

  1. Is there something wrong with the code samples?
  2. Is there a way to work around the problem (for example, via some namespace functions)?
  3. Or is this a bug?

I need both dplyr and ggparcoord(.) in a bigger analysis, but this minimal example reflects the problem I am facing.

Versions

  • R @ 3.2.3
  • dplyr @ 0.4.3
  • GGally @ 1.0.1
  • ggplot2 @ 2.0.0

UPDATE

To sum up the excellent answer given by Joran:

Answers

  1. The code samples are in fact wrong, as ggparcoord(.) expects a data.frame, not a tbl_df as given by the diamonds data set (if dplyr is loaded).
  2. The problem is solved by coercing the tbl_df to a data.frame.
  3. No, it is not a bug.

Working code sample:

library(GGally)
library(dplyr)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = as.data.frame(diamonds.samp), columns = c(1, 5:10))

Solution

Converting my comments to an answer...

The GGally package here is making the reasonable assumption that using [ on a data frame should behave the way it always does and always has. However, this all being in the Hadley-verse, the diamonds data set is a tbl_df as well as a data.frame.

When dplyr is loaded, the behavior of [ is overridden such that drop = FALSE is always the default for a tbl_df. So there's a place in GGally where data[,"cut"] is expected to return a vector, but instead it returns another data frame.
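
The difference in return type can be seen without dplyr at all. The sketch below is only an illustration: it uses a plain base-R data.frame (the name df and its values are made up) and passes drop = FALSE explicitly to mimic what the tbl_df method is described as doing by default.

```r
# Illustrative data; `df` and its values are assumptions, not GGally internals.
df <- data.frame(cut = factor(c("Fair", "Good", "Ideal")),
                 price = c(326, 327, 334))

# Base data.frame default: selecting a single column drops to a vector
# (here a factor).
as_vector <- df[, "cut"]

# Mimic the tbl_df default by asking for drop = FALSE explicitly:
# the result stays a one-column data frame.
as_frame <- df[, "cut", drop = FALSE]

is.factor(as_vector)      # TRUE
is.data.frame(as_frame)   # TRUE
```

Code written against the first behaviour, as GGally's is here, receives the second kind of object once the data is a tbl_df.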

...specifically, the error is thrown in your example while attempting to execute:

data[, fact.var] <- as.numeric(data[, fact.var])

Since data[,fact.var] remains a data frame, and hence a list, as.numeric won't work.
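
A minimal base-R sketch of that failure, again with made-up names (one_col is not a GGally variable): as.numeric() on a multi-row, one-column data frame signals an error, while extracting the column with [[ yields a vector regardless of any drop setting, so the conversion succeeds.

```r
# Illustrative one-column data frame; `one_col` is a hypothetical name.
one_col <- data.frame(cut = factor(c("Fair", "Good", "Ideal")))

# as.numeric() on the data frame (which is a list) signals an error;
# tryCatch() captures the condition object instead of stopping the script.
res <- tryCatch(as.numeric(one_col), error = function(e) e)
inherits(res, "error")

# [[ always extracts the column itself, so the factor converts to its
# underlying integer codes.
codes <- as.numeric(one_col[["cut"]])
codes
```

Whether GGally should use [[ internally is a separate question; the sketch only shows why the line above breaks when [ returns a data frame.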

As for your conclusion that this isn't a bug, I'd say... maybe. Probably. At least there probably isn't anything the GGally package author ought to do to address it. You just have to be aware that using tbl_dfs with packages not written by Hadley may break things.

As you noted, removing the extra class attributes fixes the problem, as it returns R to using the normal [ method.
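
What "removing the extra class attributes" amounts to can be sketched in base R. The object below is built by hand with a tbl_df-style class vector purely for illustration (fake_tbl is a hypothetical name and is not created by dplyr); as.data.frame() strips the classes that precede "data.frame".

```r
# Hand-built stand-in carrying a tbl_df-style class vector;
# NOT produced by dplyr, just an illustration.
fake_tbl <- structure(data.frame(cut = factor(c("Fair", "Good"))),
                      class = c("tbl_df", "tbl", "data.frame"))

# as.data.frame() removes the classes that come before "data.frame".
plain <- as.data.frame(fake_tbl)
class(plain)

# With only "data.frame" left, [ dispatches to the base method and a
# single-column selection drops to a vector again.
is.factor(plain[, "cut"])
```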


09-05 21:00