Problem description
I've been getting up to speed with R over the last month.
Here is my question:
What is a good way to assign colors to categorical variables in ggplot2 so that the mapping stays stable? I need consistent colors across a set of graphs that use different subsets of the data and show different numbers of categories.
For example,
plot1 <- ggplot(data, aes(xData, yData, color = categoricalData)) + geom_line()
where categoricalData has 5 levels.
And then
plot2 <- ggplot(data.subset, aes(xData.subset, yData.subset,
                color = categoricalData.subset)) + geom_line()
where categoricalData.subset has 3 levels.
However, a level that appears in both sets will end up with a different color in each plot, which makes the graphs harder to read together.
Do I need to create a vector of colors in the data frame? Or is there another way to assign specific colors to categories?
Recommended answer
For simple situations like the exact example in the question, I agree that Thierry's answer is the best. However, I think it's useful to point out another approach that becomes easier when you need consistent color schemes across multiple data frames that are not all obtained by subsetting a single large data frame. Managing factor levels across multiple data frames can become tedious if they are pulled from separate files and not every factor level appears in each file.
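Thierry's answer is not quoted here, so the following is only a sketch of one common way to handle the simple case, not necessarily what that answer proposes. It assumes the subset is taken from a single data frame whose grouping column is a factor, so the subset still carries all five levels, and it uses drop = FALSE to stop the scale from discarding the unused ones. The names dat_sub, p_full and p_sub are made up for illustration.

```r
library(ggplot2)

# Test data with a five-level factor
dat <- data.frame(x = runif(10), y = runif(10),
                  grp = factor(rep(LETTERS[1:5], each = 2)))

# Subsetting rows keeps the full set of factor levels (A to E),
# even though only B, D and E remain in the data
dat_sub <- dat[dat$grp %in% c("B", "D", "E"), ]

# drop = FALSE keeps unused levels in the scale, so each level
# is assigned the same default color in both plots
p_full <- ggplot(dat, aes(x, y, colour = grp)) + geom_point() +
  scale_colour_discrete(drop = FALSE)
p_sub <- ggplot(dat_sub, aes(x, y, colour = grp)) + geom_point() +
  scale_colour_discrete(drop = FALSE)
```

This only works while every data frame carries the complete level set, which is exactly the bookkeeping that gets tedious once the data comes from separate files.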
One way to address this is to create a custom manual colour scale as follows:
# Some test data
library(ggplot2)
dat <- data.frame(x = runif(10), y = runif(10),
                  grp = rep(LETTERS[1:5], each = 2), stringsAsFactors = TRUE)
# Create a custom color scale: a named vector ties each level to a fixed color
library(RColorBrewer)
myColors <- brewer.pal(5, "Set1")
names(myColors) <- levels(dat$grp)
colScale <- scale_colour_manual(name = "grp", values = myColors)
and then add the color scale onto the plot as needed:
# One plot with all the data
p <- ggplot(dat, aes(x, y, colour = grp)) + geom_point()
p1 <- p + colScale
# A second plot with only four of the levels (rows 4 to 10 contain B, C, D and E)
p2 <- p %+% droplevels(dat[4:10, ]) + colScale
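The first plot looks like this: [figure: p1, all five levels]
The second plot looks like this: [figure: p2, four of the levels]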
This way you don't need to remember to check that each data frame has the appropriate levels.
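To make the multi-file case concrete, here is a minimal sketch of the same idea applied to two data frames that do not share all groups. The data frames dat_a and dat_b are hypothetical stand-ins for data read from separate files, and the names all_groups, group_colors and col_scale are made up for illustration. The palette is built once from the union of group names, so it does not depend on which levels a given data frame happens to contain.

```r
library(ggplot2)
library(RColorBrewer)

# Hypothetical data frames standing in for data read from separate files;
# neither one contains every group
dat_a <- data.frame(x = 1:3, y = c(2, 4, 3), grp = c("A", "B", "C"))
dat_b <- data.frame(x = 1:3, y = c(5, 1, 2), grp = c("B", "D", "E"))

# Build one named color vector from the union of all group names
all_groups <- sort(unique(c(dat_a$grp, dat_b$grp)))
group_colors <- setNames(brewer.pal(length(all_groups), "Set1"), all_groups)

# A single manual scale, reused on every plot
col_scale <- scale_colour_manual(name = "grp", values = group_colors)

p_a <- ggplot(dat_a, aes(x, y, colour = grp)) + geom_point() + col_scale
p_b <- ggplot(dat_b, aes(x, y, colour = grp)) + geom_point() + col_scale
```

Because the colors are matched by name, group B is drawn with the same color in both plots even though it sits at a different position in each data frame. Note that the "Set1" palette only offers up to nine colors, so a longer list of groups would need a different palette.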