Problem description
I have a data frame in R with 104 columns that looks like this:
id vcr1 vcr2 vcr3 sim_vcr1 sim_vcr2 sim_vcr3 sim_vcr4 sim_vcr5 sim_vcr6 sim_vcr7
1 2913 -4.782992840 1.7631999 0.003768704 1.376937 -2.096857 6.903021 7.018855 6.135139 3.188382 6.905323
2 1260 0.003768704 3.1577108 -0.758378208 1.376937 -2.096857 6.903021 7.018855 6.135139 3.188382 6.905323
3 2912 -4.782992840 1.7631999 0.003768704 1.376937 -2.096857 6.903021 7.018855 6.135139 3.188382 6.905323
4 2914 -1.311132669 0.8220594 2.372950077 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574
5 2915 -1.311132669 0.8220594 2.372950077 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574
6 1261 2.372950077 -0.7022792 -4.951318264 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574
The "sim_vcr*" variables continue all the way through sim_vcr100.
I need two overlapping density curves in a single plot, looking something like this (except here you see 5 curves instead of 2):
One density curve should be built from all values in columns vcr1, vcr2, and vcr3, and the other from all values in the sim_vcr* columns (100 columns, sim_vcr1 through sim_vcr100).
Because the two curves overlap, they need to be transparent, like in the attached image. I know there is a fairly straightforward way to do this with ggplot, but I am having trouble with the syntax, and with getting my data frame oriented correctly so that each density curve pulls from the proper columns.
Thank you very much for your help.
Recommended answer
With df being the data you mentioned in your post, you can try the following. First split the data frame into one part per group of columns, then plot each part:
library(tidyverse)
library(gdata)
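# Index: column positions for each group
i1 <- which(startsWith(names(df), 'vcr'))
i2 <- which(startsWith(names(df), 'sim'))
# Isolate each group, keeping the id column (column 1)
df1 <- df[, c(1, i1)]
df2 <- df[, c(1, i2)]
# Melt to long format: one row per id/column pair, stored in columns name and value
M1 <- pivot_longer(df1, cols = names(df1)[-1])
M2 <- pivot_longer(df2, cols = names(df2)[-1])
# Plot 1: one density curve per vcr column
ggplot(M1) + geom_density(aes(x = value, fill = name), alpha = .5)
# Plot 2: one density curve per sim_vcr column
ggplot(M2) + geom_density(aes(x = value, fill = name), alpha = .5)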
Update
To draw both groups in a single plot, use the following code:
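# Single plot
# Melt every column except id to long format
M <- pivot_longer(df, cols = names(df)[-1])
# Mutate: label each row by the group its source column belongs to
M$var <- ifelse(startsWith(M$name, 'vcr'), 'vcr', 'sim_vcr')
# Plot 3: two overlapping, semi-transparent density curves
ggplot(M) + geom_density(aes(x = value, fill = var), alpha = .5)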
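If you want to try the single-plot approach without the original data, here is a minimal self-contained sketch. The data frame toy, its 50 rows, its 5 simulated columns, and the normal distributions are all made up for illustration and only stand in for df; the column naming follows the question.

```r
library(tidyr)
library(ggplot2)

set.seed(1)

# Hypothetical stand-in for df: an id column, 3 vcr columns, 5 sim_vcr columns
toy <- data.frame(id = 1:50)
for (k in 1:3) toy[[paste0("vcr", k)]] <- rnorm(50, mean = 0, sd = 3)
for (k in 1:5) toy[[paste0("sim_vcr", k)]] <- rnorm(50, mean = 1, sd = 4)

# Long format: one row per (id, column) pair
long <- pivot_longer(toy, cols = -id, names_to = "name", values_to = "value")

# Pool the columns into two groups based on the column-name prefix
long$group <- ifelse(startsWith(long$name, "sim_vcr"), "sim_vcr", "vcr")

# One semi-transparent density curve per group, drawn on the same axes
p <- ggplot(long, aes(x = value, fill = group)) +
  geom_density(alpha = 0.5)
print(p)
```

Because fill is mapped to the two-level group column rather than to name, all values from the vcr columns are pooled into one curve and all values from the sim_vcr columns into the other.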