Problem description

I have a dataset with user IDs and the number of objects they created. I drew the histogram using ggplot and now I'm trying to include the cumulative sum of the x-values as a line. The aim is to see how much the bins contribute to the total number.
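For reference, the quantity being asked for can be written down explicitly. The notation is introduced here for illustration and is not the asker's: x_i is the number of tours of user i (i = 1, …, N), the histogram bins B_1, …, B_K are ordered along the x-axis, n_j is the number of users in bin j, and v_j is a representative value of that bin (for example its centre).

```latex
S_k = \frac{\sum_{j=1}^{k} \sum_{i:\,x_i \in B_j} x_i}{\sum_{i=1}^{N} x_i}
    \approx \frac{\sum_{j=1}^{k} n_j v_j}{\sum_{j=1}^{K} n_j v_j},
\qquad k = 1, \dots, K
```

The approximation is exact when every user in bin j has x_i = v_j. S_k increases with k and reaches S_K = 1; multiplying by 100 gives the percentage for a secondary axis, and multiplying by the maximum of the primary y-axis puts the line on the histogram's scale.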
I tried the following:

    ggplot(data = userStats, aes(x = Num_Tours)) +
      geom_histogram(binwidth = 0.2) +
      scale_x_log10(name = 'Number of planned tours', breaks = c(1, 5, 10, 50, 100, 200)) +
      geom_line(aes(x = Num_Tours, y = cumsum(Num_Tours) / sum(Num_Tours) * 3500), color = "red") +
      scale_y_continuous(name = 'Number of users',
                         sec.axis = sec_axis(~ . / 3500, name = "Cumulative percentage of routes [%]"))

This does not work because the line is not computed from any bins. I also tried:

    ggplot(data = userStats, aes(x = Num_Tours)) +
      geom_histogram(binwidth = 0.2) +
      scale_x_log10(name = 'Number of planned tours', breaks = c(1, 5, 10, 50, 100, 200)) +
      stat_bin(aes(y = after_stat(cumsum(count))), binwidth = 0.2, geom = "line", color = "red") +
      scale_y_continuous(name = 'Number of users',
                         sec.axis = sec_axis(~ . / 3500, name = "Cumulative percentage of routes [%]"))

Here the line shows the cumulative sum of the counts. What I want is the cumulative sum of count × value of each bin, normalized so that it can be displayed in the same plot. I would appreciate any input!
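A minimal sketch of the per-bin quantity described above, computed in base R before plotting. The helper name `cumulative_share_by_bin()` and its output columns are hypothetical. It assumes all values are strictly positive and builds right-closed bins of width 0.2 on the log10 scale, which should line up approximately with `geom_histogram(binwidth = 0.2, boundary = 0)` under `scale_x_log10()` (stat_bin() also closes bins on the right by default). Each bin's total is the sum of the values in it, which equals count × value whenever the bin holds a single distinct value.

```r
# Hypothetical helper (not part of ggplot2): cumulative share of the total
# contributed by each histogram bin. Bins have width `binwidth` on the
# log10 scale and are right-closed, like stat_bin()'s default.
cumulative_share_by_bin <- function(x, binwidth = 0.2) {
  stopifnot(is.numeric(x), all(x > 0))  # log10 needs strictly positive values
  lx <- log10(x)

  # Bin edges: integer multiples of binwidth covering the data range
  lo <- floor(min(lx) / binwidth)
  hi <- max(ceiling(max(lx) / binwidth), lo + 1)
  breaks <- (lo:hi) * binwidth

  # Assign each value to a bin: intervals are (a, b], the first one [a, b]
  bin <- cut(lx, breaks = breaks, include.lowest = TRUE)

  # Sum of the values in each bin; empty bins give NA, turned into 0
  per_bin <- as.numeric(tapply(x, bin, sum))
  per_bin[is.na(per_bin)] <- 0

  data.frame(
    bin_mid    = 10^(head(breaks, -1) + binwidth / 2),  # bin centre, original scale
    total      = per_bin,                               # sum of x in the bin
    cumulative = cumsum(per_bin) / sum(x)               # ends at 1 in the last bin
  )
}

# Example with test data generated the same way as in the answer
set.seed(111)
Num_Tours <- sample(1:100, 100, replace = TRUE)
bins <- cumulative_share_by_bin(Num_Tours)
print(bins)
```

The resulting data frame can then be drawn over the histogram with an extra layer such as `geom_line(data = bins, aes(x = bin_mid, y = cumulative * ymax))`, where `ymax` is a manually chosen scaling factor for the primary y-axis.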
Thanks

Edit: As test data, this should work:

    userID <- 1:100
    Num_Tours <- sample(1:100, 100)
    userStats <- data.frame(userID, Num_Tours)
    userStats$cumulative <- cumsum(userStats$Num_Tours / sum(userStats$Num_Tours))

Solution

Here is an illustrative example that could be helpful for you. It sorts the data, computes the cumulative share of all tours user by user (not bin by bin), and draws it as a line scaled to a manually chosen maximum of the y-axis:

    library(ggplot2)

    set.seed(111)
    userID <- 1:100
    Num_Tours <- sample(1:100, 100, replace = TRUE)
    userStats <- data.frame(userID, Num_Tours)

    # Sort the rows by Num_Tours so the cumulative sum increases along the x-axis
    userStats <- userStats[order(userStats$Num_Tours), ]
    userStats$cumulative <- cumsum(userStats$Num_Tours / sum(userStats$Num_Tours))

    # Fix manually the maximum value of the y-axis
    ymax <- 40

    ggplot(data = userStats, aes(x = Num_Tours)) +
      geom_histogram(binwidth = 0.2, col = "white") +
      scale_x_log10(name = 'Number of planned tours', breaks = c(1, 5, 10, 50, 100, 200)) +
      geom_line(aes(x = Num_Tours, y = cumulative * ymax), col = "red", linewidth = 1) +
      # Undo the ymax scaling and convert the 0-1 share to percent
      scale_y_continuous(name = 'Number of users',
                         sec.axis = sec_axis(~ . / ymax * 100, name = "Cumulative percentage of routes [%]"))
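If the line should follow the histogram bins rather than individual users, one alternative (a sketch, not part of the original answer, assuming ggplot2 3.4 or later for `after_stat()` and `linewidth`) is to give `stat_bin()` the `weight` aesthetic. Each bin's `count` then becomes the sum of `Num_Tours` in that bin, i.e. count × value, so `cumsum(count) / sum(count)` is the normalized cumulative share. `ymax` is still a manually chosen scaling factor.

```r
library(ggplot2)

set.seed(111)
userStats <- data.frame(userID = 1:100,
                        Num_Tours = sample(1:100, 100, replace = TRUE))

ymax <- 40  # manual scaling factor for the primary y-axis, as in the answer

p <- ggplot(userStats, aes(x = Num_Tours)) +
  geom_histogram(binwidth = 0.2, colour = "white") +
  # weight = Num_Tours makes stat_bin() sum the tours in each bin instead of
  # counting users; the cumulative share is then rescaled to the y-axis
  stat_bin(aes(weight = Num_Tours,
               y = after_stat(cumsum(count) / sum(count) * ymax)),
           binwidth = 0.2, geom = "line", colour = "red", linewidth = 1) +
  scale_x_log10(name = "Number of planned tours",
                breaks = c(1, 5, 10, 50, 100, 200)) +
  scale_y_continuous(name = "Number of users",
                     sec.axis = sec_axis(~ . / ymax * 100,
                                         name = "Cumulative percentage of routes [%]"))
print(p)
```

Because the line layer uses the same `binwidth` and data range as `geom_histogram()`, its points sit at the histogram's bin centres, and the line ends at `ymax`, i.e. 100 % on the secondary axis.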
09-05 21:05