Cumulative sum in an R / ggplot histogram

Problem description

I have a dataset with user IDs and the number of objects each user created. I drew a histogram using ggplot, and now I'm trying to include the cumulative sum of the x-values as a line. The aim is to see how much each bin contributes to the total.
I tried the following:

    ggplot(data = userStats, aes(x = Num_Tours)) +
      geom_histogram(binwidth = 0.2) +
      scale_x_log10(name = 'Number of planned tours', breaks = c(1, 5, 10, 50, 100, 200)) +
      geom_line(aes(x = Num_Tours, y = cumsum(Num_Tours) / sum(Num_Tours) * 3500), color = "red") +
      scale_y_continuous(name = 'Number of users',
                         sec.axis = sec_axis(~ . / 3500, name = "Cumulative percentage of routes [%]"))

This does not work because I don't include any bins. I also tried:

    ggplot(data = userStats, aes(x = Num_Tours)) +
      geom_histogram(binwidth = 0.2) +
      scale_x_log10(name = 'Number of planned tours', breaks = c(1, 5, 10, 50, 100, 200)) +
      stat_bin(aes(y = cumsum(..count..)), binwidth = 0.2, geom = "line", color = "red") +
      scale_y_continuous(name = 'Number of users',
                         sec.axis = sec_axis(~ . / 3500, name = "Cumulative percentage of routes [%]"))

Here the cumsum of the count is considered. What I want is the cumsum of count * value of the bin, normalized so that it can be displayed in the same plot as the histogram.

I would appreciate any input!
Thanks

Edit: As test data, this should work:

    userID <- c(1:100)
    Num_Tours <- sample(1:100, 100)
    userStats <- data.frame(userID, Num_Tours)
    userStats$cumulative <- cumsum(userStats$Num_Tours / sum(userStats$Num_Tours))

Solution

Here is an illustrative example that could be helpful for you.

    library(ggplot2)

    set.seed(111)
    userID <- c(1:100)
    Num_Tours <- sample(1:100, 100, replace = TRUE)
    userStats <- data.frame(userID, Num_Tours)

    # Sort the x data so the cumulative share increases along the x-axis
    userStats$Num_Tours <- sort(userStats$Num_Tours)
    userStats$cumulative <- cumsum(userStats$Num_Tours / sum(userStats$Num_Tours))

    # Manually fix the maximum value of the y-axis
    ymax <- 40

    ggplot(data = userStats, aes(x = Num_Tours)) +
      geom_histogram(binwidth = 0.2, col = "white") +
      scale_x_log10(name = 'Number of planned tours', breaks = c(1, 5, 10, 50, 100, 200)) +
      # Rescale the cumulative share (0 to 1) to the height of the count axis
      geom_line(aes(x = Num_Tours, y = cumulative * ymax), col = "red", lwd = 1) +
      # Undo the rescaling on the secondary axis; * 100 makes it match the percent label
      scale_y_continuous(name = 'Number of users',
                         sec.axis = sec_axis(~ . / ymax * 100, name = "Cumulative percentage of routes [%]"))
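The answer above accumulates the share user by user. The question also asks for the cumulative sum of "count * value of the bin", so the following is a minimal sketch of a per-bin version, not the original answerer's method. It assumes bins of width 0.2 in log10 units centred on multiples of the bin width, which may not line up exactly with the edges `geom_histogram` draws. The names `bins`, `bin_id`, `tours` and `cum_share` are hypothetical and introduced only for illustration.

```r
# Test data as in the answer (a seed is set so the sketch is reproducible)
set.seed(111)
userStats <- data.frame(userID = 1:100,
                        Num_Tours = sample(1:100, 100, replace = TRUE))

binwidth <- 0.2  # bin width in log10 units, as in the histogram

# Assign every user to a bin on the log10 scale
# (assumption: bins centred on multiples of binwidth)
bin_id <- round(log10(userStats$Num_Tours) / binwidth)

# Per bin: number of users and total number of tours (sum of the values)
users_per_bin <- tapply(userStats$Num_Tours, bin_id, length)
tours_per_bin <- tapply(userStats$Num_Tours, bin_id, sum)

bins <- data.frame(
  x     = 10^(as.numeric(names(tours_per_bin)) * binwidth),  # bin centre, original scale
  users = as.vector(users_per_bin),
  tours = as.vector(tours_per_bin)
)

# Cumulative share of all tours contributed by the bins up to and including each bin
bins$cum_share <- cumsum(bins$tours) / sum(bins$tours)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ymax <- max(bins$users)  # scale the line to the tallest bin
  p <- ggplot(userStats, aes(x = Num_Tours)) +
    geom_histogram(binwidth = binwidth, col = "white") +
    geom_line(data = bins, aes(x = x, y = cum_share * ymax), col = "red") +
    scale_x_log10(name = "Number of planned tours",
                  breaks = c(1, 5, 10, 50, 100, 200)) +
    scale_y_continuous(name = "Number of users",
                       sec.axis = sec_axis(~ . / ymax * 100,
                                           name = "Cumulative percentage of routes [%]"))
  print(p)
}
```

Aggregating outside ggplot keeps the per-bin numbers in an ordinary data frame, so `bins` can be inspected or checked before plotting. The line has one point per bin rather than one per user.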