Understanding dates and plotting a histogram with ggplot2 in R

Problem description

Main Question

I'm having trouble understanding why the handling of dates, labels and breaks does not work as I would have expected in R when making a histogram with ggplot2.

I'm looking for:

A histogram of the frequency of my dates
Tick marks centered under the matching bars
Date labels in %Y-%b format
Appropriate limits; minimized empty space between the edge of the grid space and the outermost bars

I've uploaded my data to pastebin to make this reproducible. I've created several columns, as I wasn't sure of the best way to do this:

    > dates <- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep=",", header=T)
    > head(dates)
            YM       Date Year Month
    1 2008-Apr 2008-04-01 2008     4
    2 2009-Apr 2009-04-01 2009     4
    3 2009-Apr 2009-04-01 2009     4
    4 2009-Apr 2009-04-01 2009     4
    5 2009-Apr 2009-04-01 2009     4
    6 2009-Apr 2009-04-01 2009     4

Here's what I tried:

    library(ggplot2)
    library(scales)
    dates$converted <- as.Date(dates$Date, format="%Y-%m-%d")
    ggplot(dates, aes(x=converted)) + geom_histogram() +
        opts(axis.text.x = theme_text(angle=90))

Which yields this graph. I wanted %Y-%b formatting, though, so I hunted around and tried the following, based on this SO:

    ggplot(dates, aes(x=converted)) + geom_histogram() +
        scale_x_date(labels=date_format("%Y-%b"),
                     breaks = "1 month") +
        opts(axis.text.x = theme_text(angle=90))

    stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

That gives me this graph:

Correct x axis label format
The frequency distribution has changed shape (binwidth issue?)
Tick marks don't appear centered under bars
The xlims have changed as well

I worked through the example in the ggplot2 documentation at the scale_x_date section, and geom_line() appears to break, label, and center ticks correctly when I use it with my same x-axis data. I don't understand why the histogram is different.

Updates based on answers from edgester and gauden

I initially thought gauden's answer helped me solve my problem, but am now puzzled after looking more closely.
Note the differences between the two answers' resulting graphs after the code.

Assume for both:

    library(ggplot2)
    library(scales)
    dates <- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep=",", header=T)

Based on @edgester's answer below, I was able to do the following:

    freqs <- aggregate(dates$Date, by=list(dates$Date), FUN=length)
    freqs$names <- as.Date(freqs$Group.1, format="%Y-%m-%d")

    ggplot(freqs, aes(x=names, y=x)) + geom_bar(stat="identity") +
        scale_x_date(breaks="1 month", labels=date_format("%Y-%b"),
                     limits=c(as.Date("2008-04-30"), as.Date("2012-04-01"))) +
        ylab("Frequency") + xlab("Year and Month") +
        theme_bw() + opts(axis.text.x = theme_text(angle=90))

Here is my attempt based on gauden's answer:

    dates$Date <- as.Date(dates$Date)
    ggplot(dates, aes(x=Date)) + geom_histogram(binwidth=30, colour="white") +
        scale_x_date(labels = date_format("%Y-%b"),
                     breaks = seq(min(dates$Date)-5, max(dates$Date)+5, 30),
                     limits = c(as.Date("2008-05-01"), as.Date("2012-04-01"))) +
        ylab("Frequency") + xlab("Year and Month") +
        theme_bw() + opts(axis.text.x = theme_text(angle=90))

[Plot based on edgester's approach]

[Plot based on gauden's approach]

Note the following:

gaps in gauden's plot for 2009-Dec and 2010-Mar; table(dates$Date) reveals that there are 19 instances of 2009-12-01 and 26 instances of 2010-03-01 in the data
edgester's plot starts at 2008-Apr and ends at 2012-May. This is correct based on a minimum value in the data of 2008-04-01 and a max date of 2012-05-01. For some reason gauden's plot starts in 2008-Mar and still somehow manages to end at 2012-May. After counting bins and reading along the month labels, for the life of me I can't figure out which plot has an extra or is missing a bin of the histogram!

Any thoughts on the differences here? edgester's method of creating a separate count

Related References

As an aside, here are other locations that have information about dates and ggplot2 for passers-by looking for help:

Started here at learnr.wordpress, a popular R blog.
It stated that I needed to get my data into POSIXct format, which I now think is false and wasted my time.
Another learnr post recreates a time series in ggplot2, but wasn't really applicable to my situation.
r-bloggers has a post on this, but it appears outdated. The simple format= option did not work for me.
This SO question is playing with breaks and labels. I tried treating my Date vector as continuous and don't think it worked so well. It looked like it was overlaying the same label text over and over, so the letters looked kind of odd. The distribution is sort of correct but there are odd breaks. My attempt based on the accepted answer was like so (result here).

Solution

UPDATE

Version 2: Using Date class

I updated the example to demonstrate aligning the labels and setting limits on the plot. I also demonstrate that as.Date does indeed work when used consistently (actually it is probably a better fit for your data than my earlier example).

The Target Plot v2

The Code v2

And here is (somewhat excessively) commented code:

    library("ggplot2")
    library("scales")

    dates <- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep=",", header=T)
    dates$Date <- as.Date(dates$Date)

    # convert the Date to its numeric equivalent
    # Note that Dates are stored as number of days internally,
    # hence it is easy to convert back and forth mentally
    dates$num <- as.numeric(dates$Date)

    bin <- 60 # used for aggregating the data and aligning the labels

    p <- ggplot(dates, aes(num, ..count..))
    p <- p + geom_histogram(binwidth = bin, colour="white")

    # The numeric data is treated as a date,
    # breaks are set to an interval equal to the binwidth,
    # and a set of labels is generated and adjusted in order to align with bars
    p <- p + scale_x_date(breaks = seq(min(dates$num)-20, # change -20 term to taste
                                       max(dates$num),
                                       bin),
                          labels = date_format("%Y-%b"),
                          limits = c(as.Date("2009-01-01"),
                                     as.Date("2011-12-01")))

    # from here, format at ease
    p <- p + theme_bw() + xlab(NULL) +
        opts(axis.text.x = theme_text(angle=45, hjust = 1, vjust = 1))
    p

Version 1: Using POSIXct

I try a solution that does everything in ggplot2, drawing without the aggregation, and setting the limits on the x-axis between the beginning of 2009 and the end of 2011.

The Target Plot v1

The Code v1

    library("ggplot2")
    library("scales")

    dates <- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep=",", header=T)
    dates$Date <- as.POSIXct(dates$Date)

    p <- ggplot(dates, aes(Date, ..count..)) +
        geom_histogram() +
        theme_bw() + xlab(NULL) +
        scale_x_datetime(breaks = date_breaks("3 months"),
                         labels = date_format("%Y-%b"),
                         limits = c(as.POSIXct("2009-01-01"),
                                    as.POSIXct("2011-12-01")))
    p

Of course, it could do with playing with the label options on the axis, but this is to round off the plotting with a clean short routine in the plotting package.
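The gaps reported in the question for the binwidth=30 attempt (2009-Dec and 2010-Mar) are consistent with fixed 30-day bins drifting against calendar months of 28 to 31 days. The base R sketch below only illustrates that arithmetic and is not taken from either answer. It rests on two assumptions: bins are aligned to multiples of the binwidth counted from the Date origin (1970-01-01), which may not match what a given ggplot2 version does, and a synthetic vector of month starts (the hypothetical `month_starts`) stands in for the pastebin data.

```r
# Synthetic stand-in for the data: one entry per month start over the range
# mentioned in the question (2008-04-01 to 2012-05-01). No real counts.
month_starts <- seq(as.Date("2008-04-01"), as.Date("2012-05-01"), by = "month")

binwidth <- 30  # days, as in geom_histogram(binwidth = 30)

# ASSUMPTION: bin edges sit at multiples of `binwidth` days since 1970-01-01,
# with each bin including its left edge. Real ggplot2 behaviour may differ.
bin_id <- as.numeric(month_starts) %/% binwidth

# Bins inside the covered range that contain no month start at all
all_bins    <- seq(min(bin_id), max(bin_id))
empty_bins  <- setdiff(all_bins, bin_id)
empty_starts <- as.Date(empty_bins * binwidth, origin = "1970-01-01")

# Month starts that land in the same bin as another month start
shared_months <- month_starts[bin_id %in% bin_id[duplicated(bin_id)]]

print(empty_starts)   # first day of each empty 30-day bin
print(shared_months)  # month starts that share one bin
```

Under these assumptions the empty bins begin on 2009-12-02 and 2010-03-02, and the month starts 2010-02-01 and 2010-03-01 fall into a single bin. Counting per distinct date first, as in the aggregate() approach shown in the question, sidesteps this drift because each month start gets its own bar.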
09-05 20:43