Extracting the slopes of multiple trend lines from geom_smooth()

Problem description

I am trying to plot multiple trend lines (one per ten-year period) in a time series using ggplot.

Here is the data:

dat <- structure(list(YY = 1961:2010, a = c(98L, 76L, 83L, 89L, 120L,
107L, 83L, 83L, 92L, 104L, 98L, 91L, 81L, 69L, 86L, 76L, 85L,
86L, 70L, 81L, 77L, 89L, 60L, 80L, 94L, 66L, 77L, 85L, 77L, 80L,
79L, 79L, 65L, 70L, 80L, 87L, 84L, 67L, 106L, 129L, 95L, 79L,
67L, 105L, 118L, 85L, 86L, 103L, 97L, 106L)), .Names = c("YY",
"a"), row.names = c(NA, -50L), class = "data.frame")

Here is the script:

library(ggplot2)

p <- ggplot(dat, aes(x = YY))
p <- p + geom_line(aes(y=a),colour="blue",lwd=1)
p <- p + geom_point(aes(y=a),colour="blue",size=2)

p <- p + theme(panel.background=element_rect(fill="white"),
         plot.margin = unit(c(0.5,0.5,0.5,0.5),"cm"),
         panel.border=element_rect(colour="black",fill=NA,size=1),
         axis.line.x=element_line(colour="black"),
         axis.line.y=element_line(colour="black"),
         axis.text=element_text(size=15,colour="black",family="serif"),
         axis.title=element_text(size=15,colour="black",family="serif"),
         legend.position = "top")

# YY is numeric, so use a continuous scale with a tick every 5 years
p <- p + scale_x_continuous(breaks = seq(1961, 2010, 5), expand = c(0, 0))

p <- p + geom_smooth(data=dat[1:10,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + geom_smooth(data=dat[11:20,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + geom_smooth(data=dat[21:30,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + geom_smooth(data=dat[31:40,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + geom_smooth(data=dat[41:50,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + labs(x="Year",y="Number of Days")
outImg <- paste0("test",".png")
ggsave(outImg,p,width=8,height=5)

Here is the resulting image:

What I want / questions

  1. I want to extract the slopes and add them to the trend lines in the figure. How can I extract the slope of each line from geom_smooth()?

  2. Currently, I am plotting the trend lines one by one. Is there an efficient way of doing this with an adjustable time window? For example, suppose I want to plot a trend line for every 5 years; in the figure above the time window is 10.

  3. Suppose I only want to plot the significant trend lines (i.e., p-value < 0.05, null hypothesis: no trend, or slope equals 0). Is it possible to implement this with geom_smooth()?

Any help would be appreciated.

Recommended answer

Each of these tasks is best handled before you pass your data to ggplot2, and they all become fairly easy with some of the other packages from the tidyverse.

Beginning with questions 1 and 2:

While ggplot2 can plot the regression line, to extract the estimated slope coefficients you need to work with the lm() object explicitly. Using group_by() and mutate(), you can add a grouping variable (my code below does this for 5-year groups, just as an example) and then calculate and extract just the slope estimate into a column of your existing data frame. Those slope estimates can then be plotted with geom_text(). I've done this below in a quick and dirty way (placing each label at the mean of the x and y values in its group), but you can specify the exact placement in your data frame.
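As a minimal sketch of what working with the lm() object looks like for a single trend line, the base-R snippet below fits the first decade by hand. The `decade` data frame is simply the first ten rows of `dat` retyped so the block runs on its own, and the variable names are illustrative rather than part of the original answer.

```r
# First decade of the question's data (1961-1970), copied from `dat`
decade <- data.frame(
  YY = 1961:1970,
  a  = c(98, 76, 83, 89, 120, 107, 83, 83, 92, 104)
)

# Same least-squares fit that geom_smooth(method = "lm", formula = y ~ x)
# draws when y = a and x = YY
fit <- lm(a ~ YY, data = decade)

# Slope = coefficient on YY (change in `a` per year)
slope <- unname(coef(fit)["YY"])

# p-value for the null hypothesis "slope equals 0", from the coefficient table
p_value <- summary(fit)$coefficients["YY", "Pr(>|t|)"]

print(round(slope, 2))
print(p_value)
```

Because this is the same model geom_smooth() fits for `dat[1:10, ]`, `slope` should correspond to the first dashed segment in the question's ten-year plot.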

Grouping variables and data prep make question 2 a breeze too: now that the grouping variable is explicitly in your data frame, there is no need to plot the lines one by one, because geom_smooth() accepts the group aesthetic.
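For the adjustable time window asked about in question 2, one possible approach is to derive the grouping variable from the year with integer division, so the window size becomes a single argument. The sketch below uses base R only; `window_slopes()` is a hypothetical helper, not something from ggplot2 or the tidyverse, and it assumes every window contains at least three years so that a slope p-value is defined. `dat20` is the first twenty rows of `dat` retyped so the example is self-contained.

```r
# First twenty years of the question's data (1961-1980), copied from `dat`
dat20 <- data.frame(
  YY = 1961:1980,
  a  = c(98, 76, 83, 89, 120, 107, 83, 83, 92, 104,
         98, 91, 81, 69, 86, 76, 85, 86, 70, 81)
)

# Hypothetical helper: one row per window with its slope and p-value.
# Assumes each window holds at least three rows.
window_slopes <- function(data, window) {
  # Integer-divide the offset from the first year to get a window index
  grp <- (data$YY - min(data$YY)) %/% window
  pieces <- lapply(split(data, grp), function(d) {
    fit <- lm(a ~ YY, data = d)
    data.frame(
      start   = min(d$YY),
      end     = max(d$YY),
      slope   = unname(coef(fit)["YY"]),
      p_value = summary(fit)$coefficients["YY", "Pr(>|t|)"]
    )
  })
  do.call(rbind, pieces)
}

res10 <- window_slopes(dat20, 10)  # two ten-year windows
res5  <- window_slopes(dat20, 5)   # four five-year windows
print(res10)
print(res5)
```

The same window index could be added to the data frame as a column and mapped to the group aesthetic, and the `p_value` column could be used to keep only the windows below a chosen significance level.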

Additionally, to answer question 3, you can extract the p-value from the summary of your lm objects and keep only the groups that are significant at the level you care about. If you pass this now complete data frame to geom_smooth() and geom_text(), you will get the plot you're looking for!

library(tidyverse)

# set up our base plot
p <- ggplot(dat, aes(x = YY, y = a)) +
  geom_line(colour = "blue", lwd = 1) +
  geom_point(colour = "blue", size = 2) +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
    panel.border = element_rect(colour = "black", fill = NA, size = 1),
    axis.line.x = element_line(colour = "black"),
    axis.line.y = element_line(colour = "black"),
    axis.text = element_text(size = 15, colour = "black", family = "serif"),
    axis.title = element_text(size = 15, colour = "black", family = "serif"),
    legend.position = "top"
  ) +
  # YY is numeric, so use a continuous scale with a tick every 5 years
  scale_x_continuous(breaks = seq(1961, 2010, 5), expand = c(0, 0))

# add a grouping variable (or many!)
prep5 <- dat %>%
  mutate(group5 = rep(1:10, each = 5)) %>%   # ten consecutive 5-year groups
  group_by(group5) %>%
  mutate(
    # regress a on YY so the slope matches the line geom_smooth() draws
    slope = round(lm(a ~ YY)$coefficients[2], 2),
    significance = summary(lm(a ~ YY))$coefficients[2, 4],   # p-value of the slope
    x = mean(YY),   # x coordinate for slope label
    y = mean(a)     # y coordinate for slope label
  ) %>%
  filter(significance < .2)   # keep groups with a p-value < .2 (use .05 for the question's threshold)

p + geom_smooth(
  data = prep5, aes(x = YY, y = a, group = group5),  # grouping variable does the plots for us!
  method = "lm", se = FALSE, color = "black",
  formula = y ~ x, linetype = "dashed"
) +
  geom_text(
    # one row per group, so each label is drawn once rather than once per year
    data = distinct(prep5, group5, slope, x, y),
    aes(x = x, y = y, label = slope),
    nudge_y = 12, nudge_x = -1
  )

You may want to be a little more careful about the location of your text labels than I have been here. I used group means and the nudge_* arguments of geom_text() for a quick example, but keep in mind that since these values are mapped explicitly to x and y coordinates, you have complete control!

Created on 2018-07-16 by the reprex package (v0.2.0).
