This article covers how to build group-wise linear regression models in R and generate predictions from them.

Problem description

I'm trying to build several models based on subsets (groups) and generate their fits. In other words, taking my attempts below into consideration, I'm trying to build models that are country specific. Unfortunately in my attempts I'm only able to take the entire dataset into consideration to build the models instead of restricting it to the groups of countries in the datasets. Could you please help me resolve this problem?

In the first case I'm doing some sort of cross validation to generate the predictions. In the second case I'm not. Both my attempts seem to fail.



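library(tidyverse)  # provides %>%, group_by(), map(), map_dfc(), bind_cols(), bind_rows()
library(modelr)
# install.packages("gapminder")  # run once if the package is not installed
library(gapminder)
data(gapminder)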

#CASE 1
model1 <- lm(lifeExp ~ pop, data = gapminder)
model2 <- lm(lifeExp ~ pop + gdpPercap, data = gapminder)

models <- list(fit_model1 = model1,fit_model2 = model2)

gapminder %>% group_by(continent, country) %>%
  bind_cols(
    map(1:nrow(gapminder), function(i) {
      map_dfc(models, function(model) {
        training <- gapminder[-i, ]
        fit <- lm(model, data = training)

        validation <- gapminder[i, ]
        predict(fit, newdata = validation)

      })
    }) %>%
      bind_rows()
  )


#CASE 2
model1 <- lm(lifeExp ~ pop, data = gapminder)
model2 <- lm(lifeExp ~ pop + gdpPercap, data = gapminder)

models <- list(fit_model1 = model1,fit_model2 = model2)


for (m in names(models)) {
  gapminder[[m]] <- predict(models[[m]], gapminder %>% group_by(continent, country) )

}

Recommended answer

The tidyverse solution to modeling by group is to use:

  • tidyr::nest() to group the variables
  • dplyr::mutate() together with purrr::map() to create models by group
  • broom::tidy() or broom::augment() to generate model summaries and predictions
  • tidyr::unnest() and dplyr::filter() to get summaries and predictions by group
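
As a small illustration of the first step only, the sketch below shows what nest() produces for the gapminder data: one row per country, with that country's observations stored in a list-column. The name nested is chosen here for illustration.

```r
library(tidyverse)
library(gapminder)

# Collapse the per-year columns into a list-column called `data`;
# the remaining columns (country, continent) define the groups.
nested <- gapminder %>%
  nest(data = c(year, lifeExp, pop, gdpPercap))

# One row per country
nested

# The observations for the first country, as an ordinary tibble
nested$data[[1]]
```

Each element of the data column can then be passed to lm() on its own, which is what keeps a model restricted to a single country.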

Here's an example. It doesn't do the same as the code in your question, but I think it will be helpful nevertheless.

This code generates the linear model lifeExp ~ pop by country and the fitted (predicted) values for each model.

library(tidyverse)
library(broom)
library(gapminder)

gapminder_lm <- gapminder %>%
  nest(data = c(year, lifeExp, pop, gdpPercap)) %>%
  mutate(model = map(data, ~lm(lifeExp ~ pop, .)),
         fitted = map(model, augment)) %>%
  unnest(fitted)

gapminder_lm

# A tibble: 1,704 x 12
   country     continent data              model  lifeExp      pop .fitted .resid .std.resid   .hat .sigma  .cooksd
   <fct>       <fct>     <list>            <list>   <dbl>    <int>   <dbl>  <dbl>      <dbl>  <dbl>  <dbl>    <dbl>
 1 Afghanistan Asia      <tibble [12 x 4]> <lm>      28.8  8425333    33.2 -4.41     -1.54   0.182    2.92 0.262
 2 Afghanistan Asia      <tibble [12 x 4]> <lm>      30.3  9240934    33.7 -3.35     -1.15   0.161    3.11 0.128
 3 Afghanistan Asia      <tibble [12 x 4]> <lm>      32.0 10267083    34.3 -2.27     -0.773  0.139    3.24 0.0482
 4 Afghanistan Asia      <tibble [12 x 4]> <lm>      34.0 11537966    35.0 -0.985    -0.331  0.116    3.32 0.00720
 5 Afghanistan Asia      <tibble [12 x 4]> <lm>      36.1 13079460    35.9  0.193     0.0641 0.0969   3.34 0.000220
 6 Afghanistan Asia      <tibble [12 x 4]> <lm>      38.4 14880372    36.9  1.50      0.496  0.0849   3.30 0.0114
 7 Afghanistan Asia      <tibble [12 x 4]> <lm>      39.9 12881816    35.8  4.07      1.35   0.0989   3.02 0.101
 8 Afghanistan Asia      <tibble [12 x 4]> <lm>      40.8 13867957    36.4  4.47      1.48   0.0902   2.95 0.108
 9 Afghanistan Asia      <tibble [12 x 4]> <lm>      41.7 16317921    37.8  3.91      1.29   0.0838   3.05 0.0759
10 Afghanistan Asia      <tibble [12 x 4]> <lm>      41.8 22227415    41.2  0.588     0.202  0.157    3.33 0.00380
# ... with 1,694 more rows

This has the advantage of keeping everything in a tidy data frame, which can be filtered for the data of interest.
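
The same nested layout can also hold model summaries. As a minimal sketch, assuming the same lifeExp ~ pop model per country, broom::tidy() returns one row per coefficient instead of one row per observation. The name gapminder_coefs is chosen here for illustration and is not part of the original answer.

```r
library(tidyverse)
library(broom)
library(gapminder)

# Fit one model per country, then keep the coefficient table of each fit
gapminder_coefs <- gapminder %>%
  nest(data = c(year, lifeExp, pop, gdpPercap)) %>%
  mutate(model = map(data, ~ lm(lifeExp ~ pop, data = .x)),
         coefs = map(model, tidy)) %>%
  select(country, continent, coefs) %>%
  unnest(coefs)

# Keep only the slope for pop and sort countries by its estimate
gapminder_coefs %>%
  filter(term == "pop") %>%
  arrange(desc(estimate))
```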

For example, filter for Egypt and plot real versus predicted values:

gapminder_lm %>%
  filter(country == "Egypt") %>%
  ggplot(aes(lifeExp, .fitted)) +
  geom_point() +
  labs(title = "Egypt")
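
The first attempt in the question refits each leave-one-out model on the whole dataset. One way to restrict that to a single country at a time is to run the leave-one-out loop inside each nested group. The following is a sketch under that reading of the question, not part of the original answer: loo_predict() and gapminder_cv are hypothetical names, and the two formulas are copied from the question.

```r
library(tidyverse)
library(gapminder)

# The two model formulas from the question, named as in its `models` list
formulas <- list(fit_model1 = lifeExp ~ pop,
                 fit_model2 = lifeExp ~ pop + gdpPercap)

# Hypothetical helper: leave-one-out predictions for one formula on one
# country's data. Row i is held out, the model is fitted on the other rows,
# and the held-out row is predicted.
loo_predict <- function(f, d) {
  map_dbl(seq_len(nrow(d)), function(i) {
    fit <- lm(f, data = d[-i, ])
    unname(predict(fit, newdata = d[i, ]))
  })
}

gapminder_cv <- gapminder %>%
  nest(data = c(year, lifeExp, pop, gdpPercap)) %>%
  # For each country, build a tibble with one prediction column per formula
  mutate(preds = map(data, function(d) {
    as_tibble(map(formulas, loo_predict, d = d))
  })) %>%
  unnest(c(data, preds))

gapminder_cv %>%
  filter(country == "Egypt") %>%
  select(country, year, lifeExp, fit_model1, fit_model2)
```

Replacing the body of loo_predict() with unname(fitted(lm(f, data = d))) would give in-sample fitted values per country instead, which is closer to the second attempt in the question.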
