How to extract longitudinal time-series data from a data frame in R for time-series analysis and imputation


Problem description

Thanks to joran for helping me group the data in my previous question, where I wanted to make a data frame in R smaller so that I could do time-series analysis on it.

Now I would like to extract further data from the data frame. It is made up of 6 columns. Columns 1 to 5 each hold discrete names/values: district, gender, year, month and age group. The sixth column is the death count for that specific combination. An extract looks like this:

             District  Gender Year Month    AgeGroup TotalDeaths
             Northern    Male 2006    11        01-4           0
             Northern    Male 2006    11       05-14           1
             Northern    Male 2006    11         15+          83
             Northern    Male 2006    12           0           3
             Northern    Male 2006    12        01-4           0
             Northern    Male 2006    12       05-14           0
             Northern    Male 2006    12         15+         106
             Southern  Female 2003     1           0           6
             Southern  Female 2003     1        01-4           0
             Southern  Female 2003     1       05-14           3
             Southern  Female 2003     1         15+         136
             Southern  Female 2003     2           0           6
             Southern  Female 2003     2        01-4           0
             Southern  Female 2003     2       05-14           1
             Southern  Female 2003     2         15+         111
             Southern  Female 2003     3           0           2
             Southern  Female 2003     3        01-4           0
             Southern  Female 2003     3       05-14           1
             Southern  Female 2003     3         15+         141
             Southern  Female 2003     4           0           4

I am new to time series, and I think I need to do the following to analyse the data: extract smaller 'time-series' data objects, each of which is a unique longitudinal series. For example, from the data frame above, I want to extract smaller data objects like this for each District, Gender and AgeGroup:

             District  Gender Year Month    AgeGroup TotalDeaths
             Northern    Male 2003     1        01-4           0
             Northern    Male 2003     2        01-4           1
             Northern    Male 2003     3        01-4           0
             Northern    Male 2003     4        01-4           3
             Northern    Male 2003     5        01-4           4
             Northern    Male 2003     6        01-4           6
             Northern    Male 2003     7        01-4           5
             Northern    Male 2003     8        01-4           0
             Northern    Male 2003     9        01-4           1
             Northern    Male 2003    10        01-4           2
             Northern    Male 2003    11        01-4           0
             Northern    Male 2003    12        01-4           1
             Northern    Male 2004     1        01-4           1
             Northern    Male 2004     2        01-4           0

through to

             Northern    Male 2006    11        01-4           0
             Northern    Male 2006    12        01-4           0

I tried something in Excel, creating pivot tables with this data, and then tried to extract the string of information, but failed. After that I discovered reshape in R, but either I don't know the right code or perhaps I should not be using reshape for this.

I am not even certain that this is the correct way to analyse this cross-sectional time-series data, i.e. whether another format is actually required to analyse it with functions such as read.ts(), ts() and arima().

My eventual aim is to use this data and the amelia2 package with its functions to impute the missing TotalDeaths for certain months in 2007 and 2008, where the data is of course missing.

Any help on how to do this, and perhaps suggestions on how to tackle this problem, would be greatly appreciated.

Recommended answer

For the narrow question of how best to extract:

subset(dfrm, subset=(District=="Northern" &  Gender=="Male" &  AgeGroup=="01-4"))
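The call above assumes a data frame named dfrm already exists. As a minimal sketch for trying it out, the block below builds a hypothetical dfrm with the same six columns as the question's extract; the counts are made up and only the column names come from the question.

```r
# Hypothetical sample data in the same layout as the question's extract.
# expand.grid varies the first argument fastest, so rows within each
# District/Gender/AgeGroup combination are already in Year/Month order.
dfrm <- expand.grid(
  Month    = 1:12,
  Year     = 2003:2004,
  AgeGroup = c("0", "01-4", "05-14", "15+"),
  Gender   = c("Male", "Female"),
  District = c("Northern", "Southern"),
  stringsAsFactors = FALSE
)

# Made-up death counts, for illustration only
set.seed(1)
dfrm$TotalDeaths <- rpois(nrow(dfrm), lambda = 3)

# Put the columns in the same order as the question
dfrm <- dfrm[, c("District", "Gender", "Year", "Month", "AgeGroup", "TotalDeaths")]

# The extraction from the answer: one longitudinal series
northern_male_01_4 <- subset(
  dfrm,
  subset = (District == "Northern" & Gender == "Male" & AgeGroup == "01-4")
)

# Sort explicitly by time in case the real data is not already ordered
northern_male_01_4 <- northern_male_01_4[
  order(northern_male_01_4$Year, northern_male_01_4$Month), ]

head(northern_male_01_4)
```

With real data, the sort step matters more than it does here, because a time-series object assumes the observations arrive in chronological order.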

subset also has a select argument to narrow down the columns. I suspect that a search on the term "extract", which you were using, would only have pulled up hits for the ?Extract page, which surprisingly has no link to subset. (I trimmed a trailing space from an earlier version of the AgeGroup specification.)
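As a rough sketch of the select argument, and of one possible way to get a separate object for every District, Gender and AgeGroup at once, the block below uses base R's split() and ts(). The data frame is hypothetical placeholder data, and using split() and ts() here is an assumption about what the asker needs, not part of the original answer.

```r
# Hypothetical placeholder data with the question's six columns
dfrm <- expand.grid(
  Month    = 1:12,
  Year     = 2003:2006,
  AgeGroup = c("0", "01-4", "05-14", "15+"),
  Gender   = c("Male", "Female"),
  District = c("Northern", "Southern"),
  stringsAsFactors = FALSE
)
dfrm$TotalDeaths <- seq_len(nrow(dfrm)) %% 7  # made-up counts

# select keeps only the time columns and the count for one series
one_series <- subset(
  dfrm,
  subset = (District == "Northern" & Gender == "Male" & AgeGroup == "01-4"),
  select = c(Year, Month, TotalDeaths)
)

# One data frame per District/Gender/AgeGroup combination;
# list names look like "Northern.Male.01-4"
series_list <- split(
  dfrm,
  list(dfrm$District, dfrm$Gender, dfrm$AgeGroup),
  drop = TRUE
)

# Convert one series to a monthly ts object. ts() assumes a complete,
# ordered run of months, so gaps would need explicit NA rows first.
one_series <- one_series[order(one_series$Year, one_series$Month), ]
deaths_ts <- ts(one_series$TotalDeaths, start = c(2003, 1), frequency = 12)

length(series_list)
deaths_ts
```

Each element of series_list can be converted in the same way, for example with lapply(). Months with no record in 2007 and 2008 would have to be represented as NA before a ts object lines up with the calendar.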

