Problem description
Thanks to joran for helping me group data in my previous question, where I wanted to make a data frame in R smaller so that I could do time-series analysis on it.
Now I would like to extract data from the data frame further. The data frame is made up of 6 columns. Columns 1 to 5 each hold discrete names/values: district, gender, year, month and age group. The sixth column is the number of deaths for that specific combination. An extract looks like this:
District Gender Year Month AgeGroup TotalDeaths
Northern Male 2006 11 01-4 0
Northern Male 2006 11 05-14 1
Northern Male 2006 11 15+ 83
Northern Male 2006 12 0 3
Northern Male 2006 12 01-4 0
Northern Male 2006 12 05-14 0
Northern Male 2006 12 15+ 106
Southern Female 2003 1 0 6
Southern Female 2003 1 01-4 0
Southern Female 2003 1 05-14 3
Southern Female 2003 1 15+ 136
Southern Female 2003 2 0 6
Southern Female 2003 2 01-4 0
Southern Female 2003 2 05-14 1
Southern Female 2003 2 15+ 111
Southern Female 2003 3 0 2
Southern Female 2003 3 01-4 0
Southern Female 2003 3 05-14 1
Southern Female 2003 3 15+ 141
Southern Female 2003 4 0 4
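To experiment with this in R, the extract can be read straight from pasted text. A minimal sketch, assuming the data frame is called dfrm (the name used in the answer) and using only a few rows copied from the extract. colClasses forces AgeGroup to stay character, so labels such as "01-4" and "0" are kept exactly as written.

```r
# A few rows copied from the extract above, pasted as text.
txt <- "
District Gender Year Month AgeGroup TotalDeaths
Northern Male 2006 11 01-4 0
Northern Male 2006 11 05-14 1
Northern Male 2006 11 15+ 83
Northern Male 2006 12 0 3
Southern Female 2003 1 0 6
Southern Female 2003 1 15+ 136
"

# Keep District, Gender and AgeGroup as character; Year, Month and counts as integer.
dfrm <- read.table(text = txt, header = TRUE,
                   colClasses = c("character", "character", "integer",
                                  "integer", "character", "integer"))
str(dfrm)
```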
I am new to time-series analysis, and I think this is what I need to do to analyse the data: extract smaller 'time-series' data objects, each holding the unique, longitudinal data for one combination. For example, from the data frame above I want to extract a smaller data object like this one for each combination of District, Gender and AgeGroup:
District Gender Year Month AgeGroup TotalDeaths
Northern Male 2003 1 01-4 0
Northern Male 2003 2 01-4 1
Northern Male 2003 3 01-4 0
Northern Male 2003 4 01-4 3
Northern Male 2003 5 01-4 4
Northern Male 2003 6 01-4 6
Northern Male 2003 7 01-4 5
Northern Male 2003 8 01-4 0
Northern Male 2003 9 01-4 1
Northern Male 2003 10 01-4 2
Northern Male 2003 11 01-4 0
Northern Male 2003 12 01-4 1
Northern Male 2004 1 01-4 1
Northern Male 2004 2 01-4 0
... continuing through to
Northern Male 2006 11 01-4 0
Northern Male 2006 12 01-4 0
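Rather than extracting each series by hand, all of them can be produced at once. A minimal sketch using base R's split(), assuming the data frame is called dfrm; the rows are taken from the extract in the question. Sorting by Year and Month first means every piece comes out in time order, and drop = TRUE skips combinations that never occur.

```r
# A few rows from the extract in the question, deliberately out of time order.
dfrm <- data.frame(
  District    = c("Southern", "Southern", "Southern", "Southern", "Northern", "Northern"),
  Gender      = c("Female", "Female", "Female", "Female", "Male", "Male"),
  Year        = c(2003L, 2003L, 2003L, 2003L, 2006L, 2006L),
  Month       = c(2L, 1L, 2L, 1L, 11L, 12L),
  AgeGroup    = c("15+", "15+", "0", "0", "01-4", "01-4"),
  TotalDeaths = c(111L, 136L, 6L, 6L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Sort chronologically so each extracted series is in time order.
dfrm <- dfrm[order(dfrm$Year, dfrm$Month), ]

# One data frame per District/Gender/AgeGroup combination, stored in a named list
# (names are the three values joined with ".").
series <- split(dfrm, dfrm[c("District", "Gender", "AgeGroup")], drop = TRUE)

names(series)
series[["Southern.Female.15+"]]
```

Each element of the list can then be analysed separately, for example with lapply(series, function(d) ...).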
I tried something in Excel, creating pivot tables with this data and then trying to extract each string of information, but that failed. After that I discovered reshape in R, but either I don't know the right code or perhaps reshape is not the tool for this.
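R has both the base function stats::reshape() and the separate reshape package (later reshape2); the question does not say which one was tried. Reshaping is not needed for the extraction itself, but a minimal sketch of base reshape() shows what it does: it converts between long and wide layouts. Here it turns the long extract into one row per District/Gender/Year/Month with a death-count column per age group. The rows are from the extract; the object names long and wide are illustrative.

```r
# Long format: one row per District/Gender/Year/Month/AgeGroup (rows from the extract).
long <- data.frame(
  District = "Southern", Gender = "Female",
  Year = 2003L, Month = c(1L, 1L, 2L, 2L),
  AgeGroup = c("0", "15+", "0", "15+"),
  TotalDeaths = c(6L, 136L, 6L, 111L),
  stringsAsFactors = FALSE
)

# Wide format: AgeGroup values become column suffixes (TotalDeaths.0, TotalDeaths.15+).
wide <- reshape(long,
                idvar     = c("District", "Gender", "Year", "Month"),
                timevar   = "AgeGroup",
                v.names   = "TotalDeaths",
                direction = "wide")
wide
```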
I am not even certain that this is the correct way to analyse this cross-sectional time-series data, i.e. whether another format is actually required to analyse it with functions such as read.ts(), ts() and arima().
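For reference, ts() does not need the District/Gender/AgeGroup columns at all: once a single series has been extracted and sorted, only the counts, the first period and the number of periods per year are required. A minimal sketch, assuming one monthly series starting in January 2003; the counts are random placeholders generated with rpois(), not real data, and the AR(1) order passed to arima() is an arbitrary example, not a recommended model.

```r
# Placeholder monthly counts for ONE series (e.g. Northern/Male/01-4), 2003-2006.
# These values are randomly generated for illustration only.
set.seed(1)
one <- data.frame(
  Year        = rep(2003:2006, each = 12),
  Month       = rep(1:12, times = 4),
  TotalDeaths = rpois(48, lambda = 3)
)
one <- one[order(one$Year, one$Month), ]

# Monthly time series: start = c(year, month), 12 periods per year.
deaths_ts <- ts(one$TotalDeaths,
                start = c(one$Year[1], one$Month[1]),
                frequency = 12)

# Example model: AR(1) with a mean term; the order is an arbitrary choice here.
fit <- arima(deaths_ts, order = c(1, 0, 0))
fc  <- predict(fit, n.ahead = 12)   # point forecasts for the following 12 months
fc$pred
```

Note that ts() assumes exactly one value per month with no gaps; any month without a row has to be added (as NA) before the series is built, otherwise later values are silently shifted to the wrong months.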
My eventual aim is to use this data and the amelia2 package and its functions to impute TotalDeaths for certain months in 2007 and 2008, where the data is missing.
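Imputation can only fill rows that exist, so a likely first step is to make the missing months explicit. A minimal base-R sketch: the observed values are from the extract, but the study period (2003 months 1 to 6) and the gap in months 4 to 6 are hypothetical. It builds the full grid of series and months with merge(), leaves NA where nothing was observed, and adds a numeric time index and a single series identifier; the column names time and series are illustrative choices.

```r
# Observed rows for one series (values from the extract). HYPOTHETICAL gap:
# months 4-6 of 2003 are treated as unobserved.
obs <- data.frame(
  District = "Southern", Gender = "Female", AgeGroup = "15+",
  Year = 2003L, Month = 1:3, TotalDeaths = c(136L, 111L, 141L),
  stringsAsFactors = FALSE
)

# Every series crossed with every month of the (hypothetical) study period.
series_keys <- unique(obs[c("District", "Gender", "AgeGroup")])
months      <- data.frame(Year = 2003L, Month = 1:6)
grid        <- merge(series_keys, months)   # no shared columns -> Cartesian product

# Left join: months with no observation get TotalDeaths = NA.
full <- merge(grid, obs, all.x = TRUE)

# Consecutive month index across years, plus one cross-section identifier.
full$time   <- (full$Year - min(full$Year)) * 12 + full$Month
full$series <- interaction(full$District, full$Gender, full$AgeGroup, drop = TRUE)
full <- full[order(full$series, full$time), ]
full
```

A frame in this shape could then be passed to the Amelia (Amelia II) package, whose amelia() function takes the names of the time and cross-section columns through its ts and cs arguments; the remaining character columns would need to be declared (for example via idvars) or dropped. Check the Amelia documentation before relying on these details.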
Any help on how to do this, and any suggestions on how to tackle the problem, would be greatly appreciated.
Recommended answer
For the narrow question of how to best extract:
subset(dfrm, subset=(District=="Northern" & Gender=="Male" & AgeGroup=="01-4"))
subset also has a select argument to narrow down the columns. I suspect a search on the term "extract" you were using would only have turned up hits for the ?Extract page, which surprisingly has no link to subset. (I trimmed a trailing space from an earlier version of the AgeGroup specification.)
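A minimal sketch of the select argument, plus the equivalent bracket indexing (R's documentation for subset recommends [ over subset when programming). The rows come from the question's extract; the trailing space in the first AgeGroup value is added deliberately to illustrate the kind of mismatch mentioned above, and trimws() removes it before filtering.

```r
dfrm <- data.frame(
  District    = c("Northern", "Northern", "Southern"),
  Gender      = c("Male", "Male", "Female"),
  Year        = c(2006L, 2006L, 2003L),
  Month       = c(11L, 12L, 1L),
  AgeGroup    = c("01-4 ", "01-4", "01-4"),   # HYPOTHETICAL trailing space in row 1
  TotalDeaths = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# "01-4 " does not equal "01-4", so strip leading/trailing whitespace first.
dfrm$AgeGroup <- trimws(dfrm$AgeGroup)

# One series, keeping only the time and count columns.
nm <- subset(dfrm,
             District == "Northern" & Gender == "Male" & AgeGroup == "01-4",
             select = c(Year, Month, TotalDeaths))

# Same result with standard indexing, which is safer inside functions.
nm2 <- dfrm[dfrm$District == "Northern" & dfrm$Gender == "Male" &
              dfrm$AgeGroup == "01-4",
            c("Year", "Month", "TotalDeaths")]
nm
```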