Problem description
I have a large set of CSV files in a single directory. Each file contains two columns, Date and Price, and the filename part of filename.csv is the unique identifier of the data series. I understand that missing values in merged data series can be handled when the time series are zoo objects. I also understand that, by using na.locf(merge( )), I can fill in the missing values with the most recent observations.
I would like to automate the following process:
- loading the Date and Price columns of the *.csv files into R data frames.
- establishing each distinct time series within the merged zoo "portfolio of time series" object, with an identity equal to its filename.
- merging these zoo time series using MergedData <- na.locf(merge( )).
The ultimate goal, of course, is to use the fPortfolio package.
I've used the following statement to create a list of data frames of Date,Price pairs. The problem with this approach is that I lose the <filename> identifier of each time series.
result <- lapply(files, function(x) x <- read.csv(x) )
I understand that I can write code to generate the R statements required to do all these steps file by file. I'm wondering whether there is an approach that wouldn't require that. It's hard for me to believe that others haven't wanted to perform this same task.
Recommended answer
You can get better formatting using sapply (it keeps the file names). Here I will keep lapply.
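A minimal sketch of the sapply variant, assuming the zoo package is installed. The temporary directory, the file names AAA.csv and BBB.csv, the prices, and the ISO date layout are all made up for illustration. Given a character vector and simplify = FALSE, sapply returns a list named after its input, so the identifier survives the import:

```r
library(zoo)

# Hypothetical sample data: two small Date,Price files in a scratch directory.
dir <- file.path(tempdir(), "zoo_sapply_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("2020-01-01", "2020-01-02"), Price = c(10, 11)),
          file.path(dir, "AAA.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("2020-01-02", "2020-01-03"), Price = c(20, 21)),
          file.path(dir, "BBB.csv"), row.names = FALSE)

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# simplify = FALSE keeps a list; USE.NAMES (TRUE by default) names it by `files`.
# header, sep and format are passed through to read.zoo (assumed date layout).
series <- sapply(files, read.zoo, header = TRUE, sep = ",",
                 format = "%Y-%m-%d", simplify = FALSE)

# Strip directory and extension so each name is just the series identifier.
names(series) <- tools::file_path_sans_ext(basename(names(series)))
names(series)
```

Each element of series is a zoo object indexed by Date, and its name is the identifier taken from the file name.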
- Assuming that all your files are in the same directory, you can use list.files. It is very handy for this kind of workflow.
- I would use read.zoo to get zoo objects directly (avoiding coercion later).
For example:
library(zoo)
zoo.objs <- lapply(list.files(path = MY_FILES_DIRECTORY,
                              pattern = '^zoo_.*\\.csv$', ## csv files whose names start with zoo_
                              full.names = TRUE),          ## return path + filename
                   read.zoo, header = TRUE, sep = ',')     ## assumes a Date,Price header row
I now use list.files again to name the result:
names(zoo.objs) <- list.files(path = MY_FILES_DIRECTORY,
                              pattern = '^zoo_.*\\.csv$')  ## same pattern, file names only
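To finish the workflow from the question, the named list can be handed to merge in a single call and the gaps filled with na.locf. This is a sketch under assumptions rather than part of the original answer: the two series below are invented stand-ins for the list built from the CSV files, and do.call is one common way to spread a list over merge.

```r
library(zoo)

# Hypothetical stand-ins for the named list read from the CSV files.
zoo.objs <- list(
  AAA = zoo(c(10, 11), as.Date(c("2020-01-01", "2020-01-02"))),
  BBB = zoo(c(20, 21), as.Date(c("2020-01-02", "2020-01-03")))
)

# merge() takes the series as separate arguments, so do.call() spreads the
# list; the list names become the column names of the merged object.
merged <- do.call(merge, zoo.objs)

# Carry the last observation forward. na.rm = FALSE leaves leading NAs
# (dates before a series has any observation) in place.
MergedData <- na.locf(merged, na.rm = FALSE)
MergedData
```

The columns of MergedData keep the identifiers, so no per-file statements need to be generated, and adding another CSV file to the directory only lengthens the list.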