Problem description
I am creating my own R package and would like to know which methods I can use to add (time-series) datasets to it. Here are the specifics:
I have created a package subdirectory called data and I am aware that this is the location where I should save the datasets that I want to add to my package. I also know that the files containing the data may be .rda, .txt, or .csv files.
Each series of data that I want to add to the package consists of a single column of numbers (e.g., of the form 340 or 4.5), and each series differs in length.
So far, I have saved all of the datasets into a .txt file. I have also successfully loaded the data using the data() function. Problem not solved, however.
The problem is that each series of data loads as a factor except for the series greatest in length. The series that load as factors contain missing values (of the form '.'). I had to add these missing values in order to make each column of data the same in length. I tried saving the data as unequal columns, but I received an error message after calling data().
A consequence of adding missing values to get the data to load is that once the data is loaded, I need to remove the NAs before I can get on with my analysis! So this clearly is not a good way of doing things.
Ideally (I suppose), I would like the data to load as numeric vectors or as a list. That way, I wouldn't need NAs appended to the end of each series.
How do I solve this problem? Should I save all of the data into one single file? If so, in what format should I do it? Perhaps I should save the datasets into a number of files? Again, in which format? What is the best practical way of doing this? Any tips would be greatly appreciated.
I'm not sure I understood your question correctly, but if you arrange your data in R in whatever form you prefer and save it with
save(myediteddata, file="data.rda")
the data should load exactly the way you saw it in R when you saved it.
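As a minimal sketch of this approach, assuming the series are held as plain numeric vectors: a named list can store series of different lengths with no NA padding, and save() keeps that structure intact. The names series_a and series_b and their values are made up for illustration, and a temporary directory stands in for the package's data subdirectory so the sketch runs anywhere.

```r
# Hypothetical example series of unequal length (values are made up)
series_a <- c(340, 4.5, 12, 7.25)
series_b <- c(1.5, 2)

# A named list holds vectors of different lengths without NA padding
myediteddata <- list(series_a = series_a, series_b = series_b)

# In a real package this file would live in the data/ subdirectory;
# tempdir() is used here only so the sketch is self-contained
rda_path <- file.path(tempdir(), "myediteddata.rda")
save(myediteddata, file = rda_path)

# Load into a fresh environment to check the round trip
env <- new.env()
load(rda_path, envir = env)
str(env$myediteddata)
```

Each element comes back as a numeric vector of its original length, so nothing has to be stripped out before the analysis.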
To have all the files in the data directory loaded automatically, you should add
LazyData: true
to the DESCRIPTION file of your package.
If this doesn't help, you could post one of your files along with a printout of the format you want; that will help us help you ;)
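The field is normally added by hand in a text editor. Purely as an illustration of what the resulting file looks like, the sketch below builds a throwaway DESCRIPTION in a temporary directory and appends the field with base R's read.dcf() and write.dcf(). The package name mypkg and the version number are hypothetical.

```r
# Create a throwaway DESCRIPTION file (hypothetical package "mypkg")
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c("Package: mypkg", "Version: 0.1.0"), desc_path)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_path)

# Append the LazyData field and write the file back
desc <- cbind(desc, LazyData = "true")
write.dcf(desc, file = desc_path)

# Show the resulting file contents
cat(readLines(desc_path), sep = "\n")
```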