Choosing the right approach for a 50GB file

I'm pulling data that will produce a file with about 1.5 billion rows.

I'm trying to decide which system to use to analyze the data. The data has a very simple structure.

I'm considering either a distributed system like Spark or an R package like data.table.

Assuming I'm on a machine with lots of memory, can I work with a 50GB / 1.5B-row dataset using data.table in R?
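
To make the sizing question concrete, here is a rough back-of-envelope sketch in Python. It only uses the two figures given above (50GB, about 1.5B rows). The column layout is hypothetical, since the post does not describe the schema, and comparing the row count against R's 32-bit integer maximum is an assumption about what bounds a single in-memory table, so check it against the documentation of whatever tool you pick.

```python
# Back-of-envelope sizing for the dataset described in the question.
FILE_BYTES = 50 * 10**9   # 50 GB on disk (decimal gigabytes assumed)
ROWS = 1_500_000_000      # about 1.5B rows

# Average size of one row in the file on disk.
bytes_per_row_on_disk = FILE_BYTES / ROWS

# Hypothetical schema (not from the post): bytes per cell once loaded.
HYPOTHETICAL_COLUMNS = {"id": 4, "timestamp": 8, "value": 8}


def in_memory_bytes(rows, column_widths):
    """Estimate the raw column storage, ignoring any per-object overhead."""
    return rows * sum(column_widths.values())


# R's 32-bit integer maximum; assumed here to bound the row count of one table.
R_INTEGER_MAX = 2**31 - 1

estimate = in_memory_bytes(ROWS, HYPOTHETICAL_COLUMNS)

print(f"{bytes_per_row_on_disk:.1f} bytes per row on disk")
print(f"{estimate / 10**9:.0f} GB of raw column data in memory")
print(f"row count within 32-bit integer range: {ROWS <= R_INTEGER_MAX}")
```

Under these assumptions the raw columns alone come to tens of gigabytes, before any working copies made during grouping or joins, so "lots of memory" would need to mean a comfortable multiple of that figure.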