Iteratively writing to HDF5 stores in Pandas


Problem description


Pandas has the following examples for how to store Series, DataFrames and Panels in HDF5 files:

In [1142]: store = HDFStore('store.h5')

In [1143]: index = date_range('1/1/2000', periods=8)

In [1144]: s = Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])

In [1145]: df = DataFrame(randn(8, 3), index=index,
   ......:                columns=['A', 'B', 'C'])
   ......:

In [1146]: wp = Panel(randn(2, 5, 4), items=['Item1', 'Item2'],
   ......:            major_axis=date_range('1/1/2000', periods=5),
   ......:            minor_axis=['A', 'B', 'C', 'D'])
   ......:






Save it in a store:

In [1147]: store['s'] = s

In [1148]: store['df'] = df

In [1149]: store['wp'] = wp






Inspect what's in the store:

In [1150]: store
Out[1150]:
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df            frame        (shape->[8,3])
/s             series       (shape->[5])
/wp            wide         (shape->[2,5,4])






Close the store:

In [1151]: store.close()






Questions:


  1. In the code above, when is the actual data written to disk?


Say I want to add thousands of large dataframes living in .csv files to a single .h5 file. I would need to load them and add them to the .h5 file one by one since I cannot afford to have them all in memory at once as they would take too much memory. Is this possible with HDF5? What would be the correct way to do it?


The Pandas documentation says the following:


What does "not appendable nor queryable" mean? Also, shouldn't it say "once closed" instead of "written"?


Recommended answer



  1. As soon as the statement is executed, e.g. store['df'] = df. The close just closes the actual file (which will be closed for you if the process exits, but a warning message will be printed).
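
A minimal sketch of the point above, assuming PyTables is installed (pandas needs it for HDFStore). The function name write_frames and the way the frames are passed in are made up for illustration. Each assignment writes the object; the with block only takes care of closing the file, and the flush call is an optional extra that asks the OS to push its buffers to disk:

```python
import pandas as pd


def write_frames(h5_path, frames):
    """Write each object in `frames` (a dict of key -> Series/DataFrame) to a store.

    Hypothetical helper for illustration, not part of pandas.
    """
    with pd.HDFStore(h5_path, mode="w") as store:
        for key, obj in frames.items():
            store[key] = obj          # the write happens on this statement
            store.flush(fsync=True)   # optional: also ask the OS to sync to disk
    # Leaving the with block closes the file; nothing new is written at that point.

    # Reopen read-only to confirm the nodes are there.
    with pd.HDFStore(h5_path, mode="r") as store:
        return sorted(store.keys())
```

Opening the store in a with block means the file is closed even if one of the writes raises.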


It is generally not a good idea to put a LOT of nodes in an .h5 file. You probably want to append and create a smaller number of nodes.


You can just iterate through your .csv files and store/append them one by one. Something like:

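import pandas as pd

for f in files:  # files: a list of .csv paths
    df = pd.read_csv(f)
    # the second argument is the key (node name) inside file.h5
    df.to_hdf('file.h5', key=f)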


This would be one way, creating a separate node for each file.
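
If you would rather follow the earlier advice and keep the number of nodes small, a rough sketch of the append route could look like the following. It assumes PyTables is installed and that all the .csv files share the same columns and dtypes; the function name append_csvs, the key "data" and the default chunk size are placeholders, not anything from pandas itself:

```python
import pandas as pd


def append_csvs(csv_paths, h5_path, key="data", chunksize=100_000):
    """Append many CSV files to a single table node, one chunk at a time.

    Hypothetical helper; names and defaults are illustrative assumptions.
    """
    total_rows = 0
    with pd.HDFStore(h5_path, mode="a") as store:
        for path in csv_paths:
            # With chunksize set, read_csv yields DataFrames of at most
            # `chunksize` rows, so a whole file is never held in memory.
            for chunk in pd.read_csv(path, chunksize=chunksize):
                # append() writes in table format, so the node can keep growing.
                store.append(key, chunk)
                total_rows += len(chunk)
    return total_rows
```

Two things to watch for: column dtypes have to match from chunk to chunk, and string columns may need min_itemsize passed to append if later chunks contain longer strings than the first one.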


Not appendable - once you write it, you can only retrieve it all at once, e.g. you cannot select a sub-section


If you have a table, then you can do things like:

pd.read_hdf('my_store.h5', 'a_table_node', where=['index>100'])


which is like a database query, only getting part of the data

Thus, a store is not appendable, nor queryable, while a table is both.
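
To make the difference concrete, here is a small sketch that writes a frame in table format and then reads back only part of it. It assumes PyTables is installed; the function name and the toy data are invented for the example, and the node name is borrowed from the query above:

```python
import numpy as np
import pandas as pd


def write_table_and_query(h5_path):
    """Write a 200-row frame as a table node and select rows with index > 100.

    Hypothetical helper; the data is made up for illustration.
    """
    df = pd.DataFrame({"A": np.arange(200, dtype="float64")})  # index 0..199
    # format="table" makes the node appendable and queryable;
    # the default "fixed" format is neither.
    df.to_hdf(h5_path, key="a_table_node", mode="w", format="table")
    # Only the matching rows are read from disk.
    return pd.read_hdf(h5_path, key="a_table_node", where="index > 100")
```

The where expression is evaluated against the stored table, so the rest of the node never has to be loaded into memory.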
