Problem description
How do you append to or update a Parquet file with pyarrow?
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

table2 = pd.DataFrame({'one': [-1, np.nan, 2.5], 'two': ['foo', 'bar', 'baz'], 'three': [True, False, True]})
table3 = pd.DataFrame({'six': [-1, np.nan, 2.5], 'nine': ['foo', 'bar', 'baz'], 'ten': [True, False, True]})
# pq.write_table expects a pyarrow.Table, not a pandas DataFrame
pq.write_table(pa.Table.from_pandas(table2), './dataNew/pqTest2.parquet')
# append to pqTest2.parquet here?
I couldn't find anything in the docs about appending to Parquet files. Also, can you use pyarrow with multiprocessing to insert/update the data?
Recommended answer
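I ran into the same issue and I think I was able to solve it using the following. The idea is to keep a single pq.ParquetWriter open and call write_table once per chunk, so every chunk ends up in the same file: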
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

chunksize = 10000  # number of CSV rows read per chunk
pqwriter = None
for i, df in enumerate(pd.read_csv('sample.csv', chunksize=chunksize)):
    table = pa.Table.from_pandas(df)
    # for the first chunk of records
    if i == 0:
        # create a Parquet writer for the output file, using the first chunk's schema
        pqwriter = pq.ParquetWriter('sample.parquet', table.schema)
    # every chunk is written into the same open file
    pqwriter.write_table(table)

# close the Parquet writer; this writes the file footer
if pqwriter is not None:
    pqwriter.close()