How to append to a Parquet file using pyarrow?

Problem description

How do you append to or update a Parquet file with pyarrow?

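import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


# Two sample DataFrames; note that their column names differ.
table2 = pd.DataFrame({'one': [-1, np.nan, 2.5], 'two': ['foo', 'bar', 'baz'], 'three': [True, False, True]})
table3 = pd.DataFrame({'six': [-1, np.nan, 2.5], 'nine': ['foo', 'bar', 'baz'], 'ten': [True, False, True]})


# write_table expects a pyarrow Table, so convert the DataFrame first.
pq.write_table(pa.Table.from_pandas(table2), './dataNew/pqTest2.parquet')
# append to pqTest2.parquet here?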

I found nothing in the docs about appending to Parquet files. Also, can pyarrow be used with multiprocessing to insert/update the data?

Recommended answer

I ran into the same issue, and I think I was able to solve it with the following:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


chunksize = 10000  # number of CSV rows to read per chunk

pqwriter = None
for i, df in enumerate(pd.read_csv('sample.csv', chunksize=chunksize)):
    table = pa.Table.from_pandas(df)
    # for the first chunk of records
    if i == 0:
        # create a Parquet writer, giving it the output file and the schema of the first chunk
        pqwriter = pq.ParquetWriter('sample.parquet', table.schema)
    # each call adds the chunk to the open file as a new row group
    pqwriter.write_table(table)

# close the parquet writer (skipped if the CSV produced no chunks)
if pqwriter is not None:
    pqwriter.close()

