Handling the timestamp error when concatenating and writing Parquet files with Python and pandas

Question

I am trying to concat() two Parquet files with pandas in Python.
The concatenation works, but when I try to write the resulting DataFrame to a Parquet file, I get this error:

 ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data:

I checked the pandas documentation; as far as I can tell, it defaults to millisecond (ms) timestamp resolution when writing a Parquet file.
How can I write the Parquet file with the existing schema after the concat?
Here is my code:

import pandas as pd

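# Read the two input files (both paths are placeholders).
table1 = pd.read_parquet('path.parquet', engine='pyarrow')
table2 = pd.read_parquet('path.parquet', engine='pyarrow')

# Concatenate, then write the result; the to_parquet call is where the error is raised.
table = pd.concat([table1, table2], ignore_index=True)
table.to_parquet('./file.gzip', compression='gzip')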

Recommended answer

Pandas has forwarded unknown kwargs to the underlying Parquet engine since at least v0.22. So calling table.to_parquet() with allow_truncated_timestamps=True (alongside the usual path argument) should work; I verified it with pandas v0.25.0 and pyarrow 0.13.0. For more keywords, see the pyarrow docs.

