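I'm trying to join two Pandas DataFrames on an id field, which is a string uuid. I get a ValueError:
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
The code is below. I tried converting the fields to strings, following "Trying to merge 2 dataframes but get ValueError", but the error persists. Note that pdf comes from a Spark dataframe.toPandas(), while outputsPdf was created from a dictionary.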
pdf.id = pdf.id.apply(str)
outputsPdf.id = outputsPdf.id.apply(str)
inOutPdf = pdf.join(outputsPdf, on='id', how='left', rsuffix='fs')
pdf.dtypes
id object
time float64
height float32
dtype: object
outputsPdf.dtypes
id object
labels float64
dtype: object
How can I debug this?
Full traceback:
ValueError Traceback (most recent call last)
<ipython-input-13-deb429dde9ad> in <module>()
61 pdf['id'] = pdf['id'].astype(str)
62 outputsPdf['id'] = outputsPdf['id'].astype(str)
---> 63 inOutPdf = pdf.join(outputsPdf, on=['id'], how='left', rsuffix='fs')
64
65 # idSparkDf = spark.createDataFrame(idPandasDf, schema=StructType([StructField('id', StringType(), True),
~/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
6334 # For SparseDataFrame's benefit
6335 return self._join_compat(other, on=on, how=how, lsuffix=lsuffix,
-> 6336 rsuffix=rsuffix, sort=sort)
6337
6338 def _join_compat(self, other, on=None, how='left', lsuffix='', rsuffix='',
~/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
6349 return merge(self, other, left_on=on, how=how,
6350 left_index=on is None, right_index=True,
-> 6351 suffixes=(lsuffix, rsuffix), sort=sort)
6352 else:
6353 if on is not None:
~/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
59 right_index=right_index, sort=sort, suffixes=suffixes,
60 copy=copy, indicator=indicator,
---> 61 validate=validate)
62 return op.get_result()
63
~/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
553 # validate the merge keys dtypes. We may need to coerce
554 # to avoid incompat dtypes
--> 555 self._maybe_coerce_merge_keys()
556
557 # If argument passed to validate,
~/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/merge.py in _maybe_coerce_merge_keys(self)
984 elif (not is_numeric_dtype(lk)
985 and (is_numeric_dtype(rk) and not is_bool_dtype(rk))):
--> 986 raise ValueError(msg)
987 elif is_datetimelike(lk) and not is_datetimelike(rk):
988 raise ValueError(msg)
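Best answer
The on parameter only applies to the calling DataFrame!
Although you specify on='id', join will take 'id' from pdf, which has object dtype, and try to match it against the index of outputsPdf, which holds integer values.
If you need to join on non-index columns across two DataFrames, you can either set those columns as the index on both sides, or use merge instead, because the on parameter of pd.merge applies to both DataFrames.
Example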
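To see this mismatch directly, it can help to compare the dtype of the left key column with the dtype of the right frame's index, since that is what join(on=...) actually matches. The snippet below is a minimal sketch on made-up stand-in data: the frames left and right and their values are assumptions for illustration, not the asker's real pdf and outputsPdf.

```python
import pandas as pd

# Hypothetical stand-ins for pdf and outputsPdf (illustrative values only).
left = pd.DataFrame({"id": ["a1", "b2", "c3"], "time": [0.1, 0.2, 0.3]})
right = pd.DataFrame({"id": ["a1", "b2", "c3"], "labels": [1.0, 0.0, 1.0]})

# join(on="id") compares left["id"] with right.index, not with right["id"].
print("left key dtype:   ", left["id"].dtype)
print("right index dtype:", right.index.dtype)

# The default index is integer-based, while the key column holds strings.
key_is_numeric = pd.api.types.is_numeric_dtype(left["id"])
index_is_integer = pd.api.types.is_integer_dtype(right.index.dtype)
print("mismatch:", (not key_is_numeric) and index_is_integer)

# merge(on="id") uses the "id" column on both sides, so the keys line up.
merged = left.merge(right, on="id", how="left")
print(merged)
```

If the two printed dtypes differ in this way, casting the id columns to str will not help, because the right-hand id column is never consulted by join.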
import pandas as pd
df1 = pd.DataFrame({'id': ['1', 'True', '4'], 'vals': [10, 11, 12]})
df2 = df1.copy()
df1.join(df2, on='id', how='left', rsuffix='_fs')  # raises ValueError (object vs. int64 merge keys)
On the other hand, these work:
df1.set_index('id').join(df2.set_index('id'), how='left', rsuffix='_fs').reset_index()
# id vals vals_fs
#0 1 10 10
#1 True 11 11
#2 4 12 12
df1.merge(df2, on='id', how='left', suffixes=['', '_fs'])
# id vals vals_fs
#0 1 10 10
#1 True 11 11
#2 4 12 12
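As a quick sanity check, the three calls above can be put side by side in one script. This is only a sketch that reuses the df1 and df2 frames from the example; the variable names join_failed, via_index, via_merge and same are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"id": ["1", "True", "4"], "vals": [10, 11, 12]})
df2 = df1.copy()

# join(on="id") matches df1["id"] against df2's integer index, so it fails.
try:
    df1.join(df2, on="id", how="left", rsuffix="_fs")
    join_failed = False
except ValueError as exc:
    join_failed = True
    print("join raised:", exc)

# Option 1: make "id" the index on both sides, then join index-on-index.
via_index = (
    df1.set_index("id")
    .join(df2.set_index("id"), how="left", rsuffix="_fs")
    .reset_index()
)

# Option 2: merge, where on="id" refers to the column in both frames.
via_merge = df1.merge(df2, on="id", how="left", suffixes=("", "_fs"))

# Both options should contain the same columns and values.
same = via_index.to_dict("list") == via_merge.to_dict("list")
print("same result:", same)
```

Which option to prefer is mostly a matter of taste: merge avoids the set_index and reset_index round trip, while the index-based join can be convenient when the frames are already indexed by id.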
Source: the Stack Overflow question "python - Pandas join on string datatype": https://stackoverflow.com/questions/52373285/