Problem description
I have two DataFrames:
df1:
date ids
0 2015-10-13 [978]
1 2015-10-14 [978, 121]
df2:
date ids
0 2015-10-13 [978, 12]
1 2015-10-14 [2, 1]
When I merge them on date as below:
df = pandas.merge(df1, df2, on='date', sort=False)
I get the following DataFrame:
date ids_x ids_y
0 2015-10-13 [978] [978, 12]
1 2015-10-14 [978, 121] [2, 1]
I want a single ids column that combines both lists, like [978, 978, 12], or preferably with duplicates removed, like [978, 12].
Recommended answer
You can add the two columns together to get the combined list, and then use df.drop() with axis=1 to drop the ids_x and ids_y columns. Example -
import pandas as pd
df = pd.merge(df1, df2, on='date', sort=False)
# Adding two columns of lists concatenates the lists row by row
df['ids'] = df['ids_x'] + df['ids_y']
df = df.drop(['ids_x', 'ids_y'], axis=1)
Demo -
In [65]: df
Out[65]:
date ids_x ids_y
0 2015-10-13 [978] [978, 12]
1 2015-10-14 [978, 121] [2, 1]
In [67]: df['ids'] = df['ids_x'] + df['ids_y']
In [68]: df
Out[68]:
date ids_x ids_y ids
0 2015-10-13 [978] [978, 12] [978, 978, 12]
1 2015-10-14 [978, 121] [2, 1] [978, 121, 2, 1]
In [70]: df = df.drop(['ids_x','ids_y'],axis=1)
In [71]: df
Out[71]:
date ids
0 2015-10-13 [978, 978, 12]
1 2015-10-14 [978, 121, 2, 1]
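For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The two frames are rebuilt from the question's sample data; storing the dates as plain strings and the name `merged` are assumptions made for illustration only.

```python
import pandas as pd

# Sample frames rebuilt from the question (dates kept as plain strings, an assumption).
df1 = pd.DataFrame({"date": ["2015-10-13", "2015-10-14"],
                    "ids": [[978], [978, 121]]})
df2 = pd.DataFrame({"date": ["2015-10-13", "2015-10-14"],
                    "ids": [[978, 12], [2, 1]]})

# Merging on date yields ids_x (from df1) and ids_y (from df2).
merged = pd.merge(df1, df2, on="date", sort=False)

# Adding two object columns of lists concatenates the lists row by row.
merged["ids"] = merged["ids_x"] + merged["ids_y"]

# Drop the two intermediate columns, keeping only date and ids.
merged = merged.drop(["ids_x", "ids_y"], axis=1)

print(merged)
```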
If you also want to remove the duplicate values and do not care about order, you can use Series.apply to convert each list to a set and then back to a list. Example -
df['ids'] = df['ids'].apply(lambda x: list(set(x)))
Demo -
In [72]: df['ids'] = df['ids'].apply(lambda x: list(set(x)))
In [73]: df
Out[73]:
date ids
0 2015-10-13 [978, 12]
1 2015-10-14 [121, 978, 2, 1]
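If the original order does matter, one possible variant (not part of the original answer) is to deduplicate with dict.fromkeys, which keeps the first occurrence of each value in insertion order. This is only a sketch; the names `ids` and `deduped` are hypothetical, and the Series stands in for the ids column built above.

```python
import pandas as pd

# Stand-in for the combined ids column from the demo above.
ids = pd.Series([[978, 978, 12], [978, 121, 2, 1]])

# dict keys are unique and preserve insertion order,
# so this drops repeats while keeping first occurrences in place.
deduped = ids.apply(lambda x: list(dict.fromkeys(x)))

print(deduped.tolist())
```

Like the set approach, this requires the list elements to be hashable.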
Or, as asked in the comments, if you want to do it with numpy.unique(), which returns the sorted unique values, you can use that along with Series.apply as well -
import numpy as np
df['ids'] = df['ids'].apply(lambda x: np.unique(x))
Demo -
In [79]: df['ids'] = df['ids'].apply(lambda x: np.unique(x))
In [80]: df
Out[80]:
date ids
0 2015-10-13 [12, 978]
1 2015-10-14 [1, 2, 121, 978]
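Note that np.unique returns a NumPy array rather than a Python list, so each cell in the demo above holds an array. If plain lists are preferred, a small variation is to call .tolist() on the result. This is a sketch under that assumption; the names `ids` and `unique_sorted` are hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for the combined ids column from the demo above.
ids = pd.Series([[978, 978, 12], [978, 121, 2, 1]])

# np.unique sorts and deduplicates; .tolist() converts the array
# back to a plain Python list of ints.
unique_sorted = ids.apply(lambda x: np.unique(x).tolist())

print(unique_sorted.tolist())
```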