Problem description
I have a simple dataframe df with a column of lists named lists. I would like to generate 3 additional columns based on lists.
df looks like:
import pandas as pd
lists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
#create test dataframe
df=pd.DataFrame.from_dict(lists,orient='index')
df=df.rename(columns={0:'lists'})
df
lists
1 [1]
2 [1, 2, 3]
3 [2, 9, 7, 9]
4 [2, 7, 3, 5]
I would like df to look like this:
lists cumset adds drops
1 [1] {1} {1} {}
2 [1,2,3] {1,2,3} {2,3} {}
3 [2,9,7,9] {1,2,3,7,9} {7,9} {3}
4 [2,7,3,5] {1,2,3,5,7,9} {3,5} {9}
Basically, I need to figure out how to create cumset (some type of apply? Is there already a pandas function for this?). Then, for adds and drops, we want to compare df.lists with df.lists.shift() and determine which items are new and which are missing. Maybe something like:
df['adds']=df[['lists',df.lists.shift()]].apply(lambda x: {i for i in x.lists if i not in x.lists.shift()}, axis=1)
Have fun, and thanks.
Recommended answer
You can use pandas.Series.cumsum to build the cumulative column, make a temporary column of sets instead of lists, and use pandas.Series.shift to make the "add" and "drop" columns:
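import pandas as pd
import numpy as np
# Concatenate the lists cumulatively, then keep the sorted unique values
df['cumset'] = df['lists'].cumsum().apply(lambda x: np.unique(x))
# Temporary column holding each row's list as a set
df['sets'] = df['lists'].apply(lambda x: set(x))
# Previous row's set; the first row has no predecessor, so use an empty set
shifted = df['sets'].shift(1).apply(lambda x: x if not pd.isnull(x) else set())
# Items present in this row but not in the previous one
df['add'] = df['sets'] - shifted
# Items present in the previous row but not in this one
df['drop'] = shifted - df['sets']
# Remove the temporary column
df = df.drop('sets', axis=1)
print(df)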
#-->Output:
lists cumset add drop
1 [1] [1] {1} {}
2 [1, 2, 3] [1, 2, 3] {2, 3} {}
3 [2, 9, 7, 9] [1, 2, 3, 7, 9] {9, 7} {1, 3}
4 [2, 7, 3, 5] [1, 2, 3, 5, 7, 9] {3, 5} {9}
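Note that cumset above holds sorted NumPy arrays rather than sets. If real sets are wanted, as in the question's desired output, one possible alternative is a running set union with itertools.accumulate. This is only a sketch, not part of the original answer: the column names adds and drops follow the question, and the helper names sets and previous are made up for illustration.

```python
from itertools import accumulate

import pandas as pd

# Same test data as in the question
df = pd.DataFrame(
    {'lists': [[1], [1, 2, 3], [2, 9, 7, 9], [2, 7, 3, 5]]},
    index=[1, 2, 3, 4],
)

# One set per row (hypothetical helper name)
sets = [set(x) for x in df['lists']]

# Running union of every set seen so far; `|` builds a new set each step
cumulative = list(accumulate(sets, lambda acc, s: acc | s))
df['cumset'] = pd.Series(cumulative, index=df.index)

# Previous row's set; the first row is compared against an empty set
previous = [set()] + sets[:-1]
df['adds'] = pd.Series([cur - prev for cur, prev in zip(sets, previous)],
                       index=df.index)
df['drops'] = pd.Series([prev - cur for cur, prev in zip(sets, previous)],
                        index=df.index)

print(df)
```

With this data, row 3 gets drops equal to {1, 3}, which matches the answer's output above rather than the {3} shown in the question, since both 1 and 3 appear in row 2 but not in row 3.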