This post covers how to pivot the columns of a pandas DataFrame into a matrix-style layout. It may be a useful reference for anyone solving a similar problem.
Problem description
My df looks like this:
input_data = """
Index U1 P1 ID1 U2 P2 ID2
0 A A1 A1_3 D D1 D1_7
1 A A1 A1_2 E E1 E1_4
2 A A1 A1_1 E E2 E2_6
3 A A1 A1_4 F F1 F1_12
4 B B1 B1_1 A A2 A2_1
5 B B1 B1_2 G G1 G1_3
6 C C1 C1_5 H H1 H1_5
7 A A1 A1_5 F F1 F1_3
8 A A1 A1_1 E E2 E2_3
"""
I want a matrix format, with the matrix columns taken from the most frequently repeated P1 (e.g. A1).
Desired output:
A  A1  A1_1       A1_2  A1_3  A1_4   A1_5
D  D1                   D1_7
E  E1             E1_4
E  E2  E2_6,E2_3
F  F1                         F1_12  F1_3
A  A2
G  G1
H  H1
Recommended answer
Here is a solution:
data = """
Index U1 P1 ID1 U2 P2 ID2
0 A A1 A1_3 D D1 D1_7
1 A A1 A1_2 E E1 E1_4
2 A A1 A1_1 E E2 E2_6
3 A A1 A1_4 F F1 F1_12
4 B B1 B1_1 A A2 A2_1
5 B B1 B1_2 G G1 G1_3
6 C C1 C1_5 H H1 H1_5
7 A A1 A1_5 F F1 F1_3
8 A A1 A1_1 E E2 E2_3
"""
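import io

import pandas as pd

# create the sample dataframe
# (pd.compat.StringIO is not available in current pandas, so io.StringIO is used)
df = pd.read_csv(io.StringIO(data), sep=r'\s+')

# combine each (U, P) pair into a single label column
df['U1_P1'] = '(' + df['U1'].astype(str) + ',' + df['P1'] + ')'
df['U2_P2'] = '(' + df['U2'].astype(str) + ',' + df['P2'] + ')'
df.drop(['U1', 'U2', 'P1', 'P2'], axis=1, inplace=True)

# join the ID2 values that share the same ID1 / U2_P2 / U1_P1 combination
df = df.groupby(['ID1', 'U2_P2', 'U1_P1']).ID2.agg(', '.join).reset_index()

# find the U1_P1 label(s) with the highest number of ID1 entries
df_temp = df.groupby(['U1_P1'])['ID1'].agg('count').reset_index()
max_value = df_temp['ID1'].max()
df_temp = df_temp[df_temp['ID1'] == max_value].reset_index()

for i, row in df_temp.iterrows():
    u1_p1 = row['U1_P1']
    mask = (df['U1_P1'] == u1_p1)  # renamed from `filter`, which shadows the builtin
    dfx = df[mask].groupby(['U1_P1'])['ID1'].apply(list).reset_index()
    # first column header is the U1_P1 label, followed by its ID1 values
    cols = [dfx['U1_P1'][0]] + dfx['ID1'][0]
    # pivot the ID1 values into columns, one row per U2_P2
    dfx = df.groupby(['U2_P2', 'ID1']).ID2.max().unstack().reset_index().fillna('').rename(columns={'U2_P2': cols[0]})
    # keep only the columns that belong to the selected U1_P1
    dfx = dfx[[col for col in dfx.columns if col in cols]]
    print(dfx)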
Output of dfx:
ID1  (A,A1)        A1_1  A1_2  A1_3   A1_4  A1_5
0    (A,A2)
1    (D,D1)                    D1_7
2    (E,E1)              E1_4
3    (E,E2)  E2_6, E2_3
4    (F,F1)                          F1_12  F1_3
5    (G,G1)
6    (H,H1)
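For comparison, the same reshaping can probably be written more compactly with a `groupby` followed by `unstack`. The sketch below is not part of the original answer. It assumes that "most repeated P1" means the (U1, P1) pair with the most rows in the raw data, and the names `top_ids`, `cells`, and `matrix` are illustrative only.

```python
import io

import pandas as pd

data = """
Index U1 P1 ID1 U2 P2 ID2
0 A A1 A1_3 D D1 D1_7
1 A A1 A1_2 E E1 E1_4
2 A A1 A1_1 E E2 E2_6
3 A A1 A1_4 F F1 F1_12
4 B B1 B1_1 A A2 A2_1
5 B B1 B1_2 G G1 G1_3
6 C C1 C1_5 H H1 H1_5
7 A A1 A1_5 F F1 F1_3
8 A A1 A1_1 E E2 E2_3
"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# Assumption: the "most repeated" group is the (U1, P1) pair with the most rows.
top_u1, top_p1 = df.groupby(["U1", "P1"]).size().idxmax()

# The ID1 values of that group become the matrix columns.
in_top_group = (df["U1"] == top_u1) & (df["P1"] == top_p1)
top_ids = sorted(df.loc[in_top_group, "ID1"].unique())

# One cell per (U2, P2, ID1); several ID2 values in a cell are joined with ", ".
cells = df.groupby(["U2", "P2", "ID1"])["ID2"].agg(", ".join)

# Move ID1 into the columns, keep only the top group's IDs, and blank out gaps.
matrix = cells.unstack("ID1").reindex(columns=top_ids).fillna("")

print(matrix)
```

Here the rows keep U2 and P2 as a two-level index rather than a combined `(U,P)` label, and (U2, P2) pairs with no match in the selected group, such as (A, A2), remain as empty rows.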