This article shows how to reshape pandas columns into a matrix-style table. It may be a useful reference if you run into a similar problem.

Problem description

My DataFrame is as follows:

input_data = """
Index U1   P1   ID1    U2   P2  ID2   
0     A    A1   A1_3   D    D1  D1_7
1     A    A1   A1_2   E    E1  E1_4
2     A    A1   A1_1   E    E2  E2_6
3     A    A1   A1_4   F    F1  F1_12
4     B    B1   B1_1   A    A2  A2_1
5     B    B1   B1_2   G    G1  G1_3
6     C    C1   C1_5   H    H1  H1_5
7     A    A1   A1_5   F    F1  F1_3
8     A    A1   A1_1   E    E2  E2_3
"""

I want to turn this into a matrix format, laid out around the most frequently repeated P1 value (e.g. A1).
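To pin down what "most repeated P1" means for this data, here is a minimal sketch (not from the answer below). It assumes the sample text is parsed with `io.StringIO` and a whitespace separator, counts each P1 value with `value_counts()`, and keeps every value that ties for the highest count. The names `counts` and `most_repeated` are illustrative.

```python
import io

import pandas as pd

input_data = """
Index U1   P1   ID1    U2   P2  ID2
0     A    A1   A1_3   D    D1  D1_7
1     A    A1   A1_2   E    E1  E1_4
2     A    A1   A1_1   E    E2  E2_6
3     A    A1   A1_4   F    F1  F1_12
4     B    B1   B1_1   A    A2  A2_1
5     B    B1   B1_2   G    G1  G1_3
6     C    C1   C1_5   H    H1  H1_5
7     A    A1   A1_5   F    F1  F1_3
8     A    A1   A1_1   E    E2  E2_3
"""

# Parse the whitespace-separated text (blank lines are skipped by default).
df = pd.read_csv(io.StringIO(input_data), sep=r"\s+")

# Count how often each P1 value occurs.
counts = df["P1"].value_counts()

# Keep every P1 whose count equals the maximum, so ties are not lost.
most_repeated = counts[counts == counts.max()].index.tolist()
print(most_repeated)  # ['A1']
```

Here A1 appears in six rows, B1 in two and C1 in one, so only A1 is selected.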

Desired output:

A   A1  A1_1        A1_2     A1_3    A1_4    A1_5
D   D1                       D1_7
E   E1              E1_4
E   E2  E2_6,E2_3
F   F1                               F1_12   F1_3
A   A2
G   G1
H   H1


Recommended answer

Here is a solution:

data = """
Index U1   P1   ID1    U2   P2  ID2   
0     A    A1   A1_3   D    D1  D1_7
1     A    A1   A1_2   E    E1  E1_4
2     A    A1   A1_1   E    E2  E2_6
3     A    A1   A1_4   F    F1  F1_12
4     B    B1   B1_1   A    A2  A2_1
5     B    B1   B1_2   G    G1  G1_3
6     C    C1   C1_5   H    H1  H1_5
7     A    A1   A1_5   F    F1  F1_3
8     A    A1   A1_1   E    E2  E2_3
"""


import io

import pandas as pd

# create the sample dataframe (pd.compat.StringIO was removed in pandas 1.0,
# so io.StringIO is used instead)
df = pd.read_csv(io.StringIO(data), sep=r'\s+')

# label each pair as "(U,P)", e.g. "(A,A1)" and "(E,E2)"
df['U1_P1'] = '(' + df['U1'].astype(str) + ',' + df['P1'] + ')'
df['U2_P2'] = '(' + df['U2'].astype(str) + ',' + df['P2'] + ')'
df = df.drop(columns=['U1', 'U2', 'P1', 'P2'])

# merge rows that share ID1 and both labels, joining their ID2 values
# (e.g. A1_1 -> "E2_6, E2_3")
df = df.groupby(['ID1', 'U2_P2', 'U1_P1'])['ID2'].agg(', '.join).reset_index()

# count the ID1 entries per U1_P1 label and keep the most frequent label(s)
df_temp = df.groupby('U1_P1')['ID1'].count().reset_index()
max_value = df_temp['ID1'].max()
df_temp = df_temp[df_temp['ID1'] == max_value].reset_index(drop=True)

for _, row in df_temp.iterrows():
    u1_p1 = row['U1_P1']
    mask = df['U1_P1'] == u1_p1  # renamed from `filter` to avoid shadowing the built-in
    # header: the U1_P1 label followed by the ID1 values that belong to it
    cols = [u1_p1] + df.loc[mask, 'ID1'].tolist()
    # pivot: one row per U2_P2 label, one column per ID1, ID2 in the cells
    dfx = (df.groupby(['U2_P2', 'ID1'])['ID2'].max()
             .unstack()
             .reset_index()
             .fillna('')
             .rename(columns={'U2_P2': cols[0]}))
    # keep only the columns that belong to this U1_P1 label
    dfx = dfx[cols]

    print(dfx)
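For comparison, here is a shorter sketch that is not part of the recommended answer. It picks the most frequent P1 with `value_counts().idxmax()` (so a tie is broken by taking the first label returned), pivots only that P1's rows with `groupby(...).agg(', '.join).unstack()`, and then reindexes so every U2/P2 pair still appears as a row. The names `top_p1`, `top_u1`, `all_pairs`, `top_ids` and `matrix` are illustrative.

```python
import io

import pandas as pd

data = """
Index U1   P1   ID1    U2   P2  ID2
0     A    A1   A1_3   D    D1  D1_7
1     A    A1   A1_2   E    E1  E1_4
2     A    A1   A1_1   E    E2  E2_6
3     A    A1   A1_4   F    F1  F1_12
4     B    B1   B1_1   A    A2  A2_1
5     B    B1   B1_2   G    G1  G1_3
6     C    C1   C1_5   H    H1  H1_5
7     A    A1   A1_5   F    F1  F1_3
8     A    A1   A1_1   E    E2  E2_3
"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# Most frequent P1 (first label if several tie) and its U1 value.
top_p1 = df["P1"].value_counts().idxmax()
top_u1 = df.loc[df["P1"] == top_p1, "U1"].iloc[0]

# Label each target pair the same way as the answer, e.g. "(E,E2)".
df["U2_P2"] = "(" + df["U2"] + "," + df["P2"] + ")"

# Every U2/P2 pair becomes a row; only the top P1's ID1 values become columns.
all_pairs = sorted(df["U2_P2"].unique())
top_ids = sorted(df.loc[df["P1"] == top_p1, "ID1"].unique())

matrix = (
    df[df["P1"] == top_p1]
    .groupby(["U2_P2", "ID1"])["ID2"]
    .agg(", ".join)  # merge duplicates, e.g. "E2_6, E2_3"
    .unstack()       # one column per ID1
    .reindex(index=all_pairs, columns=top_ids)  # add the unrelated pairs back
    .fillna("")
)
matrix.index.name = f"({top_u1},{top_p1})"
print(matrix)
```

The cells should match the answer's `dfx`; the difference is that the pair labels form the index (named `(A,A1)`) instead of a regular column.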

Output (dfx):

ID1  (A,A1)        A1_1  A1_2  A1_3   A1_4  A1_5
0    (A,A2)                                     
1    (D,D1)                    D1_7             
2    (E,E1)              E1_4                   
3    (E,E2)  E2_6, E2_3                         
4    (F,F1)                          F1_12  F1_3
5    (G,G1)                                     
6    (H,H1)  


10-30 03:24