I'm trying to build a join_key column with the following logic: date + book + bdr + COALESCE(cusip, isin, deal, id)

+------------+------+------+-----------+--------------+------+------------+----------------------------+
|  endOfDay  | book | bdr  |   cusip   |     isin     | Deal |     Id     |          join_key          |
+------------+------+------+-----------+--------------+------+------------+----------------------------+
| 31/10/2019 |   15 | ITOR | 371494AM7 | US371494AM77 |  161 | 8013210731 | 20191031|15|ITOR|371494AM7 |
| 31/10/2019 |   15 | ITOR |           |              |      | 8011898573 | 20191031|15|ITOR|          |
| 31/10/2019 |   15 | ITOR |           |              |      | 8011898742 | 20191031|15|ITOR|          |
| 31/10/2019 |   15 | ITOR |           |              |      | 8011899418 | 20191031|15|ITOR|          |
+------------+------+------+-----------+--------------+------+------------+----------------------------+

I'm trying:
df['join_key'] = ("20191031|" + df['book'].astype('str') + "|" + df['bdr'] + "|" + df[['cusip', 'isin', 'Deal', 'Id']].bfill(1)['cusip'].astype(str))

I also tried:
df['position_join_key'] = "20191031|" + df['book'].astype('str') + "|" + df['bdr'] + "|" + df['cusip'].fillna(df['isin']).fillna(df['Deal']).fillna(df['Id']).astype('str')

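For some reason, this code does not pick up Id as part of the key.

For example, in the second row I should get 20191031|15|ITOR|8011898573

Also, in case it helps, the data comes from a CSV that I read with na_filter=False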

Sample input:
+------------+------+------+-----------+-------------+------+------------+
|  endOfDay  | book | bdr  |   cusip   |    isin     | Deal |     Id     |
+------------+------+------+-----------+-------------+------+------------+
| 31/10/2019 |   15 | ITOR | 371494AM7 |             |  161 | 8013210731 |
| 31/10/2019 |   15 | ITOR |           | 3.16248E+11 |      | 8011898573 |
| 31/10/2019 |   15 | ITOR |           |             |  352 | 8011898742 |
| 31/10/2019 |   15 | ITOR |           |             |      | 8011899418 |
+------------+------+------+-----------+-------------+------+------------+

Sample output:
+----------------------------+
|          join_key          |
+----------------------------+
| 43769|15|ITOR|371494AM7    |
| 43769|15|ITOR|316247735264 |
| 43769|15|ITOR|352          |
| 43769|15|ITOR|8011899418   |
+----------------------------+

Best answer

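We can solve your problem in the following general way:

  • First, we create a temporary column called temp that holds the backfilled values. Empty strings are replaced with NaN beforehand, because reading with na_filter=False leaves them as '' rather than as missing values.
  • We insert that column after your bdr column.
  • We convert your date column to datetime.
  • We '|'.join the first four columns to create join_key.

  • Note: step 3 is added to keep your code general, so that we don't hardcode 20191031 as you did.
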
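    import numpy as np
    import pandas as pd

    # Turn empty strings into NaN so bfill can skip them, then keep the first non-missing value per row
    s = df[['cusip', 'isin', 'Deal', 'Id']].replace('', np.nan).bfill(axis=1).iloc[:, 0]
    df.insert(3, 'temp', s)

    # Dates are day-first (31/10/2019), so the format is stated explicitly
    df['endOfDay'] = pd.to_datetime(df['endOfDay'], format='%d/%m/%Y').dt.strftime('%Y%m%d')

    # Join the first four columns (endOfDay, book, bdr, temp) with '|'
    df['join_key'] = df.iloc[:, :4].apply(lambda x: '|'.join(x.astype(str).to_numpy()), axis=1)
    df = df.drop(columns='temp')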
    
       endOfDay  book   bdr      cusip          isin Deal          Id                     join_key
    0  20191031    15  ITOR  371494AM7  US371494AM77  161  8013210731   20191031|15|ITOR|371494AM7
    1  20191031    15  ITOR                                8011898573  20191031|15|ITOR|8011898573
    2  20191031    15  ITOR                                8011898742  20191031|15|ITOR|8011898742
    3  20191031    15  ITOR                                8011899418  20191031|15|ITOR|8011899418
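To see why the fillna attempt in the question never reaches Id, the sketch below may help. It is not part of the original answer: the sample frame is a hypothetical reconstruction that mixes rows from the question's tables, and it assumes every field was read as a string (as with na_filter=False plus dtype=str). It also builds the key with plain string concatenation instead of a row-wise apply, which is an alternative rather than the answer's own method.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question; every field is a string,
# as it would be after reading the CSV with na_filter=False and dtype=str.
df = pd.DataFrame({
    "endOfDay": ["31/10/2019"] * 4,
    "book": ["15"] * 4,
    "bdr": ["ITOR"] * 4,
    "cusip": ["371494AM7", "", "", ""],
    "isin": ["US371494AM77", "", "", ""],
    "Deal": ["161", "", "352", ""],
    "Id": ["8013210731", "8011898573", "8011898742", "8011899418"],
})

# fillna does nothing here: '' is an ordinary string, not a missing value.
naive = df["cusip"].fillna(df["Id"])

# Convert '' to NaN first, then take the first non-missing value in each row.
id_cols = ["cusip", "isin", "Deal", "Id"]
coalesced = df[id_cols].replace("", np.nan).bfill(axis=1).iloc[:, 0]

# Build the key with vectorized string concatenation.
date = pd.to_datetime(df["endOfDay"], format="%d/%m/%Y").dt.strftime("%Y%m%d")
join_key = date + "|" + df["book"] + "|" + df["bdr"] + "|" + coalesced
print(join_key.tolist())
```

In this sample the second row falls through to Id, while the third row stops at Deal (352) because that is the first non-empty identifier. If book or Id were read as integers instead, they would need .astype(str) before concatenation.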
    

This question, "python - Pandas, fillna/bfill to concat and coalesce fields", was originally asked on Stack Overflow: https://stackoverflow.com/questions/58984238/
