1. About Anti-Money-Laundering Data Mining

Because anti-money-laundering (AML) data comes from inside banks, little public data is available for study, which makes AML research difficult. AML transaction data is also large, so a suitable environment and tooling are required. The dataset and model in this case study are a good example for learning and reference. Although the data here is only about 40 GB, that already approaches the realistic scale of 100 GB or even 400 GB, and it comes with a worked example that detects money-laundering nodes via GNN node classification.

2. The IBM Anti-Money-Laundering Transaction Dataset

2.1 Dataset link

https://www.kaggle.com/datasets/ealtman2019/ibm-transactions-for-anti-money-laundering-aml

Dataset size: about 40 GB

2.2 Dataset overview

Money laundering is a multibillion-dollar problem, and detecting it is very hard. Most automated algorithms suffer from high false-positive rates: legitimate transactions wrongly flagged as laundering. The opposite case is also a major problem: false negatives, laundering transactions that go undetected. Naturally, criminals work hard to cover their tracks.

Access to real financial transaction data is tightly restricted, for proprietary and privacy reasons. Even with access, assigning a correct label (laundering or legitimate) to every transaction is itself a problem, as noted above. The synthetic IBM transaction data provided here avoids both issues.

The data provided here is based on a virtual world of individuals, companies, and banks. Individuals interact with other individuals and with companies; likewise, companies interact with other companies and individuals. These interactions take many forms, such as purchases of consumer goods and services, purchase orders for industrial supplies, salary payments, loan repayments, and so on. These financial transactions generally flow through banks, i.e., both payer and payee hold accounts, which come in many forms, from checking accounts to credit cards to Bitcoin.

A small minority of individuals and companies in the generative model engage in criminal activity such as smuggling, illegal gambling, and extortion. Criminals obtain funds from these illicit activities and then attempt to hide the origin of those funds through a series of financial transactions. The transactions used to hide illicit funds constitute money laundering. The data provided here is therefore labeled and can be used to train and test AML (anti-money-laundering) models, among other purposes.

The data generator that created this data not only simulates illicit activity but can also trace funds derived from it through an arbitrary number of transactions, so laundering transactions can be labeled even many hops removed from their illicit source. On that basis, the generator can readily label each individual transaction as laundering or legitimate.

Note that this IBM generator models the full laundering cycle:

  1. Placement: funds originating from illicit activity such as smuggling.
  2. Layering: mixing the illicit funds into the financial system.
  3. Integration: spending the illicit funds.

As another possibility unique to synthetic data: a real bank or other institution can typically access only a subset of the transactions involved in laundering, namely the transactions that involve that bank. Transactions between other banks are invisible to it. A model built on one institution's real transactions therefore has only a limited view of the world.

In contrast, these synthetic transactions cover the entire financial ecosystem. It may therefore be possible to build laundering-detection models that understand the broad sweep of cross-institution transactions, while applying those models only to a specific bank's transactions at inference time.

                                     SMALL           MEDIUM            LARGE
                                    HI     LI        HI      LI        HI       LI
Date Range HI + LI (2022)            Sep 1-10         Sep 1-16         Aug 1 - Nov 5
# of Days Spanned                   10     10        16      16        97       97
# of Bank Accounts                 515K   705K     2077K   2028K     2116K    2064K
# of Transactions                    5M     7M       32M     31M       180M    176M
# of Laundering Transactions       3.6K   4.0K       35K     16K       223K    100K
Laundering Rate (1 per N Trans)     981   1942       905    1948       807     1750

Note that the date ranges given above are the "primary" period of transaction activity. In the discussion, Marco Pianta observed that some transactions occur after the stated date range, and that those transactions are all laundering transactions. See the response to Marco for a fuller description of this situation and how to handle it. We thank Marco for raising the issue.

Finally, we provide two files for each of the six datasets:

A. The list of transactions, in CSV format

B. A text file listing the laundering transactions that follow one of the 8 specific patterns introduced by Suzumura and Kanezashi in their AMLSim simulator.

Note that not all laundering in the data follows one of these 8 patterns. As with the other aspects of the data described above, knowing all the transactions involved in a given laundering pattern is a significant challenge.

The 12 files provided are:

1a. HI-Small_Trans.csv (transactions)
1b. HI-Small_Patterns.txt (laundering-pattern transactions)

2a. HI-Medium_Trans.csv (transactions)
2b. HI-Medium_Patterns.txt (laundering-pattern transactions)

3a. HI-Large_Trans.csv (transactions)
3b. HI-Large_Patterns.txt (laundering-pattern transactions)

4a. LI-Small_Trans.csv (transactions)
4b. LI-Small_Patterns.txt (laundering-pattern transactions)

5a. LI-Medium_Trans.csv (transactions)
5b. LI-Medium_Patterns.txt (laundering-pattern transactions)

6a. LI-Large_Trans.csv (transactions)
6b. LI-Large_Patterns.txt (laundering-pattern transactions)

2.3 Data preview

BEGIN LAUNDERING ATTEMPT - STACK
2022/08/09 05:14,00952,8139F54E0,0111632,8062C56E0,5331.44,US Dollar,5331.44,US Dollar,ACH,1
2022/08/13 13:09,0111632,8062C56E0,008456,81363F620,5602.59,US Dollar,5602.59,US Dollar,ACH,1
2022/08/15 07:40,0118693,823D5EB90,013729,801CF2E60,1400.54,US Dollar,1400.54,US Dollar,ACH,1
2022/08/15 14:19,013729,801CF2E60,0123621,81A7090F0,1467.94,US Dollar,1467.94,US Dollar,ACH,1
2022/08/13 12:40,0024750,81363F410,0213834,808757B00,16898.29,US Dollar,16898.29,US Dollar,ACH,1
2022/08/22 06:34,0213834,808757B00,000,800073EF0,17607.19,US Dollar,17607.19,US Dollar,ACH,1
END LAUNDERING ATTEMPT - STACK

BEGIN LAUNDERING ATTEMPT - CYCLE: Max 12 hops
2022/08/01 00:19,0134266,814167590,0036925,810E343A0,132713.46,Yuan,132713.46,Yuan,ACH,1
2022/08/01 13:05,0036925,810E343A0,0119211,814AB4F60,18264.20,US Dollar,18264.20,US Dollar,ACH,1
2022/08/03 13:28,0119211,814AB4F60,0132965,81B88A230,14567.69,Euro,14567.69,Euro,ACH,1
2022/08/09 02:32,0132965,81B88A230,0137089,810C71940,114329.26,Yuan,114329.26,Yuan,ACH,1
2022/08/11 07:16,0137089,810C71940,0216618,81D5302D0,14567.69,Euro,14567.69,Euro,ACH,1
2022/08/13 05:09,0216618,81D5302D0,0024083,81836B520,13629.75,Euro,13629.75,Euro,ACH,1
2022/08/15 18:04,0024083,81836B520,0038110,81B868730,97481.96,Yuan,97481.96,Yuan,ACH,1
2022/08/20 08:57,0038110,81B868730,0225015,81C6EA460,14054.71,US Dollar,14054.71,US Dollar,ACH,1
2022/08/22 12:08,0225015,81C6EA460,018112,8045CC910,13718.22,US Dollar,13718.22,US Dollar,ACH,1
2022/08/22 19:53,018112,8045CC910,007818,8037732C0,12908.33,US Dollar,12908.33,US Dollar,ACH,1
2022/08/27 07:10,007818,8037732C0,0121523,80D1BD2F0,10636.75,Euro,10636.75,Euro,ACH,1
2022/08/30 11:54,0121523,80D1BD2F0,0134266,814167590,1378736.88,Yen,1378736.88,Yen,ACH,1
END LAUNDERING ATTEMPT - CYCLE
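The BEGIN/END markers make the pattern files straightforward to parse. Below is a minimal sketch of a parser (our own helper, not part of the dataset tooling); the line format is inferred from the preview above.

```python
def parse_patterns(lines):
    """Split a *_Patterns.txt file into laundering attempts.

    Each attempt is a dict with the pattern 'type' (e.g. 'STACK',
    'CYCLE: Max 12 hops') and its list of comma-split transaction rows.
    """
    attempts = []
    current_type, current_rows = None, []
    for line in lines:
        line = line.strip()
        if line.startswith('BEGIN LAUNDERING ATTEMPT'):
            # Everything after the first '-' names the pattern
            current_type = line.split('-', 1)[1].strip()
            current_rows = []
        elif line.startswith('END LAUNDERING ATTEMPT'):
            attempts.append({'type': current_type, 'transactions': current_rows})
            current_type, current_rows = None, []
        elif line and current_type is not None:
            current_rows.append(line.split(','))
    return attempts
```

For example, `parse_patterns(open('HI-Small_Patterns.txt'))` would yield one dict per laundering attempt.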

3. Anti-Money-Laundering Detection with GNN Node Classification

This notebook includes GNN model training and a dataset implementation using the PyG library. In this example we use HI-Small_Trans.csv as our training and test dataset.

For more details, see https://github.com/issacchan26/AntiMoneyLaunderingDetectionWithGNN

pip install torch_geometric
Requirement already satisfied: torch_geometric in /opt/conda/lib/python3.10/site-packages (2.3.1)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.10/site-packages (from torch_geometric) (4.66.1)
Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (from torch_geometric) (1.23.5)
Requirement already satisfied: scipy in /opt/conda/lib/python3.10/site-packages (from torch_geometric) (1.11.2)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.10/site-packages (from torch_geometric) (3.1.2)
Requirement already satisfied: requests in /opt/conda/lib/python3.10/site-packages (from torch_geometric) (2.31.0)
Requirement already satisfied: pyparsing in /opt/conda/lib/python3.10/site-packages (from torch_geometric) (3.0.9)
Requirement already satisfied: scikit-learn in /opt/conda/lib/python3.10/site-packages (from torch_geometric) (1.2.2)
Requirement already satisfied: psutil>=5.8.0 in /opt/conda/lib/python3.10/site-packages (from torch_geometric) (5.9.3)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.10/site-packages (from jinja2->torch_geometric) (2.1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.10/site-packages (from requests->torch_geometric) (3.1.0)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests->torch_geometric) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests->torch_geometric) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests->torch_geometric) (2023.7.22)
Requirement already satisfied: joblib>=1.1.1 in /opt/conda/lib/python3.10/site-packages (from scikit-learn->torch_geometric) (1.3.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from scikit-learn->torch_geometric) (3.1.0)
Note: you may need to restart the kernel to use updated packages.
# Import the required libraries
import datetime  # dates and times
import os  # file and directory operations
from typing import Callable, Optional  # type hints
import pandas as pd  # data processing and analysis
from sklearn import preprocessing  # data preprocessing
import numpy as np  # numerical computing
import torch  # building neural networks

from torch_geometric.data import (  # graph data handling
    Data,
    InMemoryDataset
)

# Make pandas display all columns
pd.set_option('display.max_columns', None)

# Path to the data file
path = '/kaggle/input/ibm-transactions-for-anti-money-laundering-aml/HI-Small_Trans.csv'

# Read the data file into a DataFrame
df = pd.read_csv(path)

3.1 Data visualization and possible feature engineering

Let's take a look at the dataset.

# Print the first few rows of the DataFrame
print(df.head())
          Timestamp  From Bank    Account  To Bank  Account.1  \
0  2022/09/01 00:20         10  8000EBD30       10  8000EBD30   
1  2022/09/01 00:20       3208  8000F4580        1  8000F5340   
2  2022/09/01 00:00       3209  8000F4670     3209  8000F4670   
3  2022/09/01 00:02         12  8000F5030       12  8000F5030   
4  2022/09/01 00:06         10  8000F5200       10  8000F5200   

   Amount Received Receiving Currency  Amount Paid Payment Currency  \
0          3697.34          US Dollar      3697.34        US Dollar   
1             0.01          US Dollar         0.01        US Dollar   
2         14675.57          US Dollar     14675.57        US Dollar   
3          2806.97          US Dollar      2806.97        US Dollar   
4         36682.97          US Dollar     36682.97        US Dollar   

  Payment Format  Is Laundering  
0   Reinvestment              0  
1         Cheque              0  
2   Reinvestment              0  
3   Reinvestment              0  
4   Reinvestment              0  

After inspecting the DataFrame, we suggest extracting every payee and payer account from all transactions so that suspicious accounts can be ranked. We can cast the whole dataset as a node classification problem, treating accounts as nodes and transactions as edges.

The object columns should be encoded into categories with sklearn's LabelEncoder.

# Print the dtype of each column in df
print(df.dtypes)
Timestamp              object
From Bank               int64
Account                object
To Bank                 int64
Account.1              object
Amount Received       float64
Receiving Currency     object
Amount Paid           float64
Payment Currency       object
Payment Format         object
Is Laundering           int64
dtype: object
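To make the encoding step concrete, here is a tiny illustration of sklearn's LabelEncoder on a currency-like column (toy values, not drawn from the dataset): the classes are sorted and mapped to integer codes 0..n-1.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(['Euro', 'US Dollar', 'Euro', 'Yen'])
# le.classes_ is sorted: ['Euro', 'US Dollar', 'Yen'],
# so codes comes out as [0, 1, 0, 2]
```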

Check for null values:

# Print the number of missing values in each column of df
print(df.isnull().sum())
Timestamp             0
From Bank             0
Account               0
To Bank               0
Account.1             0
Amount Received       0
Receiving Currency    0
Amount Paid           0
Payment Currency      0
Payment Format        0
Is Laundering         0
dtype: int64

There are two columns for the amount paid and the amount received in each transaction. If their values are always identical, do we really need both columns, unless there are transaction fees or cross-currency transactions? Let's find out.

# Check whether the "Amount Received" column equals the "Amount Paid" column
print('Amount Received equals to Amount Paid:')
print(df['Amount Received'].equals(df['Amount Paid']))

# Check whether the "Receiving Currency" column equals the "Payment Currency" column
print('Receiving Currency equals to Payment Currency:')
print(df['Receiving Currency'].equals(df['Payment Currency']))
Amount Received equals to Amount Paid:
False
Receiving Currency equals to Payment Currency:
False

It seems there are cross-currency transactions; let's print them out.

# Select the rows where the amounts differ
not_equal1 = df.loc[~(df['Amount Received'] == df['Amount Paid'])]
# Select the rows where the currencies differ
not_equal2 = df.loc[~(df['Receiving Currency'] == df['Payment Currency'])]

# Print the mismatching rows
print(not_equal1)
print('---------------------------------------------------------------------------')
print(not_equal2)
                Timestamp  From Bank    Account  To Bank  Account.1  \
1173     2022/09/01 00:22       1362  80030A870     1362  80030A870   
7156     2022/09/01 00:28      11318  800C51010    11318  800C51010   
7925     2022/09/01 00:12        795  800D98770      795  800D98770   
8467     2022/09/01 00:01       1047  800E92CF0     1047  800E92CF0   
11529    2022/09/01 00:22      11157  80135FFC0    11157  80135FFC0   
...                   ...        ...        ...      ...        ...   
5078167  2022/09/10 23:30      23537  803949A90    23537  803949A90   
5078234  2022/09/10 23:59      16163  803638A90    16163  803638A90   
5078236  2022/09/10 23:55      16163  803638A90    16163  803638A90   
5078316  2022/09/10 23:44     215064  808F06E11   215064  808F06E10   
5078318  2022/09/10 23:45     215064  808F06E11   215064  808F06E10   

         Amount Received Receiving Currency  Amount Paid Payment Currency  \
1173           52.110000               Euro        61.06        US Dollar   
7156           76.060000               Euro        89.12        US Dollar   
7925           17.690000  Australian Dollar        12.52        US Dollar   
8467           19.430000               Euro        22.77        US Dollar   
11529          98.340000               Euro       115.24        US Dollar   
...                  ...                ...          ...              ...   
5078167     26421.500000             Shekel      7823.96        US Dollar   
5078234     47517.490000        Saudi Riyal     12667.62        US Dollar   
5078236     11329.850000        Saudi Riyal      3020.41        US Dollar   
5078316         0.000006            Bitcoin         0.07        US Dollar   
5078318         0.000004            Bitcoin         0.05        US Dollar   

        Payment Format  Is Laundering  
1173               ACH              0  
7156               ACH              0  
7925               ACH              0  
8467               ACH              0  
11529              ACH              0  
...                ...            ...  
5078167            ACH              0  
5078234            ACH              0  
5078236            ACH              0  
5078316            ACH              0  
5078318           Wire              0  

[72158 rows x 11 columns]
---------------------------------------------------------------------------
                Timestamp  From Bank    Account  To Bank  Account.1  \
1173     2022/09/01 00:22       1362  80030A870     1362  80030A870   
7156     2022/09/01 00:28      11318  800C51010    11318  800C51010   
7925     2022/09/01 00:12        795  800D98770      795  800D98770   
8467     2022/09/01 00:01       1047  800E92CF0     1047  800E92CF0   
11529    2022/09/01 00:22      11157  80135FFC0    11157  80135FFC0   
...                   ...        ...        ...      ...        ...   
5078167  2022/09/10 23:30      23537  803949A90    23537  803949A90   
5078234  2022/09/10 23:59      16163  803638A90    16163  803638A90   
5078236  2022/09/10 23:55      16163  803638A90    16163  803638A90   
5078316  2022/09/10 23:44     215064  808F06E11   215064  808F06E10   
5078318  2022/09/10 23:45     215064  808F06E11   215064  808F06E10   

         Amount Received Receiving Currency  Amount Paid Payment Currency  \
1173           52.110000               Euro        61.06        US Dollar   
7156           76.060000               Euro        89.12        US Dollar   
7925           17.690000  Australian Dollar        12.52        US Dollar   
8467           19.430000               Euro        22.77        US Dollar   
11529          98.340000               Euro       115.24        US Dollar   
...                  ...                ...          ...              ...   
5078167     26421.500000             Shekel      7823.96        US Dollar   
5078234     47517.490000        Saudi Riyal     12667.62        US Dollar   
5078236     11329.850000        Saudi Riyal      3020.41        US Dollar   
5078316         0.000006            Bitcoin         0.07        US Dollar   
5078318         0.000004            Bitcoin         0.05        US Dollar   

        Payment Format  Is Laundering  
1173               ACH              0  
7156               ACH              0  
7925               ACH              0  
8467               ACH              0  
11529              ACH              0  
...                ...            ...  
5078167            ACH              0  
5078234            ACH              0  
5078236            ACH              0  
5078316            ACH              0  
5078318           Wire              0  

[72170 rows x 11 columns]

The sizes of the two DataFrames show that there are both transaction fees and cross-currency transactions, so we cannot merge or drop the amount columns.

Since we are going to encode these columns, we must make sure the categories of like attributes are aligned.
Let's check whether the lists of receiving and payment currencies are identical.

# Print the unique values of df['Receiving Currency'], sorted alphabetically
print(sorted(df['Receiving Currency'].unique()))

# Print the unique values of df['Payment Currency'], sorted alphabetically
print(sorted(df['Payment Currency'].unique()))
['Australian Dollar', 'Bitcoin', 'Brazil Real', 'Canadian Dollar', 'Euro', 'Mexican Peso', 'Ruble', 'Rupee', 'Saudi Riyal', 'Shekel', 'Swiss Franc', 'UK Pound', 'US Dollar', 'Yen', 'Yuan']
['Australian Dollar', 'Bitcoin', 'Brazil Real', 'Canadian Dollar', 'Euro', 'Mexican Peso', 'Ruble', 'Rupee', 'Saudi Riyal', 'Shekel', 'Swiss Franc', 'UK Pound', 'US Dollar', 'Yen', 'Yuan']
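The two lists happen to be identical here, so per-column fit_transform yields consistent codes. If that were not guaranteed, one could fit a single encoder on the union of both columns; the helper below is a defensive sketch of our own, not the notebook's approach.

```python
from sklearn.preprocessing import LabelEncoder
import pandas as pd

def encode_shared(df, col_a, col_b):
    """Encode two columns with one encoder so their codes agree."""
    le = LabelEncoder()
    # Fit on the union of both columns, then transform each separately
    le.fit(pd.concat([df[col_a], df[col_b]]).astype(str))
    df[col_a] = le.transform(df[col_a].astype(str))
    df[col_b] = le.transform(df[col_b].astype(str))
    return df, le
```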

3.2 Data preprocessing

**First, we present the functions used in the PyG dataset; the dataset class and the model training are provided in the final sections.**

In preprocessing we apply the following transformations:

  1. Min-max normalize the timestamps.
  2. Create a unique ID for each account by concatenating the bank code with the account number.
  3. Build receiving_df from the receiving account, amount received, and receiving currency.
  4. Build paying_df from the paying account, amount paid, and payment currency.
  5. Build a list of all currencies used in the transactions.
  6. Encode 'Payment Format', 'Payment Currency', and 'Receiving Currency' as categories with sklearn's LabelEncoder.
# df_label_encoder: label-encode the specified columns
# Parameters:
# - df: the DataFrame to process
# - columns: list of column names to label-encode
# Returns:
# - the DataFrame with those columns label-encoded

def df_label_encoder(df, columns):
    # Create a LabelEncoder
    le = preprocessing.LabelEncoder()

    # Iterate over the specified columns
    for i in columns:
        # Cast the values to str and label-encode them
        df[i] = le.fit_transform(df[i].astype(str))

    # Return the label-encoded DataFrame
    return df


# preprocess: preprocess the raw data
# Parameters:
# - df: the DataFrame to process
# Returns:
# - the preprocessed DataFrame df
# - the processed receiving_df DataFrame
# - the processed paying_df DataFrame
# - currency_ls, the list of receiving-currency codes

def preprocess(df):
    # Label-encode the categorical columns
    df = df_label_encoder(df,['Payment Format', 'Payment Currency', 'Receiving Currency'])

    # Parse the 'Timestamp' column into datetimes
    df['Timestamp'] = pd.to_datetime(df['Timestamp'])

    # Convert to integer timestamps and min-max normalize into [0, 1]
    df['Timestamp'] = df['Timestamp'].apply(lambda x: x.value)
    df['Timestamp'] = (df['Timestamp']-df['Timestamp'].min())/(df['Timestamp'].max()-df['Timestamp'].min())

    # Concatenate 'From Bank' with 'Account' to form the new 'Account' ID
    df['Account'] = df['From Bank'].astype(str) + '_' + df['Account']

    # Concatenate 'To Bank' with 'Account.1' to form the new 'Account.1' ID
    df['Account.1'] = df['To Bank'].astype(str) + '_' + df['Account.1']

    # Sort ascending by 'Account'
    df = df.sort_values(by=['Account'])

    # Extract 'Account.1', 'Amount Received', 'Receiving Currency' into receiving_df
    receiving_df = df[['Account.1', 'Amount Received', 'Receiving Currency']]

    # Extract 'Account', 'Amount Paid', 'Payment Currency' into paying_df
    paying_df = df[['Account', 'Amount Paid', 'Payment Currency']]

    # Rename receiving_df's 'Account.1' column to 'Account'
    receiving_df = receiving_df.rename({'Account.1': 'Account'}, axis=1)

    # Get the sorted unique values of 'Receiving Currency' as currency_ls
    currency_ls = sorted(df['Receiving Currency'].unique())

    # Return the processed df, receiving_df, paying_df, and currency_ls
    return df, receiving_df, paying_df, currency_ls

Let's look at the processed df.

# Run preprocess on df and unpack the processed df, the receiving-side
# DataFrame receiving_df, the paying-side DataFrame paying_df, and the
# currency list currency_ls
df, receiving_df, paying_df, currency_ls = preprocess(df)

# Print the first few rows of the processed df
print(df.head())
         Timestamp  From Bank          Account  To Bank        Account.1  \
4278714   0.456320      10057  10057_803A115E0    29467  29467_803E020C0   
2798190   0.285018      10057  10057_803A115E0    29467  29467_803E020C0   
2798191   0.284233      10057  10057_803A115E0    29467  29467_803E020C0   
3918769   0.417079      10057  10057_803A115E0    29467  29467_803E020C0   
213094    0.000746      10057  10057_803A115E0    10057  10057_803A115E0   

         Amount Received  Receiving Currency  Amount Paid  Payment Currency  \
4278714        787197.11                  13    787197.11                13   
2798190        787197.11                  13    787197.11                13   
2798191        681262.19                  13    681262.19                13   
3918769        681262.19                  13    681262.19                13   
213094         146954.27                  13    146954.27                13   

         Payment Format  Is Laundering  
4278714               3              0  
2798190               3              0  
2798191               4              0  
3918769               4              0  
213094                5              0  

The paying and receiving DataFrames:

# Print the first few rows of receiving_df
print(receiving_df.head())

# Print the first few rows of paying_df
print(paying_df.head())
                 Account  Amount Received  Receiving Currency
4278714  29467_803E020C0        787197.11                  13
2798190  29467_803E020C0        787197.11                  13
2798191  29467_803E020C0        681262.19                  13
3918769  29467_803E020C0        681262.19                  13
213094   10057_803A115E0        146954.27                  13
                 Account  Amount Paid  Payment Currency
4278714  10057_803A115E0    787197.11                13
2798190  10057_803A115E0    787197.11                13
2798191  10057_803A115E0    681262.19                13
3918769  10057_803A115E0    681262.19                13
213094   10057_803A115E0    146954.27                13

The currency list:

# Print the currency list
print(currency_ls)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

We want to extract all unique accounts from the payers and payees as the nodes of our graph. This includes the unique account ID, the bank code, and the 'Is Laundering' label.

In this section we treat both the payer and the payee of an illicit transaction as suspicious accounts, and mark 'Is Laundering' as 1 for both.

# get_all_account: collect all unique accounts with their laundering label
def get_all_account(df):
    # Payer side: 'Account' and 'From Bank'
    ldf = df[['Account', 'From Bank']]
    # Payee side: 'Account.1' and 'To Bank'
    rdf = df[['Account.1', 'To Bank']]
    # Rows flagged as laundering
    suspicious = df[df['Is Laundering']==1]
    # Payer accounts of laundering rows
    s1 = suspicious[['Account', 'Is Laundering']]
    # Payee accounts of laundering rows
    s2 = suspicious[['Account.1', 'Is Laundering']]
    # Rename 'Account.1' to 'Account' so the two frames align
    s2 = s2.rename({'Account.1': 'Account'}, axis=1)
    # Stack payer and payee suspicious accounts
    suspicious = pd.concat([s1, s2], join='outer')
    # Drop duplicate suspicious accounts
    suspicious = suspicious.drop_duplicates()

    # Rename 'From Bank' to 'Bank'
    ldf = ldf.rename({'From Bank': 'Bank'}, axis=1)
    # Rename 'Account.1' to 'Account' and 'To Bank' to 'Bank'
    rdf = rdf.rename({'Account.1': 'Account', 'To Bank': 'Bank'}, axis=1)
    # Stack payer and payee accounts
    df = pd.concat([ldf, rdf], join='outer')
    # Drop duplicate accounts
    df = df.drop_duplicates()

    # Default every account's 'Is Laundering' to 0
    df['Is Laundering'] = 0
    # Index by 'Account'
    df.set_index('Account', inplace=True)
    # Overwrite 'Is Laundering' with 1 for the suspicious accounts
    df.update(suspicious.set_index('Account'))
    # Restore the default integer index
    df = df.reset_index()
    return df

Take a look at the account list:

# Call get_all_account on df and store the result in accounts
accounts = get_all_account(df)

# Print the first few rows of the accounts DataFrame
print(accounts.head())
           Account   Bank  Is Laundering
0  10057_803A115E0  10057              0
1  10057_803AA8E90  10057              0
2  10057_803AAB430  10057              0
3  10057_803AACE20  10057              0
4  10057_803AB4F70  10057              0

3.3 Node features

For the node features, we aggregate the mean amount paid and received in each currency as new per-node features.

# paid_currency_aggregate: mean amount paid per currency, per account
# Parameters:
# - currency_ls: list of currency codes
# - paying_df: the payment DataFrame
# - accounts: the accounts DataFrame
def paid_currency_aggregate(currency_ls, paying_df, accounts):
    # Iterate over the currency codes
    for i in currency_ls:
        # Payments made in the current currency
        temp = paying_df[paying_df['Payment Currency'] == i]
        # Mean amount paid per account, stored as a new column on accounts
        accounts['avg paid '+str(i)] = temp['Amount Paid'].groupby(temp['Account']).transform('mean')
    # Return the augmented accounts DataFrame
    return accounts

# received_currency_aggregate: mean amount received per currency, per account
# Parameters:
# - currency_ls: list of currency codes
# - receiving_df: the receiving DataFrame
# - accounts: the accounts DataFrame
def received_currency_aggregate(currency_ls, receiving_df, accounts):
    # Iterate over the currency codes
    for i in currency_ls:
        # Receipts in the current currency
        temp = receiving_df[receiving_df['Receiving Currency'] == i]
        # Mean amount received per account, stored as a new column on accounts
        accounts['avg received '+str(i)] = temp['Amount Received'].groupby(temp['Account']).transform('mean')
    # Fill missing values with 0
    accounts = accounts.fillna(0)
    # Return the augmented accounts DataFrame
    return accounts

Now we can define the node attributes from the bank code and the mean payment and receipt amounts per currency.

# get_node_attr: build the node features and node labels
def get_node_attr(currency_ls, paying_df, receiving_df, accounts):
    # Aggregate mean paid amounts into node_df
    node_df = paid_currency_aggregate(currency_ls, paying_df, accounts)
    # Aggregate mean received amounts into node_df
    node_df = received_currency_aggregate(currency_ls, receiving_df, node_df)
    # 'Is Laundering' becomes the float node-label tensor
    node_label = torch.from_numpy(node_df['Is Laundering'].values).to(torch.float)
    # Drop 'Account' and 'Is Laundering' from the feature frame
    node_df = node_df.drop(['Account', 'Is Laundering'], axis=1)
    # Label-encode the 'Bank' column
    node_df = df_label_encoder(node_df,['Bank'])
    # Convert node_df to a float tensor (left commented out for visualization)
    # node_df = torch.from_numpy(node_df.values).to(torch.float)
    # Return the node features and labels
    return node_df, node_label

Take a look at node_df:

# Build the node features and labels from the currency list, the paying
# and receiving DataFrames, and the accounts DataFrame
node_df, node_label = get_node_attr(currency_ls, paying_df, receiving_df, accounts)

# Print the first five rows of node_df
print(node_df.head())
   Bank  avg paid 0  avg paid 1  avg paid 2  avg paid 3  avg paid 4  \
0     2         0.0         0.0         0.0         0.0         0.0   
1     2         0.0         0.0         0.0         0.0         0.0   
2     2         0.0         0.0         0.0         0.0         0.0   
3     2         0.0         0.0         0.0         0.0         0.0   
4     2         0.0         0.0         0.0         0.0         0.0   

   avg paid 5  avg paid 6  avg paid 7  avg paid 8  avg paid 9  avg paid 10  \
0         0.0         0.0         0.0         0.0         0.0          0.0   
1         0.0         0.0         0.0         0.0         0.0          0.0   
2         0.0         0.0         0.0         0.0         0.0          0.0   
3         0.0         0.0         0.0         0.0         0.0          0.0   
4         0.0         0.0         0.0         0.0         0.0          0.0   

   avg paid 11   avg paid 12  avg paid 13  avg paid 14  avg received 0  \
0          0.0   1922.000000          0.0          0.0             0.0   
1          0.0    480.223333          0.0          0.0             0.0   
2          0.0  14675.570000          0.0          0.0             0.0   
3          0.0  37340.843333          0.0          0.0             0.0   
4          0.0  49649.409677          0.0          0.0             0.0   

   avg received 1  avg received 2  avg received 3  avg received 4  \
0             0.0             0.0             0.0             0.0   
1             0.0             0.0             0.0             0.0   
2             0.0             0.0             0.0             0.0   
3             0.0             0.0             0.0             0.0   
4             0.0             0.0             0.0             0.0   

   avg received 5  avg received 6  avg received 7  avg received 8  \
0             0.0             0.0             0.0             0.0   
1             0.0             0.0             0.0             0.0   
2             0.0             0.0             0.0             0.0   
3             0.0             0.0             0.0             0.0   
4             0.0             0.0             0.0             0.0   

   avg received 9  avg received 10  avg received 11  avg received 12  \
0             0.0              0.0              0.0       330.166429   
1             0.0              0.0              0.0       119.992000   
2             0.0              0.0              0.0     14675.570000   
3             0.0              0.0              0.0       756.486190   
4             0.0              0.0              0.0      3120.573333   

   avg received 13  avg received 14  
0              0.0              0.0  
1              0.0              0.0  
2              0.0              0.0  
3              0.0              0.0  
4              0.0              0.0  

3.4 Edge features

For the edge features, we treat each transaction as an edge.
For the edge index, we replace every account with its integer index and stack them into a tensor of shape [2, num_transactions].
For the edge attributes, we use 'Timestamp', 'Amount Received', 'Receiving Currency', 'Amount Paid', 'Payment Currency', and 'Payment Format'.

def get_edge_df(accounts, df):
    # Reset the accounts index and expose it as a new ID column
    accounts = accounts.reset_index(drop=True)
    accounts['ID'] = accounts.index

    # Mapping from account name to integer ID
    mapping_dict = dict(zip(accounts['Account'], accounts['ID']))

    # Map the payer 'Account' column to IDs in a new 'From' column
    df['From'] = df['Account'].map(mapping_dict)

    # Map the payee 'Account.1' column to IDs in a new 'To' column
    df['To'] = df['Account.1'].map(mapping_dict)

    # Drop the columns that are no longer needed
    df = df.drop(['Account', 'Account.1', 'From Bank', 'To Bank'], axis=1)

    # Stack 'From' over 'To' into a [2, num_transactions] tensor
    edge_index = torch.stack([torch.from_numpy(df['From'].values), torch.from_numpy(df['To'].values)], dim=0)

    # Drop the label and the index columns
    df = df.drop(['Is Laundering', 'From', 'To'], axis=1)

    # The remaining columns are the edge attributes
    edge_attr = df  # for visualization

    return edge_attr, edge_index

3.5 Edge attributes

Edge attributes are the information carried on each edge of the graph; they can be numbers, text, vectors, and so on. In graph neural networks, edge attributes typically describe edge weights, distances, similarities, and the like, helping the model learn the graph's structure and features. How edge attributes are defined and used depends on the application.

# Build the edge attributes and edge index from accounts and df
edge_attr, edge_index = get_edge_df(accounts, df)

# Print the first few rows of edge_attr
print(edge_attr.head())
         Timestamp  Amount Received  Receiving Currency  Amount Paid  \
4278714   0.456320        787197.11                  13    787197.11   
2798190   0.285018        787197.11                  13    787197.11   
2798191   0.284233        681262.19                  13    681262.19   
3918769   0.417079        681262.19                  13    681262.19   
213094    0.000746        146954.27                  13    146954.27   

         Payment Currency  Payment Format  
4278714                13               3  
2798190                13               3  
2798191                13               4  
3918769                13               4  
213094                 13               5  

3.6 edge_index

edge_index is a tensor of the edge indices of the graph. It has shape 2 × E, where E is the number of edges. Each column represents one edge: the first row holds the source-node index, the second row the target-node index.

For example, for an undirected graph with N nodes and M edges, edge_index could look like this:

edge_index = torch.tensor([
    [0, 0, 1, 1, 2, 3, 4, 4, 5, 6],
    [1, 2, 0, 3, 1, 4, 3, 5, 4, 4],
])

Here the first row [0, 0, 1, 1, 2, 3, 4, 4, 5, 6] holds the source-node indices and the second row [1, 2, 0, 3, 1, 4, 3, 5, 4, 4] the target-node indices. The tensor represents the following edges:

0 -- 1
0 -- 2
1 -- 0
1 -- 3
2 -- 1
3 -- 4
4 -- 3
4 -- 5
5 -- 4
6 -- 4
# Print the value of edge_index
print(edge_index)
tensor([[     0,      0,      0,  ..., 496997, 496997, 496998],
        [299458, 299458, 299458,  ..., 496997, 496997, 496998]])
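As a toy illustration of how get_edge_df produces this tensor, the snippet below (with made-up accounts 'A', 'B', 'C', not from the dataset) maps account names to integer node IDs and stacks the From/To columns:

```python
import torch
import pandas as pd

# Accounts get sequential integer IDs, mirroring get_edge_df
accounts = pd.DataFrame({'Account': ['A', 'B', 'C']})
accounts['ID'] = accounts.index
mapping = dict(zip(accounts['Account'], accounts['ID']))

# Two toy transactions: A -> B and B -> C
tx = pd.DataFrame({'Account': ['A', 'B'], 'Account.1': ['B', 'C']})
edge_index = torch.stack([
    torch.from_numpy(tx['Account'].map(mapping).values),
    torch.from_numpy(tx['Account.1'].map(mapping).values),
], dim=0)
# edge_index is tensor([[0, 1], [1, 2]]), shape [2, num_transactions]
```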

3.7 Final code

Below we present the final code of model.py, train.py, and dataset.py.

3.8 Model architecture

In this section we use a graph attention network as our backbone model. It is built from two GATConv layers followed by a linear layer with a sigmoid output for classification.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_geometric.transforms as T
from torch_geometric.nn import GATConv, Linear

class GAT(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, heads):
        super().__init__()
        self.conv1 = GATConv(in_channels, hidden_channels, heads, dropout=0.6)  # first GAT layer: in_channels -> hidden_channels per head, `heads` attention heads, dropout against overfitting
        self.conv2 = GATConv(hidden_channels * heads, int(hidden_channels/4), heads=1, concat=False, dropout=0.6)  # second GAT layer: hidden_channels * heads -> hidden_channels/4, single head, no concatenation
        self.lin = Linear(int(hidden_channels/4), out_channels)  # linear layer: hidden_channels/4 -> out_channels
        self.sigmoid = nn.Sigmoid()  # sigmoid activation

    def forward(self, x, edge_index, edge_attr):
        x = F.dropout(x, p=0.6, training=self.training)  # dropout (p=0.6) against overfitting
        x = F.elu(self.conv1(x, edge_index, edge_attr))  # first GAT layer with ELU activation
        x = F.dropout(x, p=0.6, training=self.training)  # dropout (p=0.6) against overfitting
        x = F.elu(self.conv2(x, edge_index, edge_attr))  # second GAT layer with ELU activation
        x = self.lin(x)  # linear layer
        x = self.sigmoid(x)  # sigmoid output
        
        return x
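train.py itself is not reproduced in this notebook, so the following is only a hypothetical minimal loop consistent with the model above: full-batch training with BCELoss to match the sigmoid output. A real run would add train/validation masks and, given the heavy class imbalance, a weighted loss or resampling.

```python
import torch
import torch.nn as nn

def train(model, data, epochs=100, lr=0.005):
    """Hypothetical full-batch training loop (sketch, not the repo's train.py).

    `model` takes (x, edge_index, edge_attr); `data` carries x, edge_index,
    edge_attr, and a float per-node label vector y.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=5e-4)
    criterion = nn.BCELoss()  # matches the model's sigmoid output
    model.train()
    for epoch in range(epochs):
        optimizer.zero_grad()
        out = model(data.x, data.edge_index, data.edge_attr).squeeze(-1)
        loss = criterion(out, data.y)
        loss.backward()
        optimizer.step()
    return float(loss)
```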

3.9 PyG InMemoryDataset

Finally, we can build the dataset with the functions above.


class AMLtoGraph(InMemoryDataset):
    def __init__(self, root: str, edge_window_size: int = 10,
                 transform: Optional[Callable] = None,
                 pre_transform: Optional[Callable] = None):
        # Constructor: root (data directory), edge_window_size (edge window
        # size), transform (data transform), pre_transform (preprocessing)
        self.edge_window_size = edge_window_size
        # Call the parent constructor
        super().__init__(root, transform, pre_transform)
        # Load the processed data
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self) -> str:
        # Name of the raw CSV expected under root/raw/.
        return 'HI-Small_Trans.csv'

    @property
    def processed_file_names(self) -> str:
        # Name of the processed file written under root/processed/.
        return 'data.pt'

    @property
    def num_nodes(self) -> int:
        # Number of nodes, inferred from the largest node index in edge_index.
        return self._data.edge_index.max().item() + 1

    def df_label_encoder(self, df, columns):
        # Label-encode the given categorical columns as integers.
        le = preprocessing.LabelEncoder()
        for i in columns:
            df[i] = le.fit_transform(df[i].astype(str))
        return df

    def preprocess(self, df):
        # Clean and normalise the raw transaction table.
        df = self.df_label_encoder(df, ['Payment Format', 'Payment Currency', 'Receiving Currency'])
        # Convert timestamps to integers, then min-max normalise them to [0, 1].
        df['Timestamp'] = pd.to_datetime(df['Timestamp'])
        df['Timestamp'] = df['Timestamp'].apply(lambda x: x.value)
        df['Timestamp'] = (df['Timestamp'] - df['Timestamp'].min()) / (df['Timestamp'].max() - df['Timestamp'].min())

        # Fuse bank id and account name into a globally unique account id.
        df['Account'] = df['From Bank'].astype(str) + '_' + df['Account']
        df['Account.1'] = df['To Bank'].astype(str) + '_' + df['Account.1']
        df = df.sort_values(by=['Account'])
        # Split out the receiving and paying sides of each transaction.
        receiving_df = df[['Account.1', 'Amount Received', 'Receiving Currency']]
        paying_df = df[['Account', 'Amount Paid', 'Payment Currency']]
        receiving_df = receiving_df.rename({'Account.1': 'Account'}, axis=1)
        # Distinct currencies, used later to build per-currency aggregate features.
        currency_ls = sorted(df['Receiving Currency'].unique())

        return df, receiving_df, paying_df, currency_ls

    def get_all_account(self, df):
        # Build one row per account, labelled 1 if the account appears in any
        # laundering transaction.
        ldf = df[['Account', 'From Bank']]
        rdf = df[['Account.1', 'To Bank']]
        # Collect every account on either side of a laundering transaction.
        suspicious = df[df['Is Laundering'] == 1]
        s1 = suspicious[['Account', 'Is Laundering']]
        s2 = suspicious[['Account.1', 'Is Laundering']]
        s2 = s2.rename({'Account.1': 'Account'}, axis=1)
        suspicious = pd.concat([s1, s2], join='outer')
        suspicious = suspicious.drop_duplicates()

        # Merge sender and receiver accounts into one deduplicated table.
        ldf = ldf.rename({'From Bank': 'Bank'}, axis=1)
        rdf = rdf.rename({'Account.1': 'Account', 'To Bank': 'Bank'}, axis=1)
        df = pd.concat([ldf, rdf], join='outer')
        df = df.drop_duplicates()

        # Default every account to 0, then overwrite the suspicious ones with 1.
        df['Is Laundering'] = 0
        df.set_index('Account', inplace=True)
        df.update(suspicious.set_index('Account'))
        df = df.reset_index()

        return df

    def paid_currency_aggregate(self, currency_ls, paying_df, accounts):
        # One feature per currency: the account's average paid amount in that currency.
        for i in currency_ls:
            temp = paying_df[paying_df['Payment Currency'] == i]
            accounts['avg paid ' + str(i)] = temp['Amount Paid'].groupby(temp['Account']).transform('mean')
        return accounts

    def received_currency_aggregate(self, currency_ls, receiving_df, accounts):
        # One feature per currency: the account's average received amount in that currency.
        for i in currency_ls:
            temp = receiving_df[receiving_df['Receiving Currency'] == i]
            accounts['avg received ' + str(i)] = temp['Amount Received'].groupby(temp['Account']).transform('mean')
        accounts = accounts.fillna(0)
        return accounts

    def get_edge_df(self, accounts, df):
        # Turn transactions into edge_index / edge_attr tensors.
        accounts = accounts.reset_index(drop=True)
        accounts['ID'] = accounts.index
        # Map each account id to its integer node index.
        mapping_dict = dict(zip(accounts['Account'], accounts['ID']))
        df['From'] = df['Account'].map(mapping_dict)
        df['To'] = df['Account.1'].map(mapping_dict)
        df = df.drop(['Account', 'Account.1', 'From Bank', 'To Bank'], axis=1)

        edge_index = torch.stack([torch.from_numpy(df['From'].values), torch.from_numpy(df['To'].values)], dim=0)

        # The remaining transaction columns become the edge features.
        df = df.drop(['Is Laundering', 'From', 'To'], axis=1)

        edge_attr = torch.from_numpy(df.values).to(torch.float)
        return edge_attr, edge_index

    def get_node_attr(self, currency_ls, paying_df, receiving_df, accounts):
        # Node features: per-currency paid/received averages plus the encoded bank id;
        # the laundering flag becomes the node label.
        node_df = self.paid_currency_aggregate(currency_ls, paying_df, accounts)
        node_df = self.received_currency_aggregate(currency_ls, receiving_df, node_df)
        node_label = torch.from_numpy(node_df['Is Laundering'].values).to(torch.float)
        node_df = node_df.drop(['Account', 'Is Laundering'], axis=1)
        node_df = self.df_label_encoder(node_df, ['Bank'])
        node_df = torch.from_numpy(node_df.values).to(torch.float)
        return node_df, node_label

    def process(self):
        # Read the raw CSV and assemble the full graph.
        df = pd.read_csv(self.raw_paths[0])
        df, receiving_df, paying_df, currency_ls = self.preprocess(df)
        accounts = self.get_all_account(df)
        node_attr, node_label = self.get_node_attr(currency_ls, paying_df, receiving_df, accounts)
        edge_attr, edge_index = self.get_edge_df(accounts, df)

        # Accounts are nodes, transactions are edges, and the laundering flag
        # is the node label.
        data = Data(x=node_attr,
                    edge_index=edge_index,
                    y=node_label,
                    edge_attr=edge_attr
                    )

        data_list = [data]
        if self.pre_filter is not None:
            data_list = [d for d in data_list if self.pre_filter(d)]

        if self.pre_transform is not None:
            data_list = [self.pre_transform(d) for d in data_list]

        # Collate into the InMemoryDataset storage format and save to disk.
        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])
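To make the transformations in `preprocess()` and `get_edge_df()` concrete, here is a minimal pandas sketch on a made-up three-row transaction table. The column names follow the IBM CSV, but the values are invented, and node ids are assigned from a sorted set here, a simplification of the class's account table:

```python
import pandas as pd

# A tiny made-up transaction table with the columns the dataset class expects.
df = pd.DataFrame({
    'From Bank': [1, 1, 2],
    'Account':   ['A', 'A', 'B'],
    'To Bank':   [2, 3, 3],
    'Account.1': ['B', 'C', 'C'],
    'Timestamp': ['2022-09-01 00:00', '2022-09-01 12:00', '2022-09-02 00:00'],
})

# Timestamps -> integer nanoseconds -> min-max normalised to [0, 1],
# mirroring preprocess() (which uses .apply(lambda x: x.value) instead).
ts = pd.to_datetime(df['Timestamp']).astype('int64')
df['Timestamp'] = (ts - ts.min()) / (ts.max() - ts.min())

# Bank id and account name are fused into a globally unique account id.
df['Account'] = df['From Bank'].astype(str) + '_' + df['Account']
df['Account.1'] = df['To Bank'].astype(str) + '_' + df['Account.1']

# Map each unique account to an integer node id, as in get_edge_df().
accounts = sorted(set(df['Account']).union(df['Account.1']))
mapping = {acc: i for i, acc in enumerate(accounts)}
df['From'] = df['Account'].map(mapping)
df['To'] = df['Account.1'].map(mapping)

print(mapping)                          # {'1_A': 0, '2_B': 1, '3_C': 2}
print(df[['From', 'To', 'Timestamp']])  # edges (0->1, 0->2, 1->2) at t = 0, 0.5, 1
```

Stacking the `From` and `To` columns then yields the `edge_index` tensor, with the remaining numeric columns becoming `edge_attr`.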

3.10 Model Training

Please follow the instructions at https://github.com/issacchan26/AntiMoneyLaunderingDetectionWithGNN before starting training.

import torch
import torch_geometric.transforms as T  # graph transforms, used here for the node split
from torch_geometric.loader import NeighborLoader  # mini-batch neighbour sampling

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # use the GPU if available
dataset = AMLtoGraph('/path/to/AntiMoneyLaunderingDetectionWithGNN/data')  # build or load the dataset
data = dataset[0]  # the single graph in the dataset
epoch = 100  # number of training epochs

# GAT backbone: 16 hidden channels, 8 attention heads, one sigmoid output per node.
model = GAT(in_channels=data.num_features, hidden_channels=16, out_channels=1, heads=8)
model = model.to(device)
criterion = torch.nn.BCELoss()  # binary cross-entropy on the sigmoid output
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)  # plain SGD

# Randomly split nodes: 10% validation, no test set; the rest is training.
split = T.RandomNodeSplit(split='train_rest', num_val=0.1, num_test=0)
data = split(data)

# Neighbour sampler over the training nodes: 30 neighbours per hop, 2 hops, batches of 256.
train_loader = NeighborLoader(
    data,
    num_neighbors=[30] * 2,
    batch_size=256,
    input_nodes=data.train_mask,
)

# Evaluation loader over the validation nodes (num_test=0, so validation doubles as the test set).
test_loader = NeighborLoader(
    data,
    num_neighbors=[30] * 2,
    batch_size=256,
    input_nodes=data.val_mask,
)

for i in range(epoch):
    total_loss = 0
    model.train()  # training mode (enables dropout)
    for batch in train_loader:  # iterate over sampled mini-batches
        optimizer.zero_grad()
        batch = batch.to(device)
        pred = model(batch.x, batch.edge_index, batch.edge_attr)  # forward pass
        ground_truth = batch.y
        loss = criterion(pred, ground_truth.unsqueeze(1))
        loss.backward()
        optimizer.step()
        total_loss += float(loss)
    if i % 10 == 0:  # report and evaluate every 10 epochs
        print(f"Epoch: {i:03d}, Loss: {total_loss:.4f}")
        model.eval()  # evaluation mode (disables dropout)
        acc = 0
        total = 0
        with torch.no_grad():
            for test_data in test_loader:
                test_data = test_data.to(device)
                pred = model(test_data.x, test_data.edge_index, test_data.edge_attr)
                ground_truth = test_data.y
                # Threshold the sigmoid output at 0.5 before comparing to the labels.
                pred_label = (pred > 0.5).float()
                correct = (pred_label == ground_truth.unsqueeze(1)).sum().item()
                total += len(ground_truth)
                acc += correct
        acc = acc / total
        print('accuracy:', acc)

4. References

Some of the feature engineering in this repository draws on the following papers, which are strongly recommended reading:

  1. Weber, M., Domeniconi, G., Chen, J., Weidele, D. K. I., Bellei, C., Robinson, T., & Leiserson, C. E. (2019). Anti-money laundering in bitcoin: Experimenting with graph convolutional networks for financial forensics. arXiv preprint arXiv:1908.02591.
  2. Johannessen, F., & Jullum, M. (2023). Finding Money Launderers Using Heterogeneous Graph Neural Networks. arXiv preprint arXiv:2307.13499.