一、每日一题

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| actor_id    | int     |
| director_id | int     |
| timestamp   | int     |
+-------------+---------+
timestamp 是这张表的主键(具有唯一值的列).

编写解决方案找出合作过至少三次的演员和导演的 id 对 (actor_id, director_id)

示例 1:

输入:
ActorDirector 表:
+-------------+-------------+-------------+
| actor_id    | director_id | timestamp   |
+-------------+-------------+-------------+
| 1           | 1           | 0           |
| 1           | 1           | 1           |
| 1           | 1           | 2           |
| 1           | 2           | 3           |
| 1           | 2           | 4           |
| 2           | 1           | 5           |
| 2           | 1           | 6           |
+-------------+-------------+-------------+
输出:
+-------------+-------------+
| actor_id    | director_id |
+-------------+-------------+
| 1           | 1           |
+-------------+-------------+
解释:
唯一的 id 对是 (1, 1),他们恰好合作了 3 次。

题源:leetcode

解答:

import pandas as pd

# 示例数据
data = {
    'actor_id': [1, 1, 1, 1, 1, 2, 2],
    'director_id': [1, 1, 1, 2, 2, 1, 1],
    'timestamp': [0, 1, 2, 3, 4, 5, 6]
}

# 创建DataFrame
df = pd. DataFrame(data)

# 分组并计算每组的出现次数
grouped = df.groupby(['actor_id', 'director_id']).size().reset_index(name = 'count')

# 过滤出出现次数大于或等于3次的组
result = grouped[['actor_id', 'director_id']][grouped['count'] >= 3]
print(result)

2024.6.16

06-16 17:00