一、每日一题
+-------------+---------+ | Column Name | Type | +-------------+---------+ | actor_id | int | | director_id | int | | timestamp | int | +-------------+---------+ timestamp 是这张表的主键(具有唯一值的列).
编写解决方案找出合作过至少三次的演员和导演的 id 对 (actor_id, director_id)
示例 1:
输入: ActorDirector 表: +-------------+-------------+-------------+ | actor_id | director_id | timestamp | +-------------+-------------+-------------+ | 1 | 1 | 0 | | 1 | 1 | 1 | | 1 | 1 | 2 | | 1 | 2 | 3 | | 1 | 2 | 4 | | 2 | 1 | 5 | | 2 | 1 | 6 | +-------------+-------------+-------------+ 输出: +-------------+-------------+ | actor_id | director_id | +-------------+-------------+ | 1 | 1 | +-------------+-------------+ 解释: 唯一的 id 对是 (1, 1),他们恰好合作了 3 次。
题源:leetcode
解答:
import pandas as pd
# 示例数据
data = {
'actor_id': [1, 1, 1, 1, 1, 2, 2],
'director_id': [1, 1, 1, 2, 2, 1, 1],
'timestamp': [0, 1, 2, 3, 4, 5, 6]
}
# 创建DataFrame
df = pd. DataFrame(data)
# 分组并计算每组的出现次数
grouped = df.groupby(['actor_id', 'director_id']).size().reset_index(name = 'count')
# 过滤出出现次数大于或等于3次的组
result = grouped[['actor_id', 'director_id']][grouped['count'] >= 3]
print(result)
2024.6.16