I'm trying to take the most relevant value from my data here. I figured out how to take the oldest and the most recent dates using the max and min functions in pandas, but I couldn't work out the rest. I'm trying to take each unique company and product pair from my data set and derive the rest of its data based on the rules below. If anyone could tell me which tools are used in Python to address such issues, or give guidance on how such issues are addressed in Python, that would be very helpful.
- for security_level: supersevere > severe > moderate > material > minor
- for rating: take true if the same company and product has both true and false
- for rating_level: critical > high > medium > low
- for first_release, the oldest date; for last_release, the most recent date
- for score, the highest score among rows with the same product and company
company | product | security_level | rating | rating_level | first_release | last_release | score |
---|---|---|---|---|---|---|---|
google | mobile | minor | TRUE | critical | 04/11/2020 | 03/17/2020 | 0.5 |
google | os | moderate | FALSE | medium | 09/05/2019 | 03/20/2021 | 0.009 |
google | os | minor | FALSE | low | 09/04/2019 | 05/11/2020 | 19 |
google | tv | severe | TRUE | high | 08/11/2020 | 03/04/2021 | |
google | mobile | supersevere | FALSE | medium | 04/06/2015 | 08/19/2020 | 2.4 |
google | mobile | minor | FALSE | high | 08/08/2019 | 08/19/2020 | 1.3 |
apple | iphone | minor | TRUE | low | 02/03/2020 | 10/13/2020 | 3 |
apple | iphone | material | TRUE | medium | 01/21/2018 | 03/04/2021 | 6 |
apple | iwatch | material | FALSE | low | 04/11/2015 | 08/13/2020 | 8 |
apple | iphone | material | TRUE | medium | 10/20/2020 | 03/19/2021 | 5 |
dell | laptop | minor | FALSE | low | 01/05/2021 | 03/20/2021 | 1 |
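To experiment with the sample data, the table can be rebuilt as a DataFrame. This is only a sketch under a few assumptions: the column names are taken from the result shown in the answer, TRUE/FALSE are stored as booleans, and the empty score for the tv row is treated as a missing value (NaN). The name `df` is just a convention.

```python
import numpy as np
import pandas as pd

# Rows copied from the question's table; dates stay as MM/DD/YYYY strings for now.
rows = [
    ("google", "mobile", "minor", True, "critical", "04/11/2020", "03/17/2020", 0.5),
    ("google", "os", "moderate", False, "medium", "09/05/2019", "03/20/2021", 0.009),
    ("google", "os", "minor", False, "low", "09/04/2019", "05/11/2020", 19),
    ("google", "tv", "severe", True, "high", "08/11/2020", "03/04/2021", np.nan),
    ("google", "mobile", "supersevere", False, "medium", "04/06/2015", "08/19/2020", 2.4),
    ("google", "mobile", "minor", False, "high", "08/08/2019", "08/19/2020", 1.3),
    ("apple", "iphone", "minor", True, "low", "02/03/2020", "10/13/2020", 3),
    ("apple", "iphone", "material", True, "medium", "01/21/2018", "03/04/2021", 6),
    ("apple", "iwatch", "material", False, "low", "04/11/2015", "08/13/2020", 8),
    ("apple", "iphone", "material", True, "medium", "10/20/2020", "03/19/2021", 5),
    ("dell", "laptop", "minor", False, "low", "01/05/2021", "03/20/2021", 1),
]
columns = ["company", "product", "security_level", "rating", "rating_level",
           "first_release", "last_release", "score"]
df = pd.DataFrame(rows, columns=columns)

# Count how many rows each (company, product) pair contributes.
group_sizes = df.groupby(["company", "product"], sort=False).size()
print(group_sizes)
```

There are six distinct company/product pairs, which is why the expected output has six rows.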
Output:
company | product | security_level | rating | rating_level | first_release | last_release | score |
---|---|---|---|---|---|---|---|
google | mobile | supersevere | TRUE | critical | 04/06/2015 | 08/19/2020 | 2.4 |
google | os | moderate | FALSE | medium | 09/04/2019 | 03/20/2021 | 19 |
google | tv | severe | TRUE | high | 08/11/2020 | 03/04/2021 | |
apple | iphone | material | TRUE | medium | 01/21/2018 | 03/19/2021 | 6 |
apple | iwatch | material | FALSE | low | 04/11/2015 | 08/13/2020 | 8 |
dell | laptop | minor | FALSE | low | 01/05/2021 | 03/20/2021 | 1 |
Change the dtype of the first_release and last_release columns to datetime
df['last_release'] = pd.to_datetime(df['last_release'])
df['first_release'] = pd.to_datetime(df['first_release'])
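As a small illustration of why the conversion matters, the sketch below parses the three google/mobile rows and takes the oldest and newest dates. Passing an explicit `format` is my own addition, not part of the answer; it assumes every date is written as MM/DD/YYYY, as in the sample table. The frame name `dates` is hypothetical.

```python
import pandas as pd

# The first_release / last_release values of the three google/mobile rows.
dates = pd.DataFrame({
    "first_release": ["04/11/2020", "04/06/2015", "08/08/2019"],
    "last_release": ["03/17/2020", "08/19/2020", "08/19/2020"],
})

# An explicit format avoids any day/month guessing during parsing.
for col in ["first_release", "last_release"]:
    dates[col] = pd.to_datetime(dates[col], format="%m/%d/%Y")

oldest = dates["first_release"].min()  # earliest first release
newest = dates["last_release"].max()   # most recent last release
print(oldest.date(), newest.date())
```

Once the columns are real datetimes, min and max compare chronologically rather than character by character.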
Convert the security_level and rating_level columns to an ordered categorical type
df['rating_level'] = pd.Categorical(df['rating_level'], ['low', 'medium', 'high', 'critical'], ordered=True)
df['security_level'] = pd.Categorical(df['security_level'], ['minor', 'material', 'moderate', 'severe', 'supersevere'], ordered=True)
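To see what the ordered categorical buys, here is a short sketch comparing a plain string max with a categorical max on the google/mobile rating_level values. It also shows one pitfall worth checking for: values that are not in the category list become NaN, so a misspelling such as "superservere" would be dropped silently. The variable names are illustrative only.

```python
import pandas as pd

# rating_level values of the three google/mobile rows.
levels = pd.Series(["critical", "medium", "high"])

# Plain strings compare alphabetically, which is not the intended ranking.
alphabetical_max = levels.max()

# With an ordered categorical, max() follows the declared order (lowest first).
ordered = pd.Series(
    pd.Categorical(levels, categories=["low", "medium", "high", "critical"], ordered=True)
)
ranked_max = ordered.max()

# Values outside the category list are converted to NaN.
security = pd.Categorical(
    ["superservere", "minor"],
    categories=["minor", "material", "moderate", "severe", "supersevere"],
    ordered=True,
)
missing = int(security.isna().sum())

print(alphabetical_max, ranked_max, missing)
```

Checking `isna()` right after the conversion is a cheap way to catch spelling or casing mismatches in the raw data.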
Group the dataframe on the company and product columns and aggregate the remaining columns with the corresponding aggregation functions specified in agg_dict
agg_dict = {'security_level': 'max', 'rating': 'max', 'rating_level': 'max',
'first_release': 'min', 'last_release': 'max', 'score': 'max'}
out = df.groupby(['company', 'product'], as_index=False, sort=False).agg(agg_dict)
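Two of the aggregations deserve a note. On a boolean column, 'max' behaves like "true if any row is true", because True sorts above False, and 'max' on score skips missing values unless the whole group is missing. The sketch below checks both on a cut-down, hypothetical frame named `small`; it uses named aggregation instead of a dict purely so that 'max' and 'any' can be shown side by side.

```python
import numpy as np
import pandas as pd

# A reduced version of the sample data: only the rating and score columns.
small = pd.DataFrame({
    "company": ["google", "google", "google", "google"],
    "product": ["mobile", "mobile", "tv", "os"],
    "rating": [True, False, True, False],
    "score": [0.5, 2.4, np.nan, 19.0],
})

out = small.groupby(["company", "product"], as_index=False, sort=False).agg(
    rating=("rating", "max"),      # True wins over False
    rating_any=("rating", "any"),  # same result, stated more explicitly
    score=("score", "max"),        # NaN is ignored unless it is the only value
)
print(out)
```

If rating arrives as the strings "TRUE"/"FALSE" rather than booleans, converting it first keeps the intent explicit.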
Result
>>> out
  company product security_level  rating rating_level first_release last_release  score
0  google  mobile    supersevere    True     critical    2015-04-06   2020-08-19    2.4
1  google      os       moderate   False       medium    2019-09-04   2021-03-20   19.0
2  google      tv         severe    True         high    2020-08-11   2021-03-04    NaN
3   apple  iphone       material    True       medium    2018-01-21   2021-03-19    6.0
4   apple  iwatch       material   False          low    2015-04-11   2020-08-13    8.0
5    dell  laptop          minor   False          low    2021-01-05   2021-03-20    1.0
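The result shows dates in ISO form, while the expected output in the question uses MM/DD/YYYY. If the original format is wanted back, one option is to format the datetime columns as strings after aggregating. This is an optional extra step, not part of the answer, sketched here on two rows of the result:

```python
import pandas as pd

# Two rows of the aggregated result, with the dates already parsed.
out = pd.DataFrame({
    "company": ["google", "dell"],
    "product": ["mobile", "laptop"],
    "first_release": pd.to_datetime(["2015-04-06", "2021-01-05"]),
    "last_release": pd.to_datetime(["2020-08-19", "2021-03-20"]),
})

# Turn the datetimes back into MM/DD/YYYY strings for display.
for col in ["first_release", "last_release"]:
    out[col] = out[col].dt.strftime("%m/%d/%Y")

print(out)
```

After this step the columns hold strings again, so it is best done last, once no further date comparisons are needed.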