Problem description
I have the dataset below, keyed by week ID (weekid) in the date column:
product_name serial_number date sum
"A" "12" "202001" 150
"A" "12" "202002" 350
"A" "12" "202004" 550
"A" "12" "202005" 1500
Assume the current weekid is "202007". I need to add a row with a sum of zero for every weekid that is missing up to and including the current one, as shown below:
product_name serial_number date sum
"A" "12" "202001" 150
"A" "12" "202002" 350
"A" "12" "202003" 0
"A" "12" "202004" 550
"A" "12" "202005" 1500
"A" "12" "202006" 0
"A" "12" "202007" 0
How can I convert the first dataset into the second one efficiently in Python using pandas dataframes? Thanks.
Recommended answer
You could use the complete function from pyjanitor to expose the missing combinations; at the moment you have to install the latest development version from GitHub:
import numpy as np  # needed for np.arange in the examples below
import pandas as pd
df = pd.DataFrame({'product_name': ['A', 'A', 'A', 'A'],
                   'serial_number': [12, 12, 12, 12],
                   'date': [202001, 202002, 202004, 202005],
                   'sum': [150, 350, 550, 1500]})
df
product_name serial_number date sum
0 A 12 202001 150
1 A 12 202002 350
2 A 12 202004 550
3 A 12 202005 1500
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor
(
    df.complete(columns = [{"date": lambda x: np.arange(202001, 202008)}],
                fill_value={"sum": 0})
    .ffill()
)
product_name serial_number date sum
0 A 12.0 202001 150.0
1 A 12.0 202002 350.0
2 A 12.0 202003 0.0
3 A 12.0 202004 550.0
4 A 12.0 202005 1500.0
5 A 12.0 202006 0.0
6 A 12.0 202007 0.0
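Note that np.arange(202001, 202008) only yields valid weekids while the range stays inside one year; across a year boundary it would produce values such as 202054. The following is a minimal sketch of one way to build the range instead, assuming the weekids are ISO year/week numbers (the question does not say which week convention is used). The helper name weekid_range is hypothetical and is not part of pandas or pyjanitor.

```python
from datetime import datetime

import pandas as pd


def weekid_range(start_weekid, end_weekid):
    """Return every weekid (YYYYWW) from start to end, both inclusive.

    Assumption: weekids follow ISO 8601 week numbering.
    """
    def monday_of(weekid):
        # Split 202001 into year=2020, week=1 and find that week's Monday.
        year, week = divmod(int(weekid), 100)
        return datetime.strptime(f"{year}-W{week:02d}-1", "%G-W%V-%u")

    # One timestamp per week, stepping from Monday to Monday.
    mondays = pd.date_range(monday_of(start_weekid), monday_of(end_weekid), freq="7D")
    iso = mondays.isocalendar()
    weekids = iso["year"].astype("int64") * 100 + iso["week"].astype("int64")
    return weekids.tolist()


print(weekid_range(202001, 202007))
print(weekid_range(202051, 202102))
```

The resulting list can be passed wherever np.arange(202001, 202008) is used in this answer.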
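With pandas only, you can create a Series that holds every weekid from 202001 to 202007 (the stop value passed to np.arange, 202008, is exclusive) and merge it with the existing dataframe. The outer merge leaves NaN in the added rows, so sum is filled with 0 and the remaining columns are forward-filled: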
complete_array = pd.Series(np.arange(202001, 202008), name="date")
complete_array
0 202001
1 202002
2 202003
3 202004
4 202005
5 202006
6 202007
Name: date, dtype: int64
(
    df.merge(complete_array, how="outer", on="date")
    .sort_values("date")  # you can add ignore_index=True to reset the index
    .fillna({"sum": 0})
    .ffill()
)
product_name serial_number date sum
0 A 12.0 202001 150.0
1 A 12.0 202002 350.0
4 A 12.0 202003 0.0
2 A 12.0 202004 550.0
3 A 12.0 202005 1500.0
5 A 12.0 202006 0.0
6 A 12.0 202007 0.0
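Both outputs above show serial_number and sum as floats, because the missing rows hold NaN before they are filled. The following is a rough sketch of a variation that keeps the integer dtypes and also works when the frame holds more than one product/serial pair. It assumes pandas 1.2 or later (for how="cross"); the names current_weekid, all_weeks, keys and full are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product_name": ["A", "A", "A", "A"],
    "serial_number": [12, 12, 12, 12],
    "date": [202001, 202002, 202004, 202005],
    "sum": [150, 350, 550, 1500],
})

current_weekid = 202007  # assumed to lie in the same year as the data

# Every weekid from the earliest one in the data up to the current one.
all_weeks = pd.DataFrame({"date": np.arange(df["date"].min(), current_weekid + 1)})

# Cross each distinct (product_name, serial_number) pair with every weekid.
keys = df[["product_name", "serial_number"]].drop_duplicates()
full = keys.merge(all_weeks, how="cross")

# Left-merge the observed sums back in; weeks without data get a sum of 0.
result = (
    full.merge(df, how="left", on=["product_name", "serial_number", "date"])
        .fillna({"sum": 0})
        .astype({"sum": "int64"})
)
print(result)
```

Because product_name and serial_number come from the cross-joined keys rather than from forward-filling, no ffill step is needed and rows can never inherit values from a different product.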