How to generate missing weekids in a Python dataframe


Problem description

I have the dataset below, where the date column holds weekids:

product_name    serial_number     date           sum
"A"             "12"              "202001"   150
"A"             "12"              "202002"   350
"A"             "12"              "202004"   550
"A"             "12"              "202005"   1500

Assume the current weekid is "202007". I need to add every weekid that is missing up to and including "202007", with sum set to zero, so that the result looks like this:

product_name    serial_number     date           sum
"A"             "12"              "202001"   150
"A"             "12"              "202002"   350
"A"             "12"              "202003"   0
"A"             "12"              "202004"   550
"A"             "12"              "202005"   1500
"A"             "12"              "202006"   0
"A"             "12"              "202007"   0

How can I convert the first dataset into the second one efficiently in Python using pandas dataframes? Thanks.

Recommended answer

You could use the complete function from pyjanitor to expose the missing combinations. At the time this answer was written, that required installing the latest development version from GitHub:

import numpy as np  # needed for np.arange below
import pandas as pd

df = pd.DataFrame({'product_name': ['A', 'A', 'A', 'A'],
 'serial_number': [12, 12, 12, 12],
 'date': [202001, 202002, 202004, 202005],
 'sum': [150, 350, 550, 1500]})

df

  product_name  serial_number   date    sum
0   A           12              202001  150
1   A           12              202002  350
2   A           12              202004  550
3   A           12              202005  1500


# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git

import janitor
(
    df.complete(columns = [{"date": lambda x: np.arange(202001, 202008)}],
                fill_value={"sum": 0})
      .ffill()
)

product_name    serial_number   date    sum
0   A   12.0    202001  150.0
1   A   12.0    202002  350.0
2   A   12.0    202003  0.0
3   A   12.0    202004  550.0
4   A   12.0    202005  1500.0
5   A   12.0    202006  0.0
6   A   12.0    202007  0.0

With pandas only, you can create a Series that holds all the values from 202001 to 202007 (np.arange excludes the upper bound, 202008) and merge it with the existing dataframe:

complete_array = pd.Series(np.arange(202001, 202008), name="date")
complete_array

0    202001
1    202002
2    202003
3    202004
4    202005
5    202006
6    202007
Name: date, dtype: int64

(
    df.merge(complete_array, how="outer", on="date")
    .sort_values("date") # you can add ignore_index=True to reset the index
    .fillna({"sum": 0})
    .ffill()
)

    product_name    serial_number   date    sum
0          A            12.0     202001     150.0
1          A            12.0     202002     350.0
4          A            12.0     202003     0.0
2          A            12.0     202004     550.0
3          A            12.0     202005     1500.0
5          A            12.0     202006     0.0
6          A            12.0     202007     0.0
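
Both outputs above show serial_number and sum as floats, because the missing rows are NaN before they are filled. As a minimal sketch of one way around that, the variant below uses reindex instead of merge and casts the two numeric columns back to integers. It is not part of the original answer. The name current_weekid is an assumption introduced here for illustration, and the approach assumes all weekids fall within a single year, so that consecutive weeks are consecutive integers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product_name": ["A", "A", "A", "A"],
    "serial_number": [12, 12, 12, 12],
    "date": [202001, 202002, 202004, 202005],
    "sum": [150, 350, 550, 1500],
})

# Hypothetical name: the "current" weekid given in the question.
current_weekid = 202007

# Every weekid from the first observed one up to the current one, inclusive.
# Assumes the range stays within one year (no 202052 -> 202101 jump).
all_weeks = pd.Index(np.arange(df["date"].min(), current_weekid + 1), name="date")

result = (
    df.set_index("date")
      .reindex(all_weeks)        # missing weekids appear as all-NaN rows
      .fillna({"sum": 0})        # missing sums become zero
      .ffill()                   # carry product_name / serial_number forward
      .astype({"serial_number": "int64", "sum": "int64"})  # undo the float upcast
      .reset_index()[["product_name", "serial_number", "date", "sum"]]
)

print(result)
```

Because reindex keeps the rows in the order of all_weeks, no sort_values call is needed. With several products or serial numbers in the same dataframe, the forward fill would leak values across groups, so this sketch only fits the single-product case shown in the question.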
