Is there a function to get the difference between two values on a pandas DataFrame time series?

Question

I am experimenting with the NYT COVID dataset, which has the running total of COVID cases for each county, per day.

I would like to find the difference in cases between consecutive days, so that I get the number of new cases per day instead of the running total. Taking a rolling mean, or resampling every 2 days using a mean/sum/etc., all work just fine. It's just subtracting that is giving me such a headache.

Tried methods:

df.resample('2d').diff()
df.resample('1d').agg(np.subtract)
df.rolling(2).diff()
df.rolling('2').agg(np.subtract)

Sample data:

import datetime as dt
import numpy as np
import pandas as pd

pd.DataFrame(data={'state':['Alabama','Alabama','Alabama','Alabama','Alabama'],
               'date':[dt.date(2020,3,13),dt.date(2020,3,14),dt.date(2020,3,15),dt.date(2020,3,16),dt.date(2020,3,17)],
               'covid_cases':[1.2,2.0,2.9,3.6,3.9]
              })

Desired sample output:

pd.DataFrame(data={'state':['Alabama','Alabama','Alabama','Alabama','Alabama'],
               'date':[dt.date(2020,3,13),dt.date(2020,3,14),dt.date(2020,3,15),dt.date(2020,3,16),dt.date(2020,3,17)],
               'new_covid_cases':[np.nan,0.8,0.9,0.7,0.3]
              })

Recreate sample data from original NYT dataset:

df = pd.read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv',parse_dates=['date'])
df.groupby(['state','date'])[['cases']].mean().reset_index()

Any help would be greatly appreciated! Would like to learn how to do this manually/via function rather than finding a "new cases" dataset as I will be working with timeseries a lot in the very near future.

Solution

diff is the right function, but look at the error message from your first attempt:

'DatetimeIndexResampler' object has no attribute 'diff'

diff is a method of DataFrames (and Series), not of Resamplers. Calling resample only defines the time bins; you get a DataFrame back once you specify how each bin should be aggregated, and you can then call diff on that result.
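The distinction can be checked directly. The following is a minimal sketch on a hypothetical five-day toy frame (the `cases` name and its values are illustrative, taken from the sample data in the question, not from the NYT file); it only inspects which objects expose a `diff` attribute.

```python
import pandas as pd

# Hypothetical toy data: running case totals indexed by day.
idx = pd.date_range("2020-03-13", periods=5, freq="D")
cases = pd.DataFrame({"covid_cases": [1.2, 2.0, 2.9, 3.6, 3.9]}, index=idx)

# resample() returns a Resampler, which only describes the 2-day bins.
resampler = cases.resample("2D")
has_diff_before = hasattr(resampler, "diff")

# Aggregating each bin (here with last()) gives back a regular DataFrame.
aggregated = resampler.last()
has_diff_after = hasattr(aggregated, "diff")

print(type(resampler).__name__, has_diff_before)
print(type(aggregated).__name__, has_diff_after)
```

Any aggregation (mean, sum, last, and so on) would turn the Resampler back into a DataFrame; last() is used here only because it suits a running total.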

If you have the running total of COVID cases for each day and want to resample it to 2-day bins, you probably want to keep only the latest value within each bin, in which case something like df.resample('2d').last().diff() should work.
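As an illustration of both cases, here is a sketch using the sample data from the question. It assumes the frame is sorted by date within each state, and the per-state `groupby` step is an addition of this example rather than part of the answer above: it keeps the first row of one state from being subtracted from the last row of another once more states are present.

```python
import pandas as pd

# Sample data from the question (a single state, running totals).
df = pd.DataFrame({
    "state": ["Alabama"] * 5,
    "date": pd.date_range("2020-03-13", periods=5, freq="D"),
    "covid_cases": [1.2, 2.0, 2.9, 3.6, 3.9],
})

# Daily new cases: each row minus the previous row within the same state.
df = df.sort_values(["state", "date"])
df["new_covid_cases"] = df.groupby("state")["covid_cases"].diff()

# New cases per 2-day bin: keep the last running total in each bin, then diff.
two_day = df.set_index("date")["covid_cases"].resample("2D").last().diff()

print(df)
print(two_day)
```

The first value in each result is NaN because there is no earlier row to subtract from.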
