问题描述
我正在关注此博客以识别我的时间序列数据中的季节性客户:
导入pandas作为pa将 numpy 导入为 np将 pyodbc 导入为 pycnxn = py.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+用户名+';PWD='+密码)original = pa.read_sql_query('SELECT s.customer_id, s.yr, s.mnth, Case when s.usage<0 then 0 else s.usage end as usage FROM dbo.Seasonal s Join ( Select Top 2000 customer_id, SUM(用法)作为用法来自 dbo.Seasonal where Yr!=2018 Group by customer_id ) t ON s.customer_id = t.customer_id Where yr!= 2018 Order by customer_id, yr, mnth', cnxn)grouped = original.groupby(by='customer_id')def yearmonth_to_justmonth(年,月):返回年 * 12 + 月 - 1def fillInForOwner(group):min = group.head(1).iloc[0]max = group.tail(1).iloc[0]minMonths = yearmonth_to_justmonth(min.yr, min.mnth)maxMonths = yearmonth_to_justmonth(max.yr, max.mnth)fill_index = pa.Index(np.arange(minMonths, maxMonths, 1), name="filled_months")group['months'] = group.yr * 12 + group.mnth - 1group = group.set_index('月')group = group.reindex(fill_index)group.customer_id = min.customer_idgroup.yr = group.index//12group.mnth = group.index % 12 + 1group.usage = np.where(group.usage.isnull(), 0, group.usage).astype(int)返回组fillIn = grouped.apply(fillInForOwner)newIndex = pa.Index(np.arange(fillIn.customer_id.count()))将 rpy2 导入为 r从 rpy2.robjects.packages 导入导入器从 rpy2.robjects 导入 r、pandas2ri、globalenvpandas2ri.activate()base = importr('base')colorspace = importr('colorspace')预测 = 进口商('预测')times = importr('timeSeries')统计=进口商('统计')输出文件 = 'results.csv'df_list = []对于 customerid,fillIn.groupby(by=['customer_id']) 中的 dataForCustomer:startYear = dataForCustomer.head(1).iloc[0].yrstartMonth = dataForCustomer.head(1).iloc[0].mnthendYear = dataForCustomer.tail(1).iloc[0].yrendMonth = dataForCustomer.tail(1).iloc[0].mnthcustomerTS = stats.ts(dataForCustomer.usage.astype(int),start=base.c(startYear,startMonth),end=base.c(endYear, endMonth),频率=12)r.assign('customerTS', customerTS)尝试:季节性 = r('''fit
回过头来回答我是如何让它工作的,只是将原始"数据帧传递到 for 循环中.我的数据已经有空的 $0 月份,所以我不需要运行那部分代码.谢谢大家的帮助
I am following this blog to identify seasonal customers in my time series data:https://www.kristenkehrer.com/seasonality-code
My code is shamelessly nearly identical to the blog, with some small tweaks, code is below. I am able to run the code entirely, for 2000 customers. A few hours later, 0 customers were flagged as seasonal in my results.
Manually looking at customers data over time, I do believe I have many examples of seasonal customers that should have been picked up. Below is a sample of the data I am using.
Am I missing something stupid? am I in way over my head to even try this, being very new to python?
Note that I am adding the "0 months" in my data source, but I don't think it would hurt anything for that function to check again. I'm also not including the data source credentials step.
Thank you
import pandas as pa
import numpy as np
import pyodbc as py
cnxn = py.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+ password)
original = pa.read_sql_query('SELECT s.customer_id, s.yr, s.mnth, Case when s.usage<0 then 0 else s.usage end as usage FROM dbo.Seasonal s Join ( Select Top 2000 customer_id, SUM(usage) as usage From dbo.Seasonal where Yr!=2018 Group by customer_id ) t ON s.customer_id = t.customer_id Where yr!= 2018 Order by customer_id, yr, mnth', cnxn)
grouped = original.groupby(by='customer_id')
def yearmonth_to_justmonth(year, month):
return year * 12 + month - 1
def fillInForOwner(group):
min = group.head(1).iloc[0]
max = group.tail(1).iloc[0]
minMonths = yearmonth_to_justmonth(min.yr, min.mnth)
maxMonths = yearmonth_to_justmonth(max.yr, max.mnth)
filled_index = pa.Index(np.arange(minMonths, maxMonths, 1), name="filled_months")
group['months'] = group.yr * 12 + group.mnth - 1
group = group.set_index('months')
group = group.reindex(filled_index)
group.customer_id = min.customer_id
group.yr = group.index // 12
group.mnth = group.index % 12 + 1
group.usage = np.where(group.usage.isnull(), 0, group.usage).astype(int)
return group
filledIn = grouped.apply(fillInForOwner)
newIndex = pa.Index(np.arange(filledIn.customer_id.count()))
import rpy2 as r
from rpy2.robjects.packages import importr
from rpy2.robjects import r, pandas2ri, globalenv
pandas2ri.activate()
base = importr('base')
colorspace = importr('colorspace')
forecast = importr('forecast')
times = importr('timeSeries')
stats = importr('stats')
outfile = 'results.csv'
df_list = []
for customerid, dataForCustomer in filledIn.groupby(by=['customer_id']):
startYear = dataForCustomer.head(1).iloc[0].yr
startMonth = dataForCustomer.head(1).iloc[0].mnth
endYear = dataForCustomer.tail(1).iloc[0].yr
endMonth = dataForCustomer.tail(1).iloc[0].mnth
customerTS = stats.ts(dataForCustomer.usage.astype(int),
start=base.c(startYear,startMonth),
end=base.c(endYear, endMonth),
frequency=12)
r.assign('customerTS', customerTS)
try:
seasonal = r('''
fit<-tbats(customerTS, seasonal.periods = 12,
use.parallel = TRUE)
fit$seasonal
''')
except:
seasonal = 1
df_list.append({'customer_id': customerid, 'seasonal': seasonal})
print(f' {customerid} | {seasonal} ')
seasonal_output = pa.DataFrame(df_list)
print(seasonal_output)
seasonal_output.to_csv(outfile)
Circling back to answer how I got this to work was by just passing the "original" dataframe into the for loop. My data already had the empty $0 months so I didn't need that part of the code to run. Thank you all for your help
这篇关于R tbats 模型季节性客户标记 无结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!