Problem description
While analyzing energy demand and consumption data, I'm having trouble resampling and interpolating time-series trend data.
Example data set:
timestamp                value (kWh)
---------------------    -----------
12/19/2011 5:43:21 PM    79178
12/19/2011 5:58:21 PM    79179.88
12/19/2011 6:13:21 PM    79182.13
12/19/2011 6:28:21 PM    79183.88
12/19/2011 6:43:21 PM    79185.63
Based on these observations, I'd like to aggregate the values over a period of time, with the frequency set to a unit of time.
For example, readings on the hour, with any gaps of missing data filled in:
timestamp                value (approx)
---------------------    --------------
12/19/2011 5:00:00 PM    79173
12/19/2011 6:00:00 PM    79179
12/19/2011 7:00:00 PM    79186
For a linear algorithm, it seems I would take the difference in time and multiply the value by that factor:
TimeSpan ts = current - previous;
Double factor = ts.TotalMinutes / period;
The value and timestamp could then be calculated from the factor.
With so much information available, I'm not sure why it's so hard to find the most elegant approach.
Perhaps first: are there open source analysis libraries you could recommend?
Any recommendations for a programmatic approach? Ideally in C#, or possibly in SQL?
Or are there any similar questions (with answers) I could be pointed to?
Recommended answer
By using the ticks that DateTime uses internally to represent time, you get the most accurate values possible. Because these ticks do not restart at zero at midnight, you will not run into problems at day boundaries.
// Sample times and full hour
DateTime lastSampleTimeBeforeFullHour = new DateTime(2011, 12, 19, 17, 58, 21);
DateTime firstSampleTimeAfterFullHour = new DateTime(2011, 12, 19, 18, 13, 21);
DateTime fullHour = new DateTime(2011, 12, 19, 18, 00, 00);
// Times as ticks (most accurate time unit)
long t0 = lastSampleTimeBeforeFullHour.Ticks;
long t1 = firstSampleTimeAfterFullHour.Ticks;
long tf = fullHour.Ticks;
// Energy samples
double e0 = 79179.88; // kWh before full hour
double e1 = 79182.13; // kWh after full hour
double ef; // interpolated energy at full hour
ef = e0 + (tf - t0) * (e1 - e0) / (t1 - t0); // ==> 79180.1275 kWh
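The single-point calculation above could be extended to a whole series. The following is a minimal sketch, not part of the original answer: the class name HourlyResampler and its methods are hypothetical, and it assumes the samples are sorted by time with strictly increasing timestamps. It only interpolates full hours that lie between the first and last sample; it does not extrapolate outside that range.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class HourlyResampler
{
    // Linear interpolation at time t between samples (t0, e0) and (t1, e1),
    // computed on DateTime ticks as in the snippet above.
    public static double Interpolate(DateTime t0, double e0,
                                     DateTime t1, double e1, DateTime t)
    {
        return e0 + (t.Ticks - t0.Ticks) * (e1 - e0) / (t1.Ticks - t0.Ticks);
    }

    // Returns one interpolated reading for each full hour that lies between
    // the first and last sample. Assumes samples are sorted by time with
    // strictly increasing timestamps.
    public static List<KeyValuePair<DateTime, double>> Resample(
        IList<KeyValuePair<DateTime, double>> samples)
    {
        var result = new List<KeyValuePair<DateTime, double>>();
        if (samples.Count < 2) return result;

        // First full hour at or after the first sample.
        DateTime first = samples[0].Key;
        DateTime hour = new DateTime(first.Year, first.Month, first.Day,
                                     first.Hour, 0, 0);
        if (hour < first) hour = hour.AddHours(1);

        DateTime last = samples[samples.Count - 1].Key;
        int i = 0;
        while (hour <= last)
        {
            // Advance to the pair of samples that brackets this hour.
            while (i < samples.Count - 2 && samples[i + 1].Key < hour) i++;

            var a = samples[i];
            var b = samples[i + 1];
            result.Add(new KeyValuePair<DateTime, double>(
                hour, Interpolate(a.Key, a.Value, b.Key, b.Value, hour)));

            // A gap longer than one hour reuses the same bracketing pair.
            hour = hour.AddHours(1);
        }
        return result;
    }
}
```

For the five sample readings in the question, this sketch would return a single entry for 6:00:00 PM, interpolated between the 5:58:21 PM and 6:13:21 PM readings. The 5:00 PM and 7:00 PM values in the desired output lie outside the sampled range and would need extrapolation, which is left out here.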
Explanation of the formula
In geometry, similar triangles are triangles that have the same shape but different sizes. The formula above relies on the fact that the ratio of any two sides of one triangle equals the ratio of the corresponding sides of a similar triangle.
If you have a triangle with sides A, B, C and a similar triangle with sides a, b, c, then A : B = a : b. The equality of two ratios is called a proportion.
We can apply this proportionality rule to our problem:
(e1 - e0) / (t1 - t0)  =  (ef - e0) / (tf - t0)
--- large triangle ---    --- small triangle ---
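As a check, the proportion can be solved for ef and evaluated with the sample values from the code. The derivation below is a sketch of that step. The time differences are written in seconds rather than ticks, which leaves the ratio unchanged.

```latex
\begin{align*}
\frac{e_1 - e_0}{t_1 - t_0} &= \frac{e_f - e_0}{t_f - t_0} \\
e_f &= e_0 + (t_f - t_0)\,\frac{e_1 - e_0}{t_1 - t_0} \\
    &= 79179.88 + 99\,\mathrm{s} \cdot \frac{79182.13 - 79179.88}{900\,\mathrm{s}} \\
    &= 79179.88 + 0.2475 = 79180.1275\ \mathrm{kWh}
\end{align*}
```

Here 99 s is the time from 5:58:21 PM to 6:00:00 PM, and 900 s is the 15-minute spacing between the two samples.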