Question
Sure, you could divide the remaining file size by the current download speed, but if your download speed fluctuates (and it will), this doesn't produce a very nice result. What's a better algorithm for producing smoother countdowns?
Recommended answer
Years ago I wrote an algorithm to predict the time remaining in a disk imaging and multicasting program. It used a moving average with a reset whenever the current throughput went outside of a predefined range. It would keep things smooth unless something drastic happened, then it would adjust quickly and return to a moving average again. See example chart here:
The thick blue line in that example chart is the actual throughput over time. Notice the low throughput during the first half of the transfer, followed by a dramatic jump in the second half. The orange line is an overall average. Notice that it never adjusts upward far enough to give an accurate prediction of how long the transfer will take to finish. The gray line is a moving average (i.e. the average of the last N data points - in this graph N is 5, but in reality N might need to be larger to smooth the data enough). It recovers more quickly, but still takes a while to adjust, and the larger N is, the longer it takes. So if your data is pretty noisy, N will have to be larger and the recovery time will be longer.
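To make the difference between the two averages concrete, here is a minimal sketch in Python. The function names and the sample throughput values are made up for illustration and are not taken from the chart; the only thing carried over from the text is N = 5.

```python
def overall_average(samples):
    """Average of every throughput sample seen so far (the orange line)."""
    return sum(samples) / len(samples)


def moving_average(samples, n=5):
    """Average of only the last n throughput samples (the gray line)."""
    window = samples[-n:]
    return sum(window) / len(window)


def eta_seconds(remaining_bytes, rate):
    """Time remaining at the given rate; infinite if nothing is moving."""
    return float("inf") if rate <= 0 else remaining_bytes / rate


# Hypothetical samples: slow for a while, then a sharp jump in throughput.
samples = [10] * 10 + [50] * 3

slow_to_react = overall_average(samples)   # still dragged down by old samples
quicker = moving_average(samples, n=5)     # closer to the new rate, not there yet
```

With these made-up numbers the overall average is still below 20 while the moving average has climbed to 34, which mirrors the lag described above: both trail the real rate of 50, the overall average by much more.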
The green line is the algorithm I used. It goes along just like a moving average, but when the data moves outside a predefined range (designated by the thin light blue and yellow lines), it resets the moving average and jumps up immediately. The predefined range can also be based on standard deviation, so it can adjust automatically to how noisy the data is. I just threw these values into Excel to diagram them for this answer, so it's not perfect, but you get the idea.
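A rough sketch of that idea in Python follows. It is not the original implementation: the class name, the window size, and the choice to define the range as a fixed fraction around the current average (`tolerance`) are all assumptions made for illustration. A range based on standard deviation could be swapped in at the marked line.

```python
from collections import deque


class ResettingMovingAverage:
    """Moving average that discards its history when throughput jumps."""

    def __init__(self, window=5, tolerance=0.5):
        self.samples = deque(maxlen=window)  # keeps only the last `window` samples
        self.tolerance = tolerance           # assumed: allowed deviation as a fraction

    def update(self, throughput):
        """Add one throughput sample and return the smoothed rate."""
        if self.samples:
            avg = sum(self.samples) / len(self.samples)
            # Assumed range: a fixed band around the current average.
            low = avg * (1 - self.tolerance)
            high = avg * (1 + self.tolerance)
            if not (low <= throughput <= high):
                # Something drastic happened: drop the old samples so the
                # estimate jumps straight to the new rate.
                self.samples.clear()
        self.samples.append(throughput)
        return sum(self.samples) / len(self.samples)


estimator = ResettingMovingAverage(window=5, tolerance=0.5)
smoothed = [estimator.update(x) for x in [10, 11, 9, 10, 50, 52]]
```

In this run the estimate hovers around 10 for the first four samples, jumps directly to 50 when the out-of-range sample arrives, and then resumes averaging from there. Dividing the remaining bytes by the returned rate gives the time estimate.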
Data could be contrived to make this algorithm a poor predictor of time remaining, though. The bottom line is that you need a general idea of how you expect the data to behave, and you should pick an algorithm accordingly. My algorithm worked well for the data sets I was seeing, so we kept using it.
One other important tip is that developers usually ignore setup and teardown times in their progress bars and time estimate calculations. This results in the eternal 99% or 100% progress bar that just sits there for a long time (while caches are being flushed or other cleanup work is happening), or in wild early estimates when directory scanning or other setup work is accruing time but not accruing any percentage progress, which throws everything off. You can run several tests that include the setup and teardown times, come up with an estimate of how long those times are on average or based on the size of the job, and add that time to the progress bar. For example, the first 5% of work is setup work, the last 10% is teardown work, and the 85% in the middle is the download or whatever repeating process you are tracking. This can help a lot too.
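One way to apply the 5% / 85% / 10% split is to give each phase a fixed share of the bar and map progress within a phase onto that share. The sketch below assumes those example weights and uses hypothetical phase names; in practice the weights would come from your own timing tests.

```python
# Assumed weights from the example above; measure your own job to tune them.
PHASES = (("setup", 0.05), ("transfer", 0.85), ("teardown", 0.10))


def overall_progress(phase, fraction_done):
    """Map progress within one phase (0.0 to 1.0) onto the whole bar."""
    fraction_done = min(max(fraction_done, 0.0), 1.0)  # clamp bad input
    completed = 0.0
    for name, weight in PHASES:
        if name == phase:
            return completed + weight * fraction_done
        completed += weight  # earlier phases are fully done
    raise ValueError(f"unknown phase: {phase}")


halfway_through_download = overall_progress("transfer", 0.5)
```

Because the transfer phase only fills the bar up to 90%, the remaining 10% is left for cache flushing and cleanup instead of the bar sitting at 100% while that work finishes.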