How to collapse data with duplicate timestamps in a Julia `DataFrame`
Problem description
I have a `DataFrame` object that looks as follows:
| Row | timestamp | price | volume |
|-----|---------------------|-------|--------|
| 1 | 2011-08-14T14:14:40 | 10.40 | 0.779 |
| 2 | 2011-08-14T15:15:17 | 10.40 | 0.101 |
| 3 | 2011-08-14T15:15:17 | 10.40 | 0.316 |
| ... | ................... | ..... | ..... |
The timestamps are non-unique, so I cannot convert to a `TimeArray` before resolving this. How can I collapse duplicate timestamps, taking the mean of the prices and the sum of the volumes?
Thanks for any guidance!
Recommended answer
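You can use split-apply-combine: group the rows by the key column, then aggregate each group. Older DataFrames.jl releases exposed this as `by(df, :cat) do sub ... end`; current releases no longer provide `by`, and the same operation is written with `groupby` followed by `combine`:

```julia
using DataFrames, Statistics  # Statistics provides `mean`

df = DataFrame(
    cat = ["a", "b", "c", "a"],
    prices = [1, 2, 3, 4],
    vol = [10, 20, 30, 40],
)

# Group rows by :cat (groups sorted by key), then average :prices and sum :vol per group
df2 = combine(groupby(df, :cat; sort=true),
              :prices => mean => :prices,
              :vol => sum => :vol)
```

which gives:

| Row | cat | prices | vol |
|-----|-----|--------|-----|
| 1 | a | 2.5 | 50 |
| 2 | b | 2.0 | 20 |
| 3 | c | 3.0 | 30 |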
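Applied to the question's own columns, the grouping key is simply `:timestamp`. The sketch below is a minimal example assuming DataFrames.jl 1.x (where `groupby` and `combine` replace the older `by`) and using only the three rows visible in the question's table; the variable name `collapsed` is illustrative. `sort=true` is passed so the collapsed rows come out in chronological order, since the default group order of `groupby` is not guaranteed.

```julia
using DataFrames, Dates, Statistics

# The three rows shown in the question (the "..." rows are not reproduced)
df = DataFrame(
    timestamp = DateTime.(["2011-08-14T14:14:40", "2011-08-14T15:15:17", "2011-08-14T15:15:17"]),
    price = [10.40, 10.40, 10.40],
    volume = [0.779, 0.101, 0.316],
)

# One row per distinct timestamp: mean price, total volume
collapsed = combine(groupby(df, :timestamp; sort=true),
                    :price => mean => :price,
                    :volume => sum => :volume)
```

After this step each timestamp appears exactly once, which is the precondition the question mentions for building a `TimeArray`.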
If you need totals by day, month, etc., you may also be interested in this Stack Overflow answer.
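Per-period totals follow the same pattern; only the grouping key changes. A minimal sketch, assuming DataFrames.jl 1.x: a calendar-day key is derived by converting each `DateTime` to a `Date`, and the rows are then grouped on that key. The data (a third trade on a second day) and the column names `day`, `mean_price` and `total_volume` are hypothetical, chosen only for illustration.

```julia
using DataFrames, Dates, Statistics

# Hypothetical trades spanning two calendar days
df = DataFrame(
    timestamp = DateTime.(["2011-08-14T14:14:40", "2011-08-14T15:15:17", "2011-08-15T09:00:00"]),
    price = [10.40, 10.60, 10.50],
    volume = [0.779, 0.101, 0.316],
)

# Grouping key: the calendar day of each timestamp
df.day = Date.(df.timestamp)

# One row per day: mean price and total volume
daily = combine(groupby(df, :day; sort=true),
                :price => mean => :mean_price,
                :volume => sum => :total_volume)
```

For monthly buckets, `floor.(df.timestamp, Month)` would serve as the analogous key column.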