Problem description
I have a pandas DataFrame that contains the following information:
    index       year  month  day  symbol  transaction  nr_shares
    2011-01-10  2011      1   10  AAPL    Buy               1500
    2011-01-13  2011      1   13  GOOG    Sell              1000
and I would like to fill a second, zero-filled Pandas dataframe
    index       AAPL  GOOG
    2011-01-10     0     0
    2011-01-11     0     0
    2011-01-12     0     0
    2011-01-13     0     0
using the information from the first dataframe so I get
    index       AAPL  GOOG
    2011-01-10  1500     0
    2011-01-11     0     0
    2011-01-12     0     0
    2011-01-13     0 -1000
On each relevant date, the number of shares bought or sold has been entered in the column for that symbol: a positive number for a buy and a negative number for a sell order.
How can I accomplish this? Will I have to loop over the first dataframe index and check the symbol and transaction columns using nested "if" statements and then write to the second dataframe, or is there a more elegant dataframe method that I could use?
Solution

You could use pivot_table. Starting from a slightly more complicated df1 than the one in the question (several transactions on the same day):

    >>> df1
       index       year  month  day  symbol  transaction  nr_shares
    0  2011-01-10  2011      1   10  AAPL    Buy               1500
    1  2011-01-10  2011      1   10  AAPL    Sell               200
    2  2011-01-10  2011      1   10  GOOG    Sell               500
    3  2011-01-10  2011      1   10  GOOG    Buy                600
    4  2011-01-13  2011      1   13  GOOG    Sell              1000
    >>> df2
       index       AAPL  GOOG
    0  2011-01-10     0     0
    1  2011-01-11     0     0
    2  2011-01-12     0     0
    3  2011-01-13     0     0
First, give each share count a sign, negative for a sell and positive for a buy:
    >>> df1["nr_shares"] = df1.apply(lambda row: row["nr_shares"] * (-1 if row["transaction"] == "Sell" else 1), axis=1)
    >>> df1
       index       year  month  day  symbol  transaction  nr_shares
    0  2011-01-10  2011      1   10  AAPL    Buy               1500
    1  2011-01-10  2011      1   10  AAPL    Sell              -200
    2  2011-01-10  2011      1   10  GOOG    Sell              -500
    3  2011-01-10  2011      1   10  GOOG    Buy                600
    4  2011-01-13  2011      1   13  GOOG    Sell             -1000
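The row-wise apply works, but the same sign can be computed for the whole column in one vectorized step. A minimal sketch, assuming the transaction column contains exactly the strings "Buy" and "Sell"; the sample frame is rebuilt from the example above and omits the year/month/day columns for brevity:

```python
import numpy as np
import pandas as pd

# Sample transactions rebuilt from the example (year/month/day columns omitted).
df1 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-10", "2011-01-10", "2011-01-10", "2011-01-13"],
    "symbol": ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG"],
    "transaction": ["Buy", "Sell", "Sell", "Buy", "Sell"],
    "nr_shares": [1500, 200, 500, 600, 1000],
})

# -1 where the transaction is "Sell", +1 everywhere else, computed for all rows at once.
sign = np.where(df1["transaction"] == "Sell", -1, 1)
df1["nr_shares"] = df1["nr_shares"] * sign

print(df1["nr_shares"].tolist())  # [1500, -200, -500, 600, -1000]
```

Any value other than "Sell" (including a typo such as "sell") is treated as a buy here, so normalizing the column first may be worthwhile on real data.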
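Then you can pivot df1. By default pivot_table aggregates duplicate entries with the mean, but here we want the sum:

    >>> a = df1.pivot_table(values="nr_shares", index="index", columns="symbol", aggfunc="sum")
    >>> a
    symbol      AAPL  GOOG
    index
    2011-01-10  1300   100
    2011-01-13   NaN -1000

(Very old pandas versions called the index and columns keywords rows= and cols=; current versions accept only index= and columns=.)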
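A self-contained sketch of the pivot step on its own, assuming the share counts have already been signed; the sample values are copied from the signed df1 above:

```python
import pandas as pd

# Signed transactions, as produced by the previous step.
df1 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-10", "2011-01-10", "2011-01-10", "2011-01-13"],
    "symbol": ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG"],
    "nr_shares": [1500, -200, -500, 600, -1000],
})

# One row per date, one column per symbol; trades on the same date and symbol are summed.
# Date/symbol pairs with no trade (AAPL on 2011-01-13) come out as NaN.
a = df1.pivot_table(values="nr_shares", index="index", columns="symbol", aggfunc="sum")
print(a)
```

pivot_table also accepts fill_value=0, which would put 0 instead of NaN in the empty AAPL cell.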
Give df2 the same index by turning its index column into the row index, and call the result b:

    >>> b = df2.set_index("index")
    >>> b
                AAPL  GOOG
    index
    2011-01-10     0     0
    2011-01-11     0     0
    2011-01-12     0     0
    2011-01-13     0     0
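The shared index matters because arithmetic between DataFrames aligns on row and column labels rather than on position. A minimal sketch with two hypothetical frames whose labels only partly overlap:

```python
import pandas as pd

# Hypothetical frames: "2011-01-11" exists only in `right`.
left = pd.DataFrame({"AAPL": [1300]}, index=["2011-01-10"])
right = pd.DataFrame({"AAPL": [0, 0]}, index=["2011-01-10", "2011-01-11"])

# Addition matches rows by label; a label missing from either operand yields NaN.
total = left + right
print(total)
```

The result has one row per label in the union of both indexes, and the row present in only one operand is NaN. The same thing happens to the dates that never appear in a.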
Then add the two frames and replace the remaining NaN values with 0:
    >>> (a + b).fillna(0)
    symbol      AAPL  GOOG
    index
    2011-01-10  1300   100
    2011-01-11     0     0
    2011-01-12     0     0
    2011-01-13     0 -1000
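Putting the steps together, here is a self-contained sketch of one possible variant. The data is rebuilt from the example above. Signing with Series.where is an alternative to the apply call, not the answer's own code. The final fillna is replaced by pivot_table's fill_value argument plus DataFrame.reindex, so the result has exactly df2's rows and columns:

```python
import pandas as pd

# Transactions and the zero-filled target frame, rebuilt from the example.
df1 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-10", "2011-01-10", "2011-01-10", "2011-01-13"],
    "symbol": ["AAPL", "AAPL", "GOOG", "GOOG", "GOOG"],
    "transaction": ["Buy", "Sell", "Sell", "Buy", "Sell"],
    "nr_shares": [1500, 200, 500, 600, 1000],
})
df2 = pd.DataFrame({
    "index": ["2011-01-10", "2011-01-11", "2011-01-12", "2011-01-13"],
    "AAPL": 0,
    "GOOG": 0,
})

# Keep the count for buys, negate it for sells.
signed = df1["nr_shares"].where(df1["transaction"] != "Sell", -df1["nr_shares"])

# Sum per date and symbol; combinations with no trades become 0 instead of NaN.
a = df1.assign(nr_shares=signed).pivot_table(
    values="nr_shares", index="index", columns="symbol", aggfunc="sum", fill_value=0
)

# Align a to b's rows and columns (missing labels -> 0), then add.
b = df2.set_index("index")
result = b + a.reindex(index=b.index, columns=b.columns, fill_value=0)
print(result)
```

Because of the reindex, transactions dated outside df2's index are silently dropped. b.add(a, fill_value=0) would instead keep them as extra rows.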