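I'm new to this and am looking for help with the following problem. I have a dataset containing SKU search rankings, based on multiple keyword searches that happen on each day of the week.
I'd like to add a column to my data that holds the median (or mean, or std) of a given SKU's search rank for that day.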
import csv
import numpy as np
import pandas as pd
from pandas import DataFrame
df = pd.read_csv('searchdata.csv')
dfdataIwant = df[['Date', 'SKU', 'Search_Rank']]
print(dfdataIwant.groupby(['Date','SKU']).median())
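This gives me the median values I'm looking for. What I want to do, however, is insert that median value into a new column. The median inserted in this column should correspond to the correct date and SKU.

Best answer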
If you want to assign the result of a .groupby() aggregation back to the original DataFrame, there is .transform() (see docs):
df['median'] = df.groupby(['Date','SKU'])['Search_Rank'].transform('median')
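To see why .transform() fits here, a minimal sketch on a tiny hand-made frame may help. The values are made up for illustration and are not taken from searchdata.csv; only the column names follow the question. Unlike .median(), which returns one row per group, .transform('median') returns a Series with the same index as the original frame, so it can be assigned directly as a column.

```python
import pandas as pd

# Hypothetical toy data; the column names follow the question.
toy = pd.DataFrame({
    'Date': ['2016-01-01', '2016-01-01', '2016-01-01', '2016-01-02'],
    'SKU': ['A', 'A', 'B', 'A'],
    'Search_Rank': [10, 30, 7, 5],
})

# Aggregated view: one value per (Date, SKU) group.
per_group = toy.groupby(['Date', 'SKU'])['Search_Rank'].median()

# Row-aligned view: every original row receives its own group's median.
toy['median'] = toy.groupby(['Date', 'SKU'])['Search_Rank'].transform('median')

print(per_group)
print(toy)
```

The first print has three entries, one per group, while the new column has four, one per original row.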
The same works for the other statistics you mentioned. Example:
from datetime import date
from itertools import repeat

# 500 random rows: 20 consecutive days repeated 25 times, with random SKUs and ranks
df = pd.DataFrame(data={'rank': np.random.randint(low=1, high=100, size=500),
'SKU': np.random.choice(list('ABCDE'), replace=True, size=500),
'date': np.array([d for d in repeat(pd.date_range(start=date(2016,1,1), freq='D', periods=20), 25)]).flatten()})
df['median'] = df.groupby(['date','SKU'])['rank'].transform('median')
The result:
df.sort_values(['date', 'SKU', 'rank'])
SKU date rank median
460 A 2016-01-01 4 66.0
80 A 2016-01-01 29 66.0
400 A 2016-01-01 38 66.0
220 A 2016-01-01 64 66.0
480 A 2016-01-01 68 66.0
160 A 2016-01-01 69 66.0
200 A 2016-01-01 70 66.0
360 A 2016-01-01 86 66.0
280 B 2016-01-01 14 22.0
300 B 2016-01-01 30 22.0
380 C 2016-01-01 35 63.0
240 C 2016-01-01 46 63.0
440 C 2016-01-01 63 63.0
20 C 2016-01-01 69 63.0
340 C 2016-01-01 91 63.0
100 D 2016-01-01 32 59.0
40 D 2016-01-01 38 59.0
120 D 2016-01-01 59 59.0
320 D 2016-01-01 77 59.0
260 D 2016-01-01 94 59.0
0 E 2016-01-01 31 60.0
420 E 2016-01-01 35 60.0
140 E 2016-01-01 60 60.0
60 E 2016-01-01 64 60.0
180 E 2016-01-01 99 60.0
441 A 2016-01-02 35 52.0
281 A 2016-01-02 52 52.0
481 A 2016-01-02 71 52.0
341 B 2016-01-02 73 88.0
81 B 2016-01-02 81 88.0
.. .. ... ... ...
418 D 2016-01-19 98 71.5
38 E 2016-01-19 50 54.0
458 E 2016-01-19 51 54.0
478 E 2016-01-19 57 54.0
18 E 2016-01-19 71 54.0
439 A 2016-01-20 9 45.0
499 A 2016-01-20 45 45.0
99 A 2016-01-20 63 45.0
279 B 2016-01-20 12 55.5
339 B 2016-01-20 29 55.5
459 B 2016-01-20 44 55.5
379 B 2016-01-20 53 55.5
319 B 2016-01-20 58 55.5
39 B 2016-01-20 84 55.5
299 B 2016-01-20 94 55.5
119 B 2016-01-20 98 55.5
199 C 2016-01-20 15 43.0
159 C 2016-01-20 43 43.0
479 C 2016-01-20 90 43.0
259 D 2016-01-20 12 33.0
419 D 2016-01-20 13 33.0
59 D 2016-01-20 15 33.0
139 D 2016-01-20 31 33.0
79 D 2016-01-20 33 33.0
19 D 2016-01-20 42 33.0
239 D 2016-01-20 46 33.0
399 D 2016-01-20 54 33.0
219 D 2016-01-20 63 33.0
179 E 2016-01-20 27 53.5
359 E 2016-01-20 80 53.5
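
For the mean and std mentioned in the question, the same pattern should carry over. The following is a sketch rather than part of the original answer: it rebuilds comparable random data with a fixed seed (the seed and the use of np.random.default_rng are assumptions made here for reproducibility) and adds both columns from a single grouped object.

```python
import numpy as np
import pandas as pd

# Assumed setup: random data shaped like the example above, with a fixed seed.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'date': pd.date_range('2016-01-01', periods=20, freq='D').repeat(25),
    'SKU': rng.choice(list('ABCDE'), size=500),
    'rank': rng.integers(1, 100, size=500),
})

# Group once, then broadcast each statistic back onto the original rows.
grouped = df.groupby(['date', 'SKU'])['rank']
df['mean'] = grouped.transform('mean')
df['std'] = grouped.transform('std')

print(df.sort_values(['date', 'SKU', 'rank']).head(10))
```

Note that std here is the sample standard deviation (ddof=1), so a (date, SKU) group with a single row gets NaN in that column.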
Source: a similar question on Stack Overflow about finding the median of values based on multiple identifiers and adding it to the rows (python): https://stackoverflow.com/questions/37244169/