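I'm new to this and looking for help with the following problem. I have a dataset of SKU search rankings, based on multiple keyword searches that happen on each day of the week.

I'd like to add a column to my data that holds the median (or mean, std) search rank for a particular SKU on that day.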

import numpy as np
import pandas as pd

df = pd.read_csv('searchdata.csv')

# Keep only the columns needed, then take the median per (Date, SKU) group
dfdataIwant = df[['Date', 'SKU', 'Search_Rank']]
print(dfdataIwant.groupby(['Date', 'SKU']).median())


This gives me the median values I'm looking for. What I want to do, however, is insert that median into a new column, with each value matched to the correct date and SKU.

Best answer

There is .transform() (see docs) if you want to assign the result of a .groupby() aggregation back to the original DataFrame:

df['median'] = df.groupby(['Date','SKU'])['Search_Rank'].transform('median')
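
To see why this works, here is a minimal sketch on a tiny frame. The values are made up for illustration; only the column names follow the question. Calling .median() on the groupby gives one value per (Date, SKU) group, whereas .transform('median') gives one value per original row, aligned to the frame's index, so it can be assigned directly as a column.

```python
import pandas as pd

# Toy data invented for illustration; column names follow the question.
df = pd.DataFrame({
    'Date': ['2016-01-01'] * 5,
    'SKU': ['A', 'A', 'A', 'B', 'B'],
    'Search_Rank': [4, 29, 38, 14, 30],
})

grouped = df.groupby(['Date', 'SKU'])['Search_Rank']

# Aggregation: one value per group, indexed by (Date, SKU).
per_group = grouped.median()

# Transform: one value per original row, aligned to df's index.
df['median'] = grouped.transform('median')

print(per_group)
print(df)
```

Because the transformed Series has the same length and index as df, no merge or join is needed to line the values up with the right rows.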


The same works for the other statistics you mentioned. For example:

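from datetime import date
from itertools import repeat
# 500 rows: random ranks and SKUs over 20 consecutive days, each day repeated 25 times
df = pd.DataFrame(data={'rank': np.random.randint(low=1, high=100, size=500),
                        'SKU': np.random.choice(list('ABCDE'), replace=True, size=500),
                        'date': np.array([d for d in repeat(pd.date_range(start=date(2016,1,1), freq='D', periods=20), 25)]).flatten()})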

df['median'] = df.groupby(['date','SKU'])['rank'].transform('median')


The result:

df.sort_values(['date', 'SKU', 'rank'])

    SKU       date  rank  median
460   A 2016-01-01     4    66.0
80    A 2016-01-01    29    66.0
400   A 2016-01-01    38    66.0
220   A 2016-01-01    64    66.0
480   A 2016-01-01    68    66.0
160   A 2016-01-01    69    66.0
200   A 2016-01-01    70    66.0
360   A 2016-01-01    86    66.0
280   B 2016-01-01    14    22.0
300   B 2016-01-01    30    22.0
380   C 2016-01-01    35    63.0
240   C 2016-01-01    46    63.0
440   C 2016-01-01    63    63.0
20    C 2016-01-01    69    63.0
340   C 2016-01-01    91    63.0
100   D 2016-01-01    32    59.0
40    D 2016-01-01    38    59.0
120   D 2016-01-01    59    59.0
320   D 2016-01-01    77    59.0
260   D 2016-01-01    94    59.0
0     E 2016-01-01    31    60.0
420   E 2016-01-01    35    60.0
140   E 2016-01-01    60    60.0
60    E 2016-01-01    64    60.0
180   E 2016-01-01    99    60.0
441   A 2016-01-02    35    52.0
281   A 2016-01-02    52    52.0
481   A 2016-01-02    71    52.0
341   B 2016-01-02    73    88.0
81    B 2016-01-02    81    88.0
..   ..        ...   ...     ...
418   D 2016-01-19    98    71.5
38    E 2016-01-19    50    54.0
458   E 2016-01-19    51    54.0
478   E 2016-01-19    57    54.0
18    E 2016-01-19    71    54.0
439   A 2016-01-20     9    45.0
499   A 2016-01-20    45    45.0
99    A 2016-01-20    63    45.0
279   B 2016-01-20    12    55.5
339   B 2016-01-20    29    55.5
459   B 2016-01-20    44    55.5
379   B 2016-01-20    53    55.5
319   B 2016-01-20    58    55.5
39    B 2016-01-20    84    55.5
299   B 2016-01-20    94    55.5
119   B 2016-01-20    98    55.5
199   C 2016-01-20    15    43.0
159   C 2016-01-20    43    43.0
479   C 2016-01-20    90    43.0
259   D 2016-01-20    12    33.0
419   D 2016-01-20    13    33.0
59    D 2016-01-20    15    33.0
139   D 2016-01-20    31    33.0
79    D 2016-01-20    33    33.0
19    D 2016-01-20    42    33.0
239   D 2016-01-20    46    33.0
399   D 2016-01-20    54    33.0
219   D 2016-01-20    63    33.0
179   E 2016-01-20    27    53.5
359   E 2016-01-20    80    53.5
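
For the mean and std mentioned in the question, the same pattern can be repeated once per statistic. The following is only a sketch on a small hand-made frame (the values are invented; the column names mirror the example above), not output from the real data.

```python
import pandas as pd

# Small hand-made frame; values are illustrative only.
df = pd.DataFrame({
    'date': pd.to_datetime(['2016-01-01'] * 4 + ['2016-01-02'] * 2),
    'SKU': ['A', 'A', 'B', 'B', 'A', 'A'],
    'rank': [10, 20, 5, 15, 30, 50],
})

# Add one column per statistic, each aligned to the original rows.
for stat in ['median', 'mean', 'std']:
    df[stat] = df.groupby(['date', 'SKU'])['rank'].transform(stat)

print(df)
```

Note that pandas computes std as the sample standard deviation (ddof=1) by default, so a (date, SKU) group with a single row gets NaN in the std column.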

This question, "python - Find the median of values based on multiple identifiers and add it to the rows", comes from Stack Overflow: https://stackoverflow.com/questions/37244169/
