How to replicate Excel COUNTIFS in Python / pandas?

Problem description

I would like to count how many of the previous 5 values in df['A'] are < the current value in df['A'] and also >= the current value in df2['A']. I am trying to avoid looping over every row and column because I'd like to apply this to a larger data set.

Given this...

import pandas as pd

list1 = [[21,101],[22,110],[25,113],[24,112],[21,109],[28,108],[30,102],[26,106],[25,111],[24,110]]
df = pd.DataFrame(list1,index=pd.date_range('2000-1-1',periods=10, freq='D'), columns=list('AB'))
df2 = pd.DataFrame(df * (1-.05))

I would like to return this (solved in Excel with COUNTIFS)...

The line below achieves the first part (thanks Alexander), and Divakar and DSM have also weighed in previously (here and here).

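# raw=True passes each window as a NumPy array, so rollwin[-1] is the current (last) value
df3 = pd.DataFrame(df.rolling(center=False,window=6).apply(lambda rollwin: sum((rollwin[:-1] < rollwin[-1])), raw=True))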

But I am unable to add the comparison to df2. Please help.

FOLLOW UP on 10/27/16:

How would I write the lambda above as a standard function?
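One way to spell the lambda out as a named function is sketched below. This is an illustration rather than the answer given in the thread: the function name `count_prev_below` is hypothetical, and `raw=True` is assumed so that each window arrives as a NumPy array with the current value in the last position.

```python
import numpy as np
import pandas as pd


def count_prev_below(window):
    """Count how many earlier values in the window are below the last one.

    `window` is a 1-D NumPy array (because raw=True is used below);
    its last element is the current row, the rest are the previous rows.
    """
    previous, current = window[:-1], window[-1]
    return float(np.sum(previous < current))


list1 = [[21, 101], [22, 110], [25, 113], [24, 112], [21, 109],
         [28, 108], [30, 102], [26, 106], [25, 111], [24, 110]]
df = pd.DataFrame(list1,
                  index=pd.date_range('2000-1-1', periods=10, freq='D'),
                  columns=list('AB'))

# A window of 6 rows = the current row plus the previous 5
df3 = df.rolling(window=6).apply(count_prev_below, raw=True)
```

The named function is passed to `apply` exactly where the lambda was, and it gives a natural place to add further conditions later.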

10/28/16:

See below, taking col 'A' from both df and df2, I am trying to count how many of the previous 5 values from df['A'] fall between the current df2['A'] and df['A']. Said differently, how many from each orange box fall between the yellow low-high range?

UPDATE: different list1 data produces incorrect df3...

list1 = [[21,101],[22,110],[25,113],[24,112],[21,109],[26,108],[25,102],[26,106],[25,111],[22,110]]
df = pd.DataFrame(list1,index=pd.date_range('2000-1-1',periods=10, freq='D'), columns=list('AB'))
df2 = pd.DataFrame(df * (1-.05))

df3 = pd.DataFrame(
     df.rolling(center=False,window=6).apply(
          lambda rollwin: pd.Series(rollwin[:-1]).between(rollwin[-1]*0.95,rollwin[-1]).sum(),
          raw=True))  # raw=True: each window is a NumPy array, so rollwin[-1] is positional

df
Out[9]:
             A    B
2000-01-01  21  101
2000-01-02  22  110
2000-01-03  25  113
2000-01-04  24  112
2000-01-05  21  109
2000-01-06  26  108
2000-01-07  25  102
2000-01-08  26  106
2000-01-09  25  111
2000-01-10  22  110


df3
Out[8]:
              A    B
2000-01-01  NaN  NaN
2000-01-02  NaN  NaN
2000-01-03  NaN  NaN
2000-01-04  NaN  NaN
2000-01-05  NaN  NaN
2000-01-06  1.0  0.0
2000-01-07  2.0  0.0
2000-01-08  3.0  1.0
2000-01-09  2.0  3.0
2000-01-10  1.0  3.0
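One possible reason for the unexpected counts, offered as an assumption since the thread does not say which values were expected: `Series.between` includes both endpoints by default, so a previous value equal to the current one is counted, whereas the original criterion asked for values strictly below the current value. The sketch below uses plain NumPy comparisons to make the upper bound exclusive; the name `count_in_half_open_range` and its `lower_ratio` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def count_in_half_open_range(window, lower_ratio=0.95):
    """Count previous values v with current * lower_ratio <= v < current."""
    previous, current = window[:-1], window[-1]
    in_range = (previous >= current * lower_ratio) & (previous < current)
    return float(np.sum(in_range))


list1 = [[21, 101], [22, 110], [25, 113], [24, 112], [21, 109],
         [26, 108], [25, 102], [26, 106], [25, 111], [22, 110]]
df = pd.DataFrame(list1,
                  index=pd.date_range('2000-1-1', periods=10, freq='D'),
                  columns=list('AB'))

# Same rolling window as above, but ties with the current value are excluded
df3_strict = df.rolling(window=6).apply(count_in_half_open_range, raw=True)
```

With this data, column A differs from the df3 shown above on 2000-01-07 through 2000-01-09, where a previous value equals the current one; column B is unchanged.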

EXCEL EXAMPLES (11/14): see below, trying to count how many numbers in the blue box fall between the range highlighted in orange.

Solution
list1 = [[21,50,101],[22,52,110],[25,49,113],[24,49,112],[21,55,109],[28,54,108],[30,57,102],[26,56,106],[25,58,111],[24,60,110]]
df = pd.DataFrame(list1,index=pd.date_range('2000-1-1',periods=10, freq='D'), columns=list('ABC'))

print(df)

I believe this matches your new screen shot "Given Data".

             A   B    C
2000-01-01  21  50  101
2000-01-02  22  52  110
2000-01-03  25  49  113
2000-01-04  24  49  112
2000-01-05  21  55  109
2000-01-06  28  54  108
2000-01-07  30  57  102
2000-01-08  26  56  106
2000-01-09  25  58  111
2000-01-10  24  60  110

and the same function:

print(pd.DataFrame(
           df.rolling(center=False,window=6).
              apply(lambda rollwin: pd.Series(rollwin[:-1]).
                   between(rollwin[-1]*0.95,rollwin[-1]).sum(), raw=True)))

gives your desired output "Desired outcome":

             A   B   C
2000-01-01 nan nan nan
2000-01-02 nan nan nan
2000-01-03 nan nan nan
2000-01-04 nan nan nan
2000-01-05 nan nan nan
2000-01-06   0   1   0
2000-01-07   0   1   0
2000-01-08   1   2   1
2000-01-09   1   2   3
2000-01-10   0   2   3
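For larger data sets, `rolling(...).apply` still calls a Python function once per window. A fully vectorized alternative is sketched below using NumPy's `sliding_window_view` (available in NumPy 1.20 and later). It is not part of the original answer; it assumes the same 6-row window and the same inclusive 95%-to-100% range, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

list1 = [[21, 50, 101], [22, 52, 110], [25, 49, 113], [24, 49, 112],
         [21, 55, 109], [28, 54, 108], [30, 57, 102], [26, 56, 106],
         [25, 58, 111], [24, 60, 110]]
df = pd.DataFrame(list1,
                  index=pd.date_range('2000-1-1', periods=10, freq='D'),
                  columns=list('ABC'))

window = 6  # current row plus the previous 5 rows
values = df.to_numpy(dtype=float)

# Shape: (n_rows - window + 1, n_cols, window); the last axis is one window
windows = sliding_window_view(values, window, axis=0)
previous = windows[..., :-1]   # the 5 earlier values in each window
current = windows[..., -1:]    # the current value, kept 3-D for broadcasting

# Inclusive on both ends, matching Series.between's default behaviour
in_range = (previous >= current * 0.95) & (previous <= current)
counts = in_range.sum(axis=-1).astype(float)

# Rows without a full window stay NaN, as with rolling()
result = pd.DataFrame(np.nan, index=df.index, columns=df.columns)
result.iloc[window - 1:] = counts
```

The comparison runs over all windows and columns at once, at the cost of a temporary boolean array of size rows × columns × window.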
