Categorizing on multiple values with pandas qcut

Question

I am trying to take the values from two columns of a dataframe and perform a qcut categorization on them.

Categorizing a single value is quite simple. What I am trying to get is a categorization based on two variables, taken as a pair and compared against each other.

Input:

date,startTime,endTime,day,c_count,u_count
2004-01-05,22:00:00,23:00:00,Mon,18944,790
2004-01-05,23:00:00,00:00:00,Mon,17534,750
2004-01-06,00:00:00,01:00:00,Tue,17262,747
2004-01-06,01:00:00,02:00:00,Tue,19072,777
2004-01-06,02:00:00,03:00:00,Tue,18275,785
2004-01-06,03:00:00,04:00:00,Tue,13589,757
2004-01-06,04:00:00,05:00:00,Tue,16053,735
2004-01-06,05:00:00,06:00:00,Tue,11440,636
2004-01-06,06:00:00,07:00:00,Tue,5972,513
2004-01-06,07:00:00,08:00:00,Tue,3424,382
2004-01-06,08:00:00,09:00:00,Tue,2696,303
2004-01-06,09:00:00,10:00:00,Tue,2350,262
2004-01-06,10:00:00,11:00:00,Tue,2309,254

The code below is in pure Python, but I am trying to do the same in pandas.

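import csv

# Assumes inp is an open CSV file and that c_count and u_count hold
# reference values computed earlier (neither is shown in the question).
for row in csv.reader(inp):
    if int(row[1]) > (0.80 * c_count) and int(row[2]) > (0.80 * u_count):
        val = 'highly active'
    elif int(row[1]) >= (0.60 * c_count) and int(row[2]) <= (0.60 * u_count):
        val = 'active'
    elif int(row[1]) <= (0.40 * c_count) and int(row[2]) >= (0.40 * u_count):
        val = 'event based'
    elif int(row[1]) < (0.20 * c_count) and int(row[2]) < (0.20 * u_count):
        val = 'situational'
    else:
        val = 'viewers'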
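For reference, the chain above can be turned into a self-contained sketch. The question does not say what c_count and u_count hold, so this version assumes they are the column maxima, and it reads the counts from the last two CSV columns as in the input shown. The names classify, SAMPLE, c_ref and u_ref are made up for illustration.

```python
import csv
import io

# A few rows taken from the question's input (header included).
SAMPLE = """date,startTime,endTime,day,c_count,u_count
2004-01-05,22:00:00,23:00:00,Mon,18944,790
2004-01-06,03:00:00,04:00:00,Tue,13589,757
2004-01-06,06:00:00,07:00:00,Tue,5972,513
2004-01-06,10:00:00,11:00:00,Tue,2309,254
"""


def classify(c, u, c_ref, u_ref):
    """Mirror the if/elif chain from the question for a single row."""
    if c > 0.80 * c_ref and u > 0.80 * u_ref:
        return 'highly active'
    elif c >= 0.60 * c_ref and u <= 0.60 * u_ref:
        return 'active'
    elif c <= 0.40 * c_ref and u >= 0.40 * u_ref:
        return 'event based'
    elif c < 0.20 * c_ref and u < 0.20 * u_ref:
        return 'situational'
    return 'viewers'


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = [(int(r['c_count']), int(r['u_count'])) for r in rows]

# Assumption: the reference values are the column maxima.
c_ref = max(c for c, _ in counts)
u_ref = max(u for _, u in counts)

labels = [classify(c, u, c_ref, u_ref) for c, u in counts]
print(labels)
```

Because the branches are checked in order, a row that satisfies several conditions gets the label of the first one that matches.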

What am I trying to find?

  1. A categorization that uses both c_count and u_count
  2. As in the code above, c_count compared against u_count

Recommended answer

You can create a Series of quantile group labels for each column:

q = df[['c_count', 'u_count']].apply(lambda x: pd.qcut(x, np.linspace(0, 1, 6),
                                                       labels=np.arange(5)))
q
Out:
   c_count u_count
0        4       4
1        3       3
2        3       2
3        4       4
4        4       4
5        2       3
6        2       2
7        2       2
8        1       1
9        1       1
10       0       0
11       0       0
12       0       0

Label 0 marks the lowest 20% of values, 1 marks the 20%-40% range, and so on up to 4 for the top 20%.
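The labelling can be checked with a small sketch on the question's two count columns. It uses labels=False so that qcut returns plain integer bin indices, and retbins=True to show where the bin edges fall; both are variations on the call above rather than the answer's exact code, and the name c_edges is made up for illustration.

```python
import pandas as pd

# The two count columns from the question's input.
df = pd.DataFrame({
    'c_count': [18944, 17534, 17262, 19072, 18275, 13589, 16053,
                11440, 5972, 3424, 2696, 2350, 2309],
    'u_count': [790, 750, 747, 777, 785, 757, 735,
                636, 513, 382, 303, 262, 254],
})

# q=5 splits each column into five equal-sized quantile groups;
# labels=False returns the integer index (0..4) of each row's group.
q = df[['c_count', 'u_count']].apply(
    lambda col: pd.qcut(col, q=5, labels=False))

# retbins=True also returns the six edges that separate the groups.
_, c_edges = pd.qcut(df['c_count'], q=5, labels=False, retbins=True)

print(q)
print(c_edges)
```

Looking at the edges is one way to decide which rows sit close to a boundary before settling on the comparison operators.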

Now the if logic works a little differently here. For the else part, first populate the column with the default value:

df['val'] = 'viewers'

Anything we do afterwards overwrites the values in this column wherever the condition is satisfied, so a later assignment takes precedence over an earlier one. The conditions are therefore applied from bottom to top:

df.loc[(q['c_count'] < 1) & (q['u_count'] < 1), 'val'] = 'situational'
df.loc[(q['c_count'] < 2) & (q['u_count'] > 1), 'val'] = 'event_based'
df.loc[(q['c_count'] > 2) & (q['u_count'] < 2), 'val'] = 'active'
df.loc[(q['c_count'] > 3) & (q['u_count'] > 3), 'val'] = 'highly active'

The first condition checks whether both c_count and u_count are in the lowest 20%. If so, it changes the corresponding rows of the 'val' column to 'situational'. The remaining ones work in a similar manner. You might need to adjust the comparison operators a little (greater than vs. greater than or equal to).
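Putting the steps together, here is a minimal end-to-end sketch on the question's data. It also shows np.select as an alternative way to express the same priority order; that function is not part of the original answer, and the column name val_select is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'c_count': [18944, 17534, 17262, 19072, 18275, 13589, 16053,
                11440, 5972, 3424, 2696, 2350, 2309],
    'u_count': [790, 750, 747, 777, 785, 757, 735,
                636, 513, 382, 303, 262, 254],
})

# Integer quintile index (0..4) per column.
q = df[['c_count', 'u_count']].apply(
    lambda col: pd.qcut(col, q=5, labels=False))

# Overwrite approach: default first, then lowest to highest priority.
df['val'] = 'viewers'
df.loc[(q['c_count'] < 1) & (q['u_count'] < 1), 'val'] = 'situational'
df.loc[(q['c_count'] < 2) & (q['u_count'] > 1), 'val'] = 'event_based'
df.loc[(q['c_count'] > 2) & (q['u_count'] < 2), 'val'] = 'active'
df.loc[(q['c_count'] > 3) & (q['u_count'] > 3), 'val'] = 'highly active'

# Alternative: np.select takes the first matching condition, so the
# conditions are listed in if/elif order (highest priority first).
conditions = [
    (q['c_count'] > 3) & (q['u_count'] > 3),
    (q['c_count'] > 2) & (q['u_count'] < 2),
    (q['c_count'] < 2) & (q['u_count'] > 1),
    (q['c_count'] < 1) & (q['u_count'] < 1),
]
choices = ['highly active', 'active', 'event_based', 'situational']
df['val_select'] = np.select(conditions, choices, default='viewers')

print(df)
```

On this sample only the 'highly active', 'situational' and 'viewers' labels occur, because c_count and u_count rise and fall together and no row lands in opposite ends of the two rankings.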

