Multi-index pandas groupby, ignore a level?
Problem description

I'm running a groupby operation on a multi-indexed DataFrame similar to this one:

                                            0         1  ...
    categories features subfeatures
    cat1       feature1 subfeature1 -0.224487 -0.227524
                        subfeature2 -0.591399 -0.799228
               feature2 subfeature1  1.190110 -1.365895  ...
                        subfeature2  0.720956 -1.325562
    cat2       feature1 subfeature1  1.856932       NaN
                        subfeature2 -1.354258 -0.740473
               feature2 subfeature1  0.234075 -1.362235  ...
                        subfeature2  0.013875  1.309564
    cat3       feature1 subfeature1       NaN       NaN
                        subfeature2 -1.260408  1.559721  ...
               feature2 subfeature1  0.419246  0.084386
                        subfeature2  0.969270  1.493417
    ...        ...      ...

It can be generated using the following code:

    import numpy as np
    import pandas as pd

    np.random.seed(seed=90)
    results = np.random.randn(3, 2, 2, 2)
    results[2, 0, 0, :] = np.nan
    results[1, 0, 0, 1] = np.nan
    results = results.reshape((-1, 2))
    index = pd.MultiIndex.from_product(
        [["cat1", "cat2", "cat3"],
         ["feature1", "feature2"],
         ["subfeature1", "subfeature2"]],
        names=["categories", "features", "subfeatures"])
    df = pd.DataFrame(results, index=index)

I am trying to select only the groups in which the maximum difference between the two subfeature arrays is greater than a certain threshold, but I'm having trouble with groupby:

    df.groupby(level=['categories','features'])

This gives me the following groups:

    {('cat1', 'feature1'): [('cat1', 'feature1', 'subfeature1'), ('cat1', 'feature1', 'subfeature2')],
     ('cat1', 'feature2'): [('cat1', 'feature2', 'subfeature1'), ('cat1', 'feature2', 'subfeature2')],
     ('cat2', 'feature1'): [('cat2', 'feature1', 'subfeature1'), ('cat2', 'feature1', 'subfeature2')],
     ('cat2', 'feature2'): [('cat2', 'feature2', 'subfeature1'), ('cat2', 'feature2', 'subfeature2')],
     ('cat3', 'feature1'): [('cat3', 'feature1', 'subfeature1'), ('cat3', 'feature1', 'subfeature2')],
     ('cat3', 'feature2'): [('cat3', 'feature2', 'subfeature1'), ('cat3', 'feature2', 'subfeature2')]}

Is there any way to group so that the subfeatures level is ignored by the groupby function? I need subfeature1 and subfeature2 together; in separate groups they are worthless.

Ideally I would want the groupby to return something like this:

    {('cat1', 'feature1'): [('cat1', 'feature1')],
     ('cat1', 'feature2'): [('cat1', 'feature2')],
     ('cat2', 'feature1'): [('cat2', 'feature1')],
     ('cat2', 'feature2'): [('cat2', 'feature2')],
     ('cat3', 'feature1'): [('cat3', 'feature1')],
     ('cat3', 'feature2'): [('cat3', 'feature2')]}

How could I do this?
Solution

Move the subfeatures level out of the index with reset_index, then group on the two remaining levels:

    In [20]: df.reset_index(level='subfeatures').groupby(level=['categories','features']).groups
    Out[20]:
    {('cat1', 'feature1'): [('cat1', 'feature1'), ('cat1', 'feature1')],
     ('cat1', 'feature2'): [('cat1', 'feature2'), ('cat1', 'feature2')],
     ('cat2', 'feature1'): [('cat2', 'feature1'), ('cat2', 'feature1')],
     ('cat2', 'feature2'): [('cat2', 'feature2'), ('cat2', 'feature2')],
     ('cat3', 'feature1'): [('cat3', 'feature1'), ('cat3', 'feature1')],
     ('cat3', 'feature2'): [('cat3', 'feature2'), ('cat3', 'feature2')]}

Each group still holds two rows, one per subfeature, but the row labels no longer carry the subfeatures level; it becomes an ordinary column.
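For the underlying goal in the question (keeping only the groups whose two subfeature rows differ by more than a threshold), something along the following lines may be enough. This is only a sketch under stated assumptions: "maximum difference" is read here as the largest absolute element-wise difference between the subfeature1 and subfeature2 rows, the threshold value of 1.0 is arbitrary, and the names `THRESHOLD`, `max_subfeature_gap` and `select_groups` are hypothetical helpers that do not come from the original post. It groups on the two outer levels directly, since each such group already contains both subfeature rows.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame exactly as in the question.
np.random.seed(seed=90)
results = np.random.randn(3, 2, 2, 2)
results[2, 0, 0, :] = np.nan
results[1, 0, 0, 1] = np.nan
results = results.reshape((-1, 2))
index = pd.MultiIndex.from_product(
    [["cat1", "cat2", "cat3"],
     ["feature1", "feature2"],
     ["subfeature1", "subfeature2"]],
    names=["categories", "features", "subfeatures"],
)
df = pd.DataFrame(results, index=index)

THRESHOLD = 1.0  # assumed value, chosen only for illustration


def max_subfeature_gap(group):
    """Largest absolute element-wise gap between the two subfeature rows.

    NaN entries are skipped; the result is NaN if every entry is NaN.
    """
    first = group.xs("subfeature1", level="subfeatures")
    second = group.xs("subfeature2", level="subfeatures")
    return (first - second).abs().max().max()


def select_groups(frame, threshold):
    """Keep whole (categories, features) groups whose gap exceeds threshold."""
    grouped = frame.groupby(level=["categories", "features"])
    # A NaN gap compares as False, so all-NaN groups are dropped.
    return grouped.filter(lambda g: bool(max_subfeature_gap(g) > threshold))


selected = select_groups(df, THRESHOLD)
print(selected)
```

Because `filter` returns the surviving groups intact, both subfeature rows of every kept group stay together in the result, and the subfeatures level remains available in the index if it is needed later.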
09-21 09:56