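I have a dataset like the one shown below. I am trying to get every name in the feature column whose value in the importance column is not equal to 0.000000, and put those names directly into a list for immediate use. I have tried several approaches, and the two that show the most promise are below: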

Method 1

new_features = []

for i in importance_ranking['importance']:
    if i > 0.000000:
        new_features.append(i)

new_features


Method 1 just grabs all the values from my importance column, but I want the values from the feature column, so I tried Method 2.

Method 2

features_to_use = []
for x,y in importance_ranking:
    if y > 0.000000:
        features_to_use.append(x)

features_to_use


Method 2 throws the following error:

Method 2 error

ValueError                                Traceback (most recent call last)
<ipython-input-1181-d1ec4f141ff9> in <module>()
      1 features_to_use = []
----> 2 for x,y in importance_ranking:
      3     if y > 0.000000:
      4         features_to_use.append(x)
      5

ValueError: too many values to unpack (expected 2)


Any help is greatly appreciated.

Method 3 and error

features_to_use = []
for s,x,y in importance_ranking:
    if y > 0.000000:
        features_to_use.append(x)

features_to_use
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1182-8ed92369130e> in <module>()
      1 features_to_use = []
----> 2 for s,x,y in importance_ranking:
      3     if y > 0.000000:
      4         features_to_use.append(x)
      5

ValueError: too many values to unpack (expected 3)


Dataset

   **feature    importance**
1   src_bytes   0.541433
18  count   0.160338
30  dst_host_diff_srv_rate  0.074743
53  service_bgp 0.066960
31  dst_host_same_src_port_rate 0.045040
28  dst_host_srv_count  0.027176
9   num_compromised 0.016684
25  diff_srv_rate   0.008991
58  service_pm_dump 0.008533
62  service_auth    0.008270
29  dst_host_same_srv_rate  0.006760
2   dst_bytes   0.005153
33  dst_host_serror_rate    0.004642
6   hot 0.003985
32  dst_host_srv_diff_host_rate 0.003330
35  dst_host_rerror_rate    0.002923
34  dst_host_srv_serror_rate    0.002222
87  service_klogin  0.002135
116 flag_SH 0.001553
0   duration    0.001263
7   num_failed_logins   0.001125
22  rerror_rate 0.001011
27  dst_host_count  0.000917
4   wrong_fragment  0.000736
52  service_ntp_u   0.000489
37  flag_RSTOS0 0.000468
3   land    0.000449
111 service_tftp_u  0.000355
19  srv_count   0.000289
8   logged_in   0.000284
... ... ...
16  is_host_login   0.000000
40  service_Z39_50  0.000000
41  service_http_443    0.000000
43  service_other   0.000000
44  protocol_type_tcp   0.000000
45  service_link    0.000000
46  service_X11 0.000000
47  service_exec    0.000000
48  service_red_i   0.000000
49  service_http_2784   0.000000


Line used to create the DataFrame

importance_ranking = pd.DataFrame({'feature':all_cols, 'importance':dt.feature_importances_})


new_test

#features_to_use = []
a,b = importance_ranking[0]
#for s,x,y in importance_ranking:
 #   if y > 0.000000:
     #   features_to_use.append(x)
#
#features_to_use


KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2524             try:
-> 2525                 return self._engine.get_loc(key)
   2526             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-1244-5d9e2e614219> in <module>()
      1 #features_to_use = []
----> 2 a,b = importance_ranking[0]
      3 #for s,x,y in importance_ranking:
      4  #   if y > 0.000000:
      5      #   features_to_use.append(x)

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2137             return self._getitem_multilevel(key)
   2138         else:
-> 2139             return self._getitem_column(key)
   2140
   2141     def _getitem_column(self, key):

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2144         # get column
   2145         if self.columns.is_unique:
-> 2146             return self._get_item_cache(key)
   2147
   2148         # duplicate columns & possible reduce dimensionality

~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1840         res = cache.get(item)
   1841         if res is None:
-> 1842             values = self._data.get(item)
   1843             res = self._box_item_values(item, values)
   1844             cache[item] = res

~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3841
   3842             if not isna(item):
-> 3843                 loc = self.items.get_loc(item)
   3844             else:
   3845                 indexer = np.arange(len(self.items))[isna(self.items)]

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2525                 return self._engine.get_loc(key)
   2526             except KeyError:
-> 2527                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2528
   2529         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

Best answer

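DataFrames provide a convenient way to select the data you need:

features_to_use = importance_ranking[importance_ranking['importance'] > 0.0]['feature'].values.tolist()


This may be hard to read at first glance, but all it does is filter importance_ranking down to the rows whose importance is greater than 0.0 and then select the feature column of those rows. The trailing .values.tolist() just unpacks the result into a plain Python list.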
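The same filter can also be written with a named boolean mask and .loc, which some readers find easier to follow. This is a minimal sketch: the small importance_ranking frame below is a hypothetical sample built from a few rows of the dataset in the question, not the asker's full data.

```python
import pandas as pd

# Hypothetical sample: a few rows copied from the dataset in the question.
importance_ranking = pd.DataFrame(
    {
        "feature": ["src_bytes", "count", "logged_in", "is_host_login", "service_X11"],
        "importance": [0.541433, 0.160338, 0.000284, 0.0, 0.0],
    },
    index=[1, 18, 8, 16, 46],
)

# Boolean Series: True for rows whose importance is strictly positive.
mask = importance_ranking["importance"] > 0.0

# .loc picks the masked rows and the 'feature' column in a single step.
features_to_use = importance_ranking.loc[mask, "feature"].tolist()
print(features_to_use)  # ['src_bytes', 'count', 'logged_in']
```

Because feature importances from a decision tree are never negative, testing for > 0.0 is equivalent here to testing for "not equal to 0.000000".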

If you are not comfortable with this solution, you can do it step by step:

df = importance_ranking[importance_ranking['importance'] > 0.0]  # filtered DataFrame
feature_names = df['feature']  # Series of feature names
features_to_use = feature_names.values.tolist()
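As for why the loops in the question fail: iterating over a DataFrame yields its column labels, not its rows. So `for x, y in importance_ranking` tries to unpack the 7-character string 'feature' into two names, which raises "too many values to unpack". Likewise, `importance_ranking[0]` looks for a column labelled 0, hence the KeyError. The sketch below shows one way to keep an explicit loop if that is preferred. It again uses a hypothetical sample frame rather than the asker's full data.

```python
import pandas as pd

# Hypothetical sample: a few rows copied from the dataset in the question.
importance_ranking = pd.DataFrame(
    {
        "feature": ["src_bytes", "count", "logged_in", "is_host_login", "service_X11"],
        "importance": [0.541433, 0.160338, 0.000284, 0.0, 0.0],
    },
    index=[1, 18, 8, 16, 46],
)

# Iterating over the DataFrame itself gives the column labels.
column_labels = list(importance_ranking)
print(column_labels)  # ['feature', 'importance']

# To loop over rows, pair the two columns explicitly with zip.
features_to_use = []
for name, score in zip(importance_ranking["feature"], importance_ranking["importance"]):
    if score > 0.0:
        features_to_use.append(name)
print(features_to_use)  # ['src_bytes', 'count', 'logged_in']

# Rows are selected by position with .iloc, not with square brackets alone.
a, b = importance_ranking.iloc[0]
print(a)  # src_bytes
```

The vectorized filter is usually shorter and faster, so the loop is mainly useful for understanding what went wrong.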

Regarding "python - Trying to get all column values that are not equal to 0.000000 in Python 3", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50075351/

10-12 21:52