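I have a dataset like the one shown below, and I am trying to get every name in the feature column whose value in the importance column is not equal to 0.000000, and put those names straight into a list for immediate use. I have tried several approaches; the two main ones that showed promise are below:
Method 1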
new_features = []
for i in importance_ranking['importance']:
    if i > 0.000000:
        new_features.append(i)
new_features
Method 1 just grabs all the values of my importance column, but I want the values of the feature column, so I tried Method 2.
Method 2
features_to_use = []
for x,y in importance_ranking:
    if y > 0.000000:
        features_to_use.append(x)
features_to_use
Method 2 throws the following error:
Method 2 error
ValueError Traceback (most recent call last)
<ipython-input-1181-d1ec4f141ff9> in <module>()
1 features_to_use = []
----> 2 for x,y in importance_ranking:
3 if y > 0.000000:
4 features_to_use.append(x)
5
ValueError: too many values to unpack (expected 2)
Any help is greatly appreciated.
Method 3 and error
features_to_use = []
for s,x,y in importance_ranking:
    if y > 0.000000:
        features_to_use.append(x)
features_to_use
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1182-8ed92369130e> in <module>()
1 features_to_use = []
----> 2 for s,x,y in importance_ranking:
3 if y > 0.000000:
4 features_to_use.append(x)
5
ValueError: too many values to unpack (expected 3)
Dataset
**feature importance**
1 src_bytes 0.541433
18 count 0.160338
30 dst_host_diff_srv_rate 0.074743
53 service_bgp 0.066960
31 dst_host_same_src_port_rate 0.045040
28 dst_host_srv_count 0.027176
9 num_compromised 0.016684
25 diff_srv_rate 0.008991
58 service_pm_dump 0.008533
62 service_auth 0.008270
29 dst_host_same_srv_rate 0.006760
2 dst_bytes 0.005153
33 dst_host_serror_rate 0.004642
6 hot 0.003985
32 dst_host_srv_diff_host_rate 0.003330
35 dst_host_rerror_rate 0.002923
34 dst_host_srv_serror_rate 0.002222
87 service_klogin 0.002135
116 flag_SH 0.001553
0 duration 0.001263
7 num_failed_logins 0.001125
22 rerror_rate 0.001011
27 dst_host_count 0.000917
4 wrong_fragment 0.000736
52 service_ntp_u 0.000489
37 flag_RSTOS0 0.000468
3 land 0.000449
111 service_tftp_u 0.000355
19 srv_count 0.000289
8 logged_in 0.000284
... ... ...
16 is_host_login 0.000000
40 service_Z39_50 0.000000
41 service_http_443 0.000000
43 service_other 0.000000
44 protocol_type_tcp 0.000000
45 service_link 0.000000
46 service_X11 0.000000
47 service_exec 0.000000
48 service_red_i 0.000000
49 service_http_2784 0.000000
Line used to create the DataFrame
importance_ranking = pd.DataFrame({'feature':all_cols, 'importance':dt.feature_importances_})
new_test
#features_to_use = []
a,b = importance_ranking[0]
#for s,x,y in importance_ranking:
# if y > 0.000000:
# features_to_use.append(x)
#
#features_to_use
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2524 try:
-> 2525 return self._engine.get_loc(key)
2526 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-1244-5d9e2e614219> in <module>()
1 #features_to_use = []
----> 2 a,b = importance_ranking[0]
3 #for s,x,y in importance_ranking:
4 # if y > 0.000000:
5 # features_to_use.append(x)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2137 return self._getitem_multilevel(key)
2138 else:
-> 2139 return self._getitem_column(key)
2140
2141 def _getitem_column(self, key):
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2144 # get column
2145 if self.columns.is_unique:
-> 2146 return self._get_item_cache(key)
2147
2148 # duplicate columns & possible reduce dimensionality
~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
1840 res = cache.get(item)
1841 if res is None:
-> 1842 values = self._data.get(item)
1843 res = self._box_item_values(item, values)
1844 cache[item] = res
~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
3841
3842 if not isna(item):
-> 3843 loc = self.items.get_loc(item)
3844 else:
3845 indexer = np.arange(len(self.items))[isna(self.items)]
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2525 return self._engine.get_loc(key)
2526 except KeyError:
-> 2527 return self._engine.get_loc(self._maybe_cast_indexer(key))
2528
2529 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
Best answer
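DataFrames offer a convenient way to select exactly the data you need:
features_to_use = importance_ranking[importance_ranking['importance'] > 0.0]['feature'].values.tolist()
This may be hard to read at first glance, but what it actually does is filter the DataFrame down to the rows whose importance is greater than 0.0 and then select the feature column of those rows. The rest of the line, .values.tolist(), only unpacks the data into a plain Python list. If you are not comfortable with this one-liner, you can do it step by step:
df = importance_ranking[importance_ranking['importance'] > 0.0] # Filtered DataFrame
feature_names = df['feature'] # Series object
features_to_use = feature_names.values.tolist()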
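For anyone wondering why the loop attempts in the question fail: iterating directly over a DataFrame yields its column labels, not its rows, so `for x,y in importance_ranking` tries to unpack the string 'feature' into two names and raises the ValueError. The sketch below is a minimal illustration under stated assumptions: the four-row frame is a hypothetical stand-in built from a few rows of the dataset above, and the variable names `column_labels`, `features_loop`, and `features_vectorized` are made up for the example. It shows a loop that does work (pairing the two columns with `zip`) next to the same selection written as a single `.loc` call.

```python
import pandas as pd

# Hypothetical stand-in for importance_ranking, using a few rows from the question.
importance_ranking = pd.DataFrame({
    "feature": ["src_bytes", "count", "is_host_login", "service_link"],
    "importance": [0.541433, 0.160338, 0.0, 0.0],
})

# Iterating over a DataFrame yields its column labels, not its rows.
column_labels = list(importance_ranking)

# Loop version: walk the two columns side by side with zip.
features_loop = []
for name, score in zip(importance_ranking["feature"], importance_ranking["importance"]):
    if score > 0.0:
        features_loop.append(name)

# Vectorized version: boolean row mask and column label in one .loc call.
mask = importance_ranking["importance"] > 0.0
features_vectorized = importance_ranking.loc[mask, "feature"].tolist()

print(column_labels)
print(features_loop)
print(features_vectorized)
```

Both versions should give the same list of feature names. The `.loc` form is usually preferable on larger frames because it avoids a Python-level loop.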
Regarding "python - Trying to get all column values not equal to 0.000000 in Python 3", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50075351/