Problem description
I'm using Extra Trees Classifier to find the feature importances in my dataset, which consists of 13 columns and about 10 million rows. I have already run Elliptic Envelope and Isolation Forest on it and everything was fine; they even used less than 10 GB of memory. But when I ran this code in a Jupyter notebook it gave me a memory error, even with low_memory=True. I also tried Google Colab, which has about 25 GB of RAM, and it still crashed. I'm very confused right now.
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.ensemble import ExtraTreesClassifier
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Loading First Dataframe
link = '...'
fluff, file_id = link.split('=')  # renamed from 'id' to avoid shadowing the builtin
print(file_id)  # Verify that you have everything after '='
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('Final After Simple Filtering.csv')
df = pd.read_csv('Final After Simple Filtering.csv',index_col=None,low_memory=True)
#df = df.astype(float)
ExtraT = ExtraTreesClassifier(n_estimators=100, bootstrap=False, n_jobs=1)
y=df['Power_kW']
del df['Power_kW']
X=df
ExtraT.fit(X,y)
feature_importance = ExtraT.feature_importances_
# std of each feature's importance across trees (axis=0, one value per feature)
feature_importance_normalized = np.std([tree.feature_importances_ for tree in ExtraT.estimators_], axis=0)
plt.bar(X.columns, feature_importance)
plt.xlabel('Label')
plt.ylabel('Feature Importance')
plt.title('Parameters Importance')
plt.show()
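One thing worth knowing: low_memory=True in pd.read_csv only changes how the file is parsed (in chunks); it does not shrink the resulting DataFrame. A minimal sketch of an alternative, assuming all 13 columns are numeric and float32 precision is acceptable, is to downcast the dtypes right after loading, before the fit, which roughly halves the frame's footprint:

import numpy as np
import pandas as pd

df = pd.read_csv('Final After Simple Filtering.csv', index_col=None)

# Downcast 64-bit numeric columns to 32-bit; for a purely numeric frame
# this roughly halves memory (assumes float32 precision is acceptable).
for col in df.select_dtypes(include=['float64']).columns:
    df[col] = df[col].astype(np.float32)
for col in df.select_dtypes(include=['int64']).columns:
    df[col] = df[col].astype(np.int32)

print(df.memory_usage(deep=True).sum() / 1e9, 'GB')  # verify the new footprint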
Thanks
Recommended answer
I ran into the same error before and solved it.
Change the runtime type
A GPU runtime is faster than CPU, so it will help. But how to do that? Follow these steps: open the Runtime menu, click Change runtime type, set Hardware accelerator to GPU, and save.
Be sure that you get the 25 GB of RAM, not 12 GB. Don't forget that Colab is free and limited. If you still have a problem, tell me and I will help you ASAP.
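A quick way to confirm how much RAM the runtime actually has is a minimal check with the psutil package, which comes preinstalled on Colab:

import psutil

# Total RAM visible to this runtime, in GB: roughly 12 on the standard
# Colab instance, roughly 25 on the high-RAM one.
print(psutil.virtual_memory().total / 1e9)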