Problem description
I have a pandas DataFrame in Python that looks something like this:
contest_login_count contest_participation_count ipn_ratio
0 1 1 0.000000
1 3 3 0.083333
2 3 3 0.000000
3 3 3 0.066667
4 5 13 0.102804
5 2 3 0.407407
6 1 3 0.000000
7 1 2 0.000000
8 53 91 0.264151
9 1 2 0.000000
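Now I want to apply a function to each row of this DataFrame. The function is written like this:
def findCluster(clusterModel, data):
    return clusterModel.predict(data)
I apply this function to each row in this manner:
df_fil.apply(lambda x: findCluster(cluster_all, x.reshape(1, -1)), axis=1)
When I run this code, I get a warning.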
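The warning is printed once for each row. Since my DataFrame has around 450K rows, my computer hangs while printing all of these warning messages in the IPython notebook.
To test my function, I created a dummy DataFrame and applied the same function to it, and it works well. Here is the code for that: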
t = pd.DataFrame([[10.35,100.93,0.15],[10.35,100.93 ,0.15]])
t.apply(lambda x:findCluster(cluster_all,x.reshape(1,-1)),axis = 1)
The output of this is:
0 1 2
0 4 4 4
1 4 4 4
Can anyone suggest what I am doing wrong, or what I can change to make this warning go away?
I think the problem is the dtype: some of the columns are not float.
You need to cast them with astype:
df['colname'] = df['colname'].astype(float)
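As a rough sketch of that suggestion, the snippet below checks the column dtypes and then casts every column to float in one call. The frame here is hypothetical: it reuses the question's column names and a few of its values, and it is deliberately built with dtype=object to mimic the situation the answer suspects. The name df_float is also just an illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_fil: the numbers are stored as strings and the
# frame is forced to dtype=object, which is the situation the answer suspects.
df = pd.DataFrame(
    {
        "contest_login_count": ["1", "3", "53"],
        "contest_participation_count": ["1", "3", "91"],
        "ipn_ratio": ["0.000000", "0.083333", "0.264151"],
    },
    dtype=object,
)

# Inspect the dtypes first; object columns are the ones that need casting.
print(df.dtypes)

# Cast a single column, as in the answer ...
df["ipn_ratio"] = df["ipn_ratio"].astype(float)

# ... or cast the whole frame at once (returns a new DataFrame).
df_float = df.astype(float)
print(df_float.dtypes)
```

If cluster_all accepts a 2-D array the way scikit-learn style estimators do (an assumption, since the question does not say what kind of model it is), a single call such as cluster_all.predict(df_float.values) would probably replace the row-by-row apply, so any remaining warning would be emitted once instead of once per row.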