This article explains how to get a list view of all columns and their NaN/null percentages in PySpark. It may be a useful reference for anyone dealing with the same problem.
Problem Description
I am running a simple EDA on my dataset that has 59K rows and 21 columns. What I would like to see is a list of all columns and the % of the nulls/nans. I ran the following code in Jupyter in my virtual machine:
# Checking nulls by column
from pyspark.sql.functions import col, count, isnan, lit, when

null_df = datingDF.select(
    [(count(when(isnan(c) | col(c).isNull(), c)) / count(lit(1))).alias(c)
     for c in datingDF.columns]
)
null_df.show()
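The expression works because `when(cond, c)` produces a non-null value only for rows where the condition holds, and `count` skips nulls, so `count(when(...)) / count(lit(1))` is the fraction of null/NaN cells in the column. A minimal pure-Python sketch of that counting logic, using a small hypothetical column:

```python
import math

# Hypothetical column values: 2 bad cells (None and NaN) out of 5.
column = [1.0, None, float("nan"), 4.0, 5.0]

def is_null_or_nan(x):
    # Mirrors the Spark condition isnan(c) | col(c).isNull()
    return x is None or (isinstance(x, float) and math.isnan(x))

# Mirrors count(when(...)) / count(lit(1)): bad cells over total rows.
null_fraction = sum(1 for x in column if is_null_or_nan(x)) / len(column)
print(null_fraction)  # → 0.4
```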
The output is really cluttered and not a clean list (see attached)
Recommended Answer
Replace null_df.show() with:
for i, j in null_df.first().asDict().items():
    print(i, j)
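This works because `null_df` has exactly one row, so `DataFrame.first()` returns a single `Row`, and `Row.asDict()` converts it to a plain Python dict mapping column names to their null fractions; iterating the dict prints one column per line instead of `show()`'s wide, wrapped table. A sketch of the same printing pattern, using an ordinary dict with hypothetical fractions to stand in for the `Row` (no live Spark session assumed):

```python
# Stand-in for null_df.first().asDict(): a mapping of
# column name -> fraction of null/NaN values (hypothetical numbers).
null_fractions = {"age": 0.02, "income": 0.15, "gender": 0.0}

# Same iteration pattern as the answer: one "column fraction" line each.
lines = []
for col_name, frac in null_fractions.items():
    lines.append(f"{col_name} {frac}")
print("\n".join(lines))
```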