This article describes how to find the percentage of missing values in each column of a given dataset.
Problem description
import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
percent= 100*(len(df.loc[:,df.isnull().sum(axis=0)>=1 ].index) / len(df.index))
print(round(percent,2))
The input is https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0 and the output should be:
Ord_id 0.00
Prod_id 0.00
Ship_id 0.00
Cust_id 0.00
Sales 0.24
Discount 0.65
Order_Quantity 0.65
Profit 0.65
Shipping_Cost 0.65
Product_Base_Margin 1.30
dtype: float64
Recommended answer
How about this? I think I actually found something similar on here once before, but I'm not seeing it now...
percent_missing = df.isnull().sum() * 100 / len(df)
missing_value_df = pd.DataFrame({'column_name': df.columns,
                                 'percent_missing': percent_missing})
And if you want the missing percentages sorted, follow the above with:
missing_value_df.sort_values('percent_missing', inplace=True)
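To see the two steps above working together without downloading the CSV, here is a minimal sketch on a small made-up DataFrame. The name toy_df and its values are assumptions for illustration only, not the dataset from the question, and the sort here returns a new frame instead of using inplace=True.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real CSV (not from the question)
toy_df = pd.DataFrame({
    "Sales": [10.0, np.nan, 30.0, 40.0],    # 1 of 4 values missing
    "Discount": [0.1, 0.2, 0.3, 0.4],       # nothing missing
    "Profit": [np.nan, np.nan, 5.0, 6.0],   # 2 of 4 values missing
})

# Count the missing values per column, then scale by the number of rows
percent_missing = toy_df.isnull().sum() * 100 / len(toy_df)

missing_value_df = pd.DataFrame({
    "column_name": toy_df.columns,
    "percent_missing": percent_missing,
})

# sort_values returns a new, sorted DataFrame (ascending by default)
sorted_df = missing_value_df.sort_values("percent_missing")
print(sorted_df)
```

Passing ascending=False to sort_values would put the columns with the most missing data first, which is often the more useful order when deciding what to clean.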
As mentioned in the comments, you may also be able to get by with just the first line in my code above, i.e.:
percent_missing = df.isnull().sum() * 100 / len(df)
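The expected output in the question is a Series rounded to two decimal places. One way to get that shape, sketched here as a variation and not as part of the original answer, is to take the column-wise mean of the boolean mask (the mean of True/False values is the fraction of True) and round the result. The toy_df below is hypothetical sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the dataset from the question)
toy_df = pd.DataFrame({
    "Ord_id": ["a", "b", "c"],      # nothing missing
    "Sales": [1.0, np.nan, 3.0],    # 1 of 3 values missing
})

# isnull() gives a boolean frame; mean() per column is the fraction missing
percent_rounded = (toy_df.isnull().mean() * 100).round(2)
print(percent_rounded)
```

This is equivalent to dividing the per-column sum by len(df), since the mean of a boolean column is its count of True values divided by the number of rows.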