I am iterating over every column of a data frame and trying to plot a histogram of the log of each one:
cols = in_df.columns
for col in cols:
    in_df[col] = in_df[col].dropna()
    print(in_df[col].values)
    in_df[col].map(np.log).hist(bins=1000)
    plt.xlabel(x_label + col)
    plt.ylabel('Number of customers in train')
    plt.savefig(save_dir + col + '.png')
    plt.close()
But I get the following error:
[2 2 2 ..., 2 2 2]
    in_df[col].map(np.log).hist(bins=1000)
  File "anaconda/envs/kaggle3/lib/python3.5/site-packages/pandas/tools/plotting.py", line 2988, in hist_series
    ax.hist(values, bins=bins, **kwds)
  File "anaconda/envs/kaggle3/lib/python3.5/site-packages/matplotlib/__init__.py", line 1819, in inner
    return func(ax, *args, **kwargs)
  File "anaconda/envs/kaggle3/lib/python3.5/site-packages/matplotlib/axes/_axes.py", line 5985, in hist
    m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
  File "anaconda/envs/kaggle3/lib/python3.5/site-packages/numpy/lib/function_base.py", line 505, in histogram
    'range parameter must be finite.')
ValueError: range parameter must be finite.
Note that the following works:
in_df.col_name.map(np.log).hist(bins=1000)
However, I cannot use that form when iterating over all the columns. Any idea why I am getting the error?
Best answer
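If I am right that the column contains zeros, that would explain the error: np.log(0) is -inf, so the range of the values passed to the histogram is no longer finite. The simplest fix is to drop the zeros. There are many ways to do that; here is one: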
import numpy as np
import matplotlib.pyplot as plt

cols = in_df.columns
for col in cols:
    in_df[col] = in_df[col].dropna()
    print(in_df[col].values)
    # I edited line below
    in_df[col].replace(0, np.nan).dropna().map(np.log).hist(bins=1000)
    # added   |<------------------------>|
    plt.xlabel(x_label + col)
    plt.ylabel('Number of customers in train')
    plt.savefig(save_dir + col + '.png')
    plt.close()
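
To check whether zeros really are the cause, here is a small self-contained sketch on made-up data. The series `s` and the helper `positive_log` are hypothetical names used only for illustration, not part of the original code. The helper filters with `values > 0` instead of `replace(0, np.nan)`, a variation that also drops negative values, whose log would be NaN.

```python
import numpy as np
import pandas as pd


def positive_log(series):
    """Return the natural log of the strictly positive, non-missing values.

    Hypothetical helper: zeros and negatives are filtered out before the log
    is taken, so the result holds only finite numbers.
    """
    values = series.dropna()
    values = values[values > 0]
    return np.log(values)


# Made-up data with one zero and one missing value.
s = pd.Series([2.0, 0.0, 5.0, np.nan, 1.0])

# Taking the log directly turns the zero into -inf.
with np.errstate(divide="ignore"):
    raw = np.log(s)
has_inf = bool(np.isinf(raw).any())
print("contains inf:", has_inf)

# After filtering, every value is finite and the histogram can be computed.
cleaned = positive_log(s)
counts, edges = np.histogram(cleaned, bins=3)
print("finite values:", len(cleaned), "bin counts:", counts)
```

In the loop above, `positive_log(in_df[col]).hist(bins=1000)` could stand in for the edited line, assuming that silently discarding non-positive values is acceptable for the plot.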