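I'm trying to drop the rows with missing data, which is marked as "?" in this dataset, and then convert the last column (income) to boolean values. I've followed several answers on StackOverflow, but it still doesn't work. Here is the code: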
%pylab inline
import numpy as np
import pylab as pl
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
fileURL = 'http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'
df = pd.read_csv(fileURL,
names=['age','type_employer', 'fnlwgt', 'education',
'education_num', 'marital', 'occupation', 'relationship',
'race','sex','capital_gain', 'capital_loss', 'hr_per_week','country', 'income'],
na_values = ['?'])
df = df.dropna(how='any')
boolean = {'>50K': True, '<=50K': False}
df['income'].map(boolean)
df
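Thanks.
Best answer
Your approach is almost correct, but the parser picked up extra whitespace. In this file every comma is followed by a space, so each string value is read with a leading space: the missing-value marker is actually ' ?', and the income labels are ' >50K' and ' <=50K'. A CSV should not contain these spaces. Note also that map returns a new Series rather than modifying the column in place, so the result has to be assigned back to df['income'].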
df = pd.read_csv(fileURL,
names=['age','type_employer', 'fnlwgt', 'education',
'education_num', 'marital', 'occupation',
'relationship',
'race','sex','capital_gain', 'capital_loss',
'hr_per_week','country', 'income'],
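na_values = [' ?'])  # the missing-value marker carries a leading space in this file
df = df.dropna(how='any')  # drop every row that has at least one missing value
boolean = {' >50K': True, ' <=50K': False}  # keys include the leading space
df['income'] = df['income'].map(boolean)  # map returns a new Series, so assign it back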
df
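As an alternative to putting the leading space into every string, pandas can strip it while parsing: read_csv accepts skipinitialspace=True, which skips spaces that follow the delimiter. The sketch below is not part of the original answer. It uses a few made-up rows with only three columns (df_demo and sample are illustrative names) so that it runs without downloading the file.

```python
import io
import pandas as pd

# Made-up sample rows in the same ", "-separated style as adult.data
# (an assumption for illustration, not real records from the dataset).
sample = io.StringIO(
    "39, Adm-clerical, <=50K\n"
    "50, ?, >50K\n"
    "38, Exec-managerial, >50K\n"
)

df_demo = pd.read_csv(
    sample,
    names=['age', 'occupation', 'income'],
    na_values=['?'],          # plain '?' now matches, no leading space needed
    skipinitialspace=True,    # skip the space that follows each comma
)

# Drop rows with any missing value, then map the labels to booleans.
df_demo = df_demo.dropna(how='any')
df_demo['income'] = df_demo['income'].map({'>50K': True, '<=50K': False})
print(df_demo)
```

With the real file, the same two arguments (na_values=['?'] and skipinitialspace=True) should work with the original column names, and the mapping keys can then be written without the leading space.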
Regarding python - Python (Pandas): dropping rows with NA and converting values to bool, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40645061/