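I'm trying to drop the rows with missing data (marked as "?" in this file) and convert the last column (income) to boolean. I followed several answers on StackOverflow but still can't get it to work. Here is the code: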

%pylab inline
import numpy as np
import pylab as pl
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

fileURL = 'http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'

df = pd.read_csv(fileURL,
                 names=['age','type_employer', 'fnlwgt', 'education',
                        'education_num', 'marital', 'occupation', 'relationship',
                        'race','sex','capital_gain', 'capital_loss', 'hr_per_week','country', 'income'],
                 na_values = ['?'])

df = df.dropna(how='any')
boolean = {'>50K': True, '<=50K': False}
df['income'].map(boolean)
df


Thanks.

Best answer

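Your approach is almost correct, but the values are being parsed with extra whitespace. In this file every field after the first is preceded by a space (fields are separated by ", "), so the missing-value marker is read as ' ?' and the income labels as ' >50K' and ' <=50K'. A well-formed CSV would not contain these spaces. Also, `map` returns a new Series rather than changing the column in place, so its result has to be assigned back to `df['income']`.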

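import pandas as pd

fileURL = 'http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'

# Every field after the first is preceded by a space, so match ' ?' exactly.
df = pd.read_csv(fileURL,
                 names=['age', 'type_employer', 'fnlwgt', 'education',
                        'education_num', 'marital', 'occupation',
                        'relationship',
                        'race', 'sex', 'capital_gain', 'capital_loss',
                        'hr_per_week', 'country', 'income'],
                 na_values=[' ?'])

df = df.dropna(how='any')
# The income labels carry the same leading space.
boolean = {' >50K': True, ' <=50K': False}
# map() returns a new Series, so assign it back to the column.
df['income'] = df['income'].map(boolean)
df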
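As an alternative sketch (not part of the original answer), `read_csv`'s `skipinitialspace=True` option discards the space after each delimiter, so the question's original `na_values=['?']` and the unspaced dictionary keys work unchanged. To run offline, the example assumes a small made-up sample in the same comma-plus-space layout, trimmed to three columns, read through `io.StringIO` instead of the URL; `load_income` and `SAMPLE` are hypothetical names.

```python
import io

import pandas as pd

# Hypothetical sample rows in the same "comma + space" layout as adult.data,
# trimmed to three columns for brevity.
SAMPLE = """39, Adm-clerical, <=50K
50, ?, >50K
38, Handlers-cleaners, >50K
"""


def load_income(buf):
    """Parse the data, drop rows containing '?', and map income to bool."""
    df = pd.read_csv(
        buf,
        names=['age', 'occupation', 'income'],
        skipinitialspace=True,  # drop the space that follows each comma
        na_values=['?'],        # '?' now matches without a leading space
    )
    df = df.dropna(how='any')
    # map() returns a new Series; assign it back to keep the result.
    df['income'] = df['income'].map({'>50K': True, '<=50K': False})
    return df


df = load_income(io.StringIO(SAMPLE))
print(df)
```

With the real file, pass `fileURL` and the full `names` list instead of the sample. The remaining rows keep their original index labels (here 0 and 2), so call `reset_index(drop=True)` if a contiguous index is needed.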

Source: "Python (Pandas) - Remove rows with NA and convert values to boolean" on Stack Overflow: https://stackoverflow.com/questions/40645061/
