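I have a Python project from kaggle.com and I'm having trouble reading in the dataset. It consists of a single CSV file; I need to read it in and put the target column and the training features into arrays.
Here are the first 3 lines of the dataset (the target is the 19th column and the features are the first 18 columns):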

user    gender  age how_tall_in_meters  weight  body_mass_index x1
debora  Woman   46  1.62    75  28.6    -3
debora  Woman   46  1.62    75  28.6    -3

The target column, not shown here, contains string values.
from pandas import read_csv
import numpy as np
from sklearn.linear_model.stochastic_gradient import SGDClassifier
from sklearn import preprocessing
import sklearn.metrics as metrics
from sklearn.cross_validation import train_test_split

#d = pd.read_csv("data.csv", dtype={'A': np.str(), 'B': np.str(), 'S': np.str()})

dataset = np.genfromtxt(open('data.csv','r'), delimiter=',', dtype='f8')[1:]
target = np.array([x[19] for x in dataset])
train = np.array([x[1:] for x in dataset])

print(target)

The error I get is:
Traceback (most recent call last):
  File "C:\Users\Cameron\Desktop\Project - Machine learning\datafilesforproj\SGD_classifier.py", line 12, in <module>
    dataset = np.genfromtxt(open('data.csv','r'), delimiter=',', dtype='f8')[1:]
  File "C:\Python33\lib\site-packages\numpy\lib\npyio.py", line 1380, in genfromtxt
    first_values = split_line(first_line)
  File "C:\Python33\lib\site-packages\numpy\lib\_iotools.py", line 217, in _delimited_splitter
    line = line.split(self.comments)[0]
TypeError: Can't convert 'bytes' object to str implicitly

Best answer

What worked for me was changing the line

dataset = np.genfromtxt(open('data.csv','r'), delimiter=',', dtype='f8')[1:]

to

dataset = np.genfromtxt('data.csv', delimiter=',', dtype='f8')[1:]

(Unfortunately, I'm not sure what the underlying problem is.)
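
A likely explanation, assuming the NumPy release of that era (before 1.14) on Python 3: genfromtxt handled lines as bytes internally, so a file opened in text mode ('r') handed it str lines that clashed with its bytes comment marker. Passing the file name, or opening the file with 'rb', sidesteps that. Note also that dtype='f8' turns a string target column into nan, so the sketch below reads the target separately as strings. The tiny CSV, its column names, and the class labels are hypothetical stand-ins for data.csv, and the encoding argument assumes NumPy 1.14 or later.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for data.csv: three numeric features and a string target.
csv_text = (
    "age,how_tall_in_meters,weight,class\n"
    "46,1.62,75,sitting\n"
    "46,1.62,75,standing\n"
)

# Write the sample to a temporary file so genfromtxt can be given a file name.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write(csv_text)
    path = f.name

try:
    # Pass the file name so NumPy opens the file itself; skip the header row.
    features = np.genfromtxt(path, delimiter=",", skip_header=1,
                             usecols=(0, 1, 2), dtype="f8", encoding="utf-8")
    # Read the target column on its own as strings instead of floats.
    target = np.genfromtxt(path, delimiter=",", skip_header=1,
                           usecols=(3,), dtype=str, encoding="utf-8")
finally:
    os.remove(path)

print(features.shape)
print(target)
```

For the real file, the usecols tuples would need to match the actual feature and target positions, and string feature columns such as user and gender would need encoding before they can be fed to a classifier.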
