Cannot do linear regression in scikit-learn because of "reshape"

Problem description

I have a simple CSV with two columns:

  1. ErrorWeek (a number giving the week of the year)
  2. ErrorCount (the number of errors in that week)

I read the CSV data into a pandas dataframe, like this:

import pandas as pd
df = pd.read_csv("Errors.csv", sep=",")

df.head() shows:

   ErrorWeek  ErrorCount
0          1          80
1          2         118
2          3         249
3          4         397
4          5         159

So far so good.

Then, I create a test/train split, like this:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df['ErrorWeek'], df['ErrorCount'], random_state=0)

No errors so far.

But then I create a linear regression object and try to fit the data.

from sklearn import linear_model

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(X_train, y_train)

Here I do get an error: "Reshape your data either using array.reshape(-1, 1)"

-

Looking at the shapes of X_train and y_train, I get what look like two one-dimensional "arrays":

X_train shape: (36,)
y_train shape: (36,)

-

I have spent many hours trying to figure this out, but I'm new to pandas, Python, and scikit-learn.

I'm reading in two-dimensional data, but pandas isn't seeing it that way.

What do I need to do, specifically?

Thanks

Recommended answer

This call:

X_train, X_test, y_train, y_test = train_test_split(
         df['ErrorWeek'], df['ErrorCount'], random_state=0)

will make all of the output arrays one-dimensional, because you are selecting a single column for X and a single column for y.
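
A quick way to see where the 1-D shape comes from is to compare selecting one column label with selecting a list of labels. This minimal sketch rebuilds only the five rows shown by df.head(), since the rest of Errors.csv is not given, so the data is a stand-in. The list-of-labels form is ordinary pandas indexing shown for contrast, not necessarily what the other answers recommend.

```python
import pandas as pd

# Stand-in for Errors.csv: only the five rows shown by df.head() are known.
df = pd.DataFrame({
    "ErrorWeek": [1, 2, 3, 4, 5],
    "ErrorCount": [80, 118, 249, 397, 159],
})

# A single label returns a Series, which has only one axis.
x_series = df["ErrorWeek"]
print(type(x_series).__name__, x_series.shape)  # Series (5,)

# A list of labels returns a DataFrame with rows and columns.
x_frame = df[["ErrorWeek"]]
print(type(x_frame).__name__, x_frame.shape)  # DataFrame (5, 1)
```

train_test_split returns pieces of the same kind it was given, so passing two Series produces four 1-D Series, which matches the (36,) shapes in the question.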

Now, when you pass a one-dimensional array of shape [n,], scikit-learn cannot tell whether what you passed is one row of data with multiple columns or multiple samples of data with a single column. That is, from X alone sklearn cannot infer whether n_samples=n and n_features=1, or the other way around (n_samples=1 and n_features=n).
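
Both readings are equally valid for a flat array, which is why the library refuses to pick one. Here is a minimal NumPy sketch, using the first five ErrorWeek values purely as illustrative data, that builds each interpretation explicitly and reads n_samples and n_features off the resulting 2-D shape.

```python
import numpy as np

# Five week numbers as a flat array: shape (5,), no row/column distinction.
weeks = np.array([1, 2, 3, 4, 5])

# Reading 1: five samples with one feature each.
as_samples = weeks.reshape(-1, 1)

# Reading 2: one sample with five features.
as_features = weeks.reshape(1, -1)

for name, X in [("as_samples", as_samples), ("as_features", as_features)]:
    n_samples, n_features = X.shape
    print(f"{name}: n_samples={n_samples}, n_features={n_features}")
# as_samples: n_samples=5, n_features=1
# as_features: n_samples=1, n_features=5
```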

Hence it asks you to reshape the 1-D data you provided into 2-D data of shape [n_samples, n_features].
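
The check can be reproduced in isolation. In this sketch the five data points are illustrative, taken from df.head(). It passes the same values to LinearRegression.fit first as a 1-D X, which raises the ValueError from the question, and then as a [5, 1] array, which is accepted. The target y may remain 1-D.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

weeks = np.array([1, 2, 3, 4, 5])            # 1-D, shape (5,)
counts = np.array([80, 118, 249, 397, 159])  # targets can stay 1-D

model = LinearRegression()

# Passing the 1-D array as X reproduces the error from the question.
try:
    model.fit(weeks, counts)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", str(err).splitlines()[0])

# The same data shaped [n_samples, n_features] = [5, 1] is accepted.
model.fit(weeks.reshape(-1, 1), counts)
print(model.coef_.shape)  # (1,): one coefficient for the single feature
```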

Now there are multiple ways of doing this.

  • You can do what scikit-learn says:

X_train = X_train.to_numpy().reshape(-1, 1)  # X_train is a pandas Series; reshape its NumPy array
X_test = X_test.to_numpy().reshape(-1, 1)

The 1 in the second position of reshape says there is exactly one column, and the -1 tells NumPy to work out the number of rows automatically for that single column.

  • Follow the suggestions in the other answers by @MaxU and @Wen.
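
Putting the first option into the question's full workflow gives roughly the following. This is a sketch under one assumption: it uses only the five rows from df.head() in place of pd.read_csv("Errors.csv"). It also goes through to_numpy() because current pandas versions no longer provide Series.reshape.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in for pd.read_csv("Errors.csv"): only the rows shown by df.head().
df = pd.DataFrame({
    "ErrorWeek": [1, 2, 3, 4, 5],
    "ErrorCount": [80, 118, 249, 397, 159],
})

X_train, X_test, y_train, y_test = train_test_split(
    df["ErrorWeek"], df["ErrorCount"], random_state=0)

# Convert each Series to a NumPy array, then reshape to one column;
# -1 lets NumPy infer the row count from the array's size.
X_train = X_train.to_numpy().reshape(-1, 1)
X_test = X_test.to_numpy().reshape(-1, 1)

regr = LinearRegression()
regr.fit(X_train, y_train)
predictions = regr.predict(X_test)
print(X_train.shape, X_test.shape, predictions.shape)
```

With five rows and the default test_size of 0.25, the split is three training rows and two test rows. On the full 48-row file the same code would give the (36, 1) training shape the question expects.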

10-11 22:43