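This is linear regression with sklearn, which works differently from the statsmodels version from two days ago. With statsmodels.api, both the data used for fitting and the data passed to predict need a const column, added with sm.add_constant. With statsmodels.formula.api, add_constant is not needed; you only pass an R-style formula string. With sklearn's LinearRegression you can call fit and predict directly. Just pay attention to the shape of the arguments you pass in: the features must be a 2-D array of shape (n_samples, n_features).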


    # coding: utf-8

    import numpy as np
    import statsmodels.api as sm
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    sns.set()

    data = pd.read_csv('data-analysis/python-jupyter/1.01. Simple linear regression.csv')

    y = data['GPA']
    X = data['SAT']
    print(X.shape)  # 1-D: (n_samples,)
    print(y.shape)

    reg = LinearRegression()

    # Calling reg.fit(X, y) at this point fails with:
    #   ValueError: Expected 2D array, got 1D array instead:
    # type(X) is a pandas Series, which is 1-D, while sklearn expects
    # a 2-D feature matrix of shape (n_samples, n_features).

    # Reshape the single feature into one column: (n_samples, 1)
    X_matrix = X.values.reshape(-1, 1)
    print(X_matrix.shape)

    reg.fit(X_matrix, y)

    # reg.score(X, y): R-squared
    # reg.coef_: coefficient / slope
    # reg.intercept_: intercept
    print(reg.score(X_matrix, y))
    print(reg.coef_)
    print(reg.intercept_)

    # Make predictions
    gen_data = np.linspace(1700, 1800, num=10, dtype=int)
    new_data = pd.DataFrame(data=gen_data, columns=['SAT'])
    # Pass a plain 2-D array, matching what the model was fitted on
    # (passing the DataFrame itself triggers a feature-name warning in recent sklearn)
    new_data['Predicted_GPA'] = reg.predict(new_data[['SAT']].values)
    print(new_data)
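
The shape requirement can also be met without reshape. The following is a minimal sketch, assuming a small made-up table (the SAT/GPA values here are invented for illustration and are not the CSV used above): it shows that a 1-D Series is rejected, and three ways of getting a 2-D input with a single column.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up illustrative values, NOT the dataset from the post
df = pd.DataFrame({
    "SAT": [1600, 1700, 1800, 1900, 2000],
    "GPA": [2.8, 3.0, 3.2, 3.4, 3.6],
})
y = df["GPA"]

# A Series is 1-D, so fit() rejects it as a feature matrix
try:
    LinearRegression().fit(df["SAT"], y)
    raised = False
except ValueError:
    raised = True

# Three ways to get shape (n_samples, 1)
X_reshaped = df["SAT"].values.reshape(-1, 1)  # NumPy array
X_frame = df[["SAT"]]                         # DataFrame via double brackets
X_to_frame = df["SAT"].to_frame()             # DataFrame via to_frame()
shapes = [X_reshaped.shape, X_frame.shape, X_to_frame.shape]

# Fitting on a DataFrame lets predict() take a DataFrame
# with the same column name
reg = LinearRegression().fit(X_frame, y)
new_data = pd.DataFrame({"SAT": [1750, 1850]})
pred = reg.predict(new_data)
print(raised, shapes, pred)
```

Fitting on `df[["SAT"]]` keeps the column name attached to the model, so a DataFrame with the same column can be passed to predict later; fitting on the reshaped array works just as well if predict also receives a plain array.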

Below is the predict part from two days ago, using statsmodels.api:

    # Predict; `results` is the fitted statsmodels results object from the earlier post
    gen_data = np.linspace(1700, 1800, num=10, dtype=int)
    new_data = pd.DataFrame(data=gen_data, columns=['SAT'])
    new_x = sm.add_constant(new_data)  # adds the 'const' column

    predicted_y = results.predict(new_x)

    new_x['Predicted_GPA'] = predicted_y
    # Drop the const column
    new_x = new_x.drop(['const'], axis=1)
    print(new_x)


12-17 09:31