My code works fine:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
# predictors (the custom 'cleaner' step), bow_vector and classifier are defined elsewhere (not shown)
df_amazon = pd.read_csv("datasets/amazon_alexa.tsv", sep="\t")
X = df_amazon['variation']  # the features we want to analyze
ylabels = df_amazon['feedback']  # the labels, or answers, we want to test against
X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)
# Create pipeline using Bag of Words
pipe = Pipeline([('cleaner', predictors()),
                 ('vectorizer', bow_vector),
                 ('classifier', classifier)])
pipe.fit(X_train, y_train)
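However, if I try to add one more feature to the model by replacing
X = df_amazon['variation']
with
X = df_amazon[['variation','verified_reviews']]
I get this error from sklearn when calling fit:
ValueError: Found input variables with inconsistent numbers of samples: [2, 2205]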
So fit works when X_train and y_train have shapes (2205,) and (2205,), but fails when the shapes change to (2205, 2) and (2205,).
What is the best way to fix this?
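To see where the 2 in [2, 2205] comes from, here is a minimal reproduction sketch. The rows are hypothetical toy data, and CountVectorizer and LogisticRegression are assumed stand-ins for the question's bow_vector and classifier (the custom cleaner step is left out). Iterating over a DataFrame yields its column labels, so the vectorizer receives 2 "documents" while y still holds one label per row.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy rows standing in for amazon_alexa.tsv
df = pd.DataFrame({
    "variation": ["Charcoal Fabric", "Walnut Finish", "Black Dot", "White Dot"],
    "verified_reviews": ["love it", "stopped working", "great sound", "too quiet"],
    "feedback": [1, 0, 1, 0],
})
X = df[["variation", "verified_reviews"]]

# Iterating over a DataFrame yields its column labels, not its rows
print(list(X))  # ['variation', 'verified_reviews']

# Assumed stand-ins for the question's bow_vector and classifier
pipe = Pipeline([("vectorizer", CountVectorizer()),
                 ("classifier", LogisticRegression())])

error_message = None
try:
    # The vectorizer turns the 2 column labels into 2 rows, but y has 4 labels
    pipe.fit(X, df["feedback"])
except ValueError as err:
    error_message = str(err)
print(error_message)  # Found input variables with inconsistent numbers of samples: [2, 4]
```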
Best answer
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
df = pd.DataFrame(data = [['Heather Gray Fabric','I received the echo as a gift.',1],['Sandstone Fabric','Without having a cellphone, I cannot use many of her features',0]], columns = ['variation','review','feedback'])
vect = CountVectorizer()
vect.fit_transform(df[['variation','review']])
# now when you look at vocab that has been created
print(vect.vocabulary_)
# output: features were generated only for the column names, not for the contents of the columns,
# because iterating over a DataFrame yields its column labels (so CountVectorizer sees 2 "documents")
{'variation': 1, 'review': 0}
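# so you need one column that contains both variation and review, and that column is what you pass to your model
# join with a space; otherwise the last word of variation and the first word of review fuse (e.g. 'Fabric' + 'I' -> 'fabrici')
# in the question this would be X = df_amazon['variation'] + ' ' + df_amazon['verified_reviews']
df['variation_review'] = df['variation'] + ' ' + df['review']
vect.fit_transform(df['variation_review'])
print(vect.vocabulary_)
# output:
{'heather': 9,
'gray': 7,
'fabric': 4,
'received': 13,
'the': 15,
'echo': 3,
'as': 0,
'gift': 6,
'sandstone': 14,
'without': 17,
'having': 8,
'cellphone': 2,
'cannot': 1,
'use': 16,
'many': 11,
'of': 12,
'her': 10,
'features': 5}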
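Applied to the question's setup, here is a sketch of two ways to feed both columns through one train/test split and pipeline. The data are hypothetical toy rows; CountVectorizer and LogisticRegression again stand in for bow_vector and classifier, and the custom cleaner step is omitted. Option 1 follows the answer (one space-joined text column). Option 2 swaps in a different technique, scikit-learn's ColumnTransformer, which builds a separate bag of words per column.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy rows standing in for amazon_alexa.tsv
df = pd.DataFrame({
    "variation": ["Charcoal Fabric", "Walnut Finish", "Heather Gray Fabric", "Sandstone Fabric",
                  "Black Dot", "White Dot", "Black Spot", "White Show"],
    "verified_reviews": ["love it", "stopped working", "great sound", "not happy",
                         "very useful", "poor speaker", "works great", "too quiet"],
    "feedback": [1, 0, 1, 0, 1, 0, 1, 0],
})

X = df[["variation", "verified_reviews"]]
# stratify keeps both classes in the tiny training set
X_train, X_test, y_train, y_test = train_test_split(
    X, df["feedback"], test_size=0.25, random_state=0, stratify=df["feedback"])

# Option 1 (the answer's idea): one string per row, joined with a space
def combine(frame):
    return frame["variation"] + " " + frame["verified_reviews"]

text_pipe = Pipeline([("vectorizer", CountVectorizer()),
                      ("classifier", LogisticRegression())])
text_pipe.fit(combine(X_train), y_train)
text_pred = text_pipe.predict(combine(X_test))

# Option 2: a separate bag of words per column.
# A plain string column selector hands each CountVectorizer a 1-D Series.
column_pipe = Pipeline([
    ("features", ColumnTransformer([
        ("variation_bow", CountVectorizer(), "variation"),
        ("review_bow", CountVectorizer(), "verified_reviews"),
    ])),
    ("classifier", LogisticRegression()),
])
column_pipe.fit(X_train, y_train)
column_pred = column_pipe.predict(X_test)

print(len(text_pred), len(column_pred))  # 2 2
```

Option 2 keeps the two vocabularies apart, so a word that appears in both variation and verified_reviews becomes two distinct features, while option 1 merges them; which works better on the real dataset is not tested here.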
Regarding "python - ValueError: Found input variables with inconsistent numbers of samples when I try to add 1 feature to a scikit-learn model", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56817875/