Multi-class text classification TypeError: Input must be a SparseTensor

Problem description

I am trying to build a deep learning model for text classification. However, when I run the script below, I get this error:

InvalidArgumentError: indices[2] = [0,398] is out of order. Many sparse ops require sorted indices. Use `tf.sparse.reorder` to create a correctly ordered copy.

However, when I try to use tf.sparse.reorder, I get a different error: TypeError: Input must be a SparseTensor.

These are the dimensions of the inputs:

X_train_cv1.shape, y_train.shape, X_validation_cv1.shape, y_validation.shape
((13435, 675), (13435, 3), (3359, 675), (3359, 3))

Is there any way to fix this?

# Imports used by the script below
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential, layers
from tensorflow.keras.utils import to_categorical

# Split the data into training and validation sets
X_train, X_validation, y_train, y_validation = train_test_split(X, y, test_size=0.2, random_state=42)

# Encode class labels as integers; fit on the training labels only
encoder = LabelEncoder()
encoder.fit(y_train)
encoded_y_train = encoder.transform(y_train)
# Convert integers to one-hot vectors
y_train = to_categorical(encoded_y_train)

# Reuse the same fitted encoder so both splits share one label mapping
encoded_y_validation = encoder.transform(y_validation)
y_validation = to_categorical(encoded_y_validation)

# Document-term matrix of character-bigram counts (analyzer='char', ngram_range=(2, 2))
from sklearn.feature_extraction.text import CountVectorizer

cv1 = CountVectorizer(analyzer='char',ngram_range=(2, 2))

X_train_cv1 = cv1.fit_transform(X_train)
X_validation_cv1 = cv1.transform(X_validation)

input_dim = X_train_cv1.shape[1]  # Number of features
model = Sequential()
model.add(layers.Dense(10, input_dim=input_dim, activation='relu'))
model.add(layers.Dense(3, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

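# Attempted fix: these calls raise "TypeError: Input must be a SparseTensor",
# because the inputs are a SciPy sparse matrix and NumPy arrays, not tf.SparseTensor
X_train_cv1 = tf.sparse.reorder(X_train_cv1)
y_train = tf.sparse.reorder(y_train)
X_validation_cv1 = tf.sparse.reorder(X_validation_cv1)
y_validation = tf.sparse.reorder(y_validation)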

history = model.fit(X_train_cv1, y_train, epochs=100, verbose=True, validation_data=(X_validation_cv1, y_validation), batch_size=10)

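This is my dataset.

Recommended answer

OK, I managed to find the answer. Apparently Keras does not handle SciPy sparse matrices well, so I only had to change these two lines to convert the vectorizer output into dense arrays: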

X_train_cv1 = cv1.fit_transform(X_train).toarray()
X_validation_cv1 = cv1.transform(X_validation).toarray()
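
As a rough illustration of what the dense conversion does and what it costs, here is a minimal sketch. The small `counts` matrix and the helper `dense_nbytes` are hypothetical stand-ins, not part of the original script, and the int64 estimate assumes the vectorizer was left at its default dtype. The row and feature counts are the ones reported in the question.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x4 count matrix standing in for the CountVectorizer output.
counts = sparse.csr_matrix(np.array([[0, 2, 0, 1],
                                     [0, 0, 0, 0],
                                     [3, 0, 0, 0]], dtype=np.int64))

# .toarray() materialises every zero and returns a plain NumPy array.
dense = counts.toarray()


def dense_nbytes(n_rows, n_cols, dtype):
    """Bytes needed to hold an n_rows x n_cols dense array of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize


# Shapes reported in the question: 13435 training rows, 675 features.
train_bytes_int64 = dense_nbytes(13435, 675, np.int64)
train_bytes_float32 = dense_nbytes(13435, 675, np.float32)

print(sparse.issparse(counts), sparse.issparse(dense))
print(train_bytes_int64 / 1e6, "MB as int64")
print(train_bytes_float32 / 1e6, "MB as float32")
```

At this size the dense training matrix is on the order of tens of megabytes, so calling `.toarray()` is likely affordable here. For a much larger vocabulary the same estimate may show that a dense copy does not fit in memory.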

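If the data should stay sparse, one possible alternative is to build the pieces a `tf.SparseTensor` needs, with the indices already sorted in row-major order as the first error message asks. The sketch below uses only NumPy and SciPy. The helper `to_sparse_tensor_parts` and the sample matrix are hypothetical, and the TensorFlow calls in the trailing comments are an untested assumption about how the parts would be consumed. Whether a given Keras version accepts sparse tensors in `model.fit` is not something the original answer confirms.

```python
import numpy as np
from scipy import sparse


def to_sparse_tensor_parts(matrix):
    """Return (indices, values, dense_shape) with indices in row-major order."""
    coo = sparse.coo_matrix(matrix)
    # np.lexsort treats the LAST key as primary: sort by row, then by column.
    order = np.lexsort((coo.col, coo.row))
    indices = np.stack([coo.row[order], coo.col[order]], axis=1).astype(np.int64)
    values = coo.data[order]
    return indices, values, coo.shape


# Hypothetical matrix whose entries in row 0 are deliberately out of order.
data = np.array([5, 7, 9])
rows = np.array([0, 0, 1])
cols = np.array([3, 1, 2])
unsorted = sparse.coo_matrix((data, (rows, cols)), shape=(2, 4))

indices, values, dense_shape = to_sparse_tensor_parts(unsorted)
print(indices.tolist())  # [[0, 1], [0, 3], [1, 2]]
print(values.tolist())   # [7, 5, 9]

# Assumption: with TensorFlow installed, the parts could be passed on as
#   import tensorflow as tf
#   x = tf.SparseTensor(indices, values, dense_shape)
```

The key point is the type: `tf.sparse.reorder` expects a `tf.SparseTensor`, while `CountVectorizer` returns a SciPy sparse matrix, so some explicit conversion is needed before any TensorFlow sparse op can be applied.
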

09-26 23:22