How to correctly use tft.compute_and_apply_vocabulary and tft.tfidf?

Problem description

I am trying to use tft.compute_and_apply_vocabulary and tft.tfidf to compute TF-IDF in my Jupyter notebook. However, I always get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'compute_and_apply_vocabulary/vocabulary/Placeholder' with dtype string
     [[node compute_and_apply_vocabulary/vocabulary/Placeholder (defined at C:\Users\secsi\Anaconda3\envs\tf2\lib\site-packages\tensorflow_

However, the placeholder's type actually is string.

Here is my code:

import tensorflow as tf
import tensorflow_transform as tft

with tf.Session() as sess:
    documents = [
        "a b c d e",
        "f g h i j",
        "k l m n o",
        "p q r s t",
    ]
    documents_tensor = tf.placeholder(tf.string)
    tokens = tf.compat.v1.string_split(documents_tensor)
    compute_vocab = tft.compute_and_apply_vocabulary(tokens, vocab_filename='vocab.txt')

    global_vars_init = tf.global_variables_initializer()
    table_init = tf.tables_initializer()

    sess.run([global_vars_init, table_init])
    token2ids = sess.run(compute_vocab, feed_dict={documents_tensor: documents})
    print(f"token2ids: {token2ids}")

Versions:

tensorflow:1.14
tensorflow-transform:0.14

Thanks in advance!

Recommended answer

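Unlike TensorFlow operations, which can be run directly in a Session, TensorFlow Transform operations such as tft.compute_and_apply_vocabulary cannot be used directly. The vocabulary has to be computed over the whole dataset before any token can be mapped to an id, and that full pass is carried out by the Apache Beam pipeline rather than by the TensorFlow graph, which is why the placeholder in the error message is never fed.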

To use TensorFlow Transform operations, we must call them inside a preprocessing_fn, which is then passed to tft_beam.AnalyzeAndTransformDataset.
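To illustrate why a full pass over the data is needed before any row can be transformed, here is a minimal plain-Python sketch of the two phases. It is not how TensorFlow Transform is implemented; the function names analyze_vocabulary and apply_vocabulary are hypothetical, and the alphabetical tie-breaking is an assumption made only for this sketch.

```python
from collections import Counter


def analyze_vocabulary(documents):
    """Analyze phase: one full pass over the dataset to build token -> id."""
    counts = Counter(token for doc in documents for token in doc.split())
    # Most frequent tokens get the smallest ids. Ties are broken
    # alphabetically, which is a choice made for this sketch only.
    ordered = sorted(counts, key=lambda token: (-counts[token], token))
    return {token: index for index, token in enumerate(ordered)}


def apply_vocabulary(document, vocabulary, default_value=-1):
    """Transform phase: a per-row lookup that needs the finished vocabulary."""
    return [vocabulary.get(token, default_value) for token in document.split()]


documents = ["a b c d e", "f g h i j", "k l m n o", "p q r s t"]
vocabulary = analyze_vocabulary(documents)
token_ids = [apply_vocabulary(doc, vocabulary) for doc in documents]
print(token_ids)
```

The lookup in apply_vocabulary cannot run until analyze_vocabulary has seen every document. In TensorFlow Transform, tft_beam.AnalyzeAndTransformDataset plays the role of running both phases in the right order.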

In your case, since you have text data, your code can be modified as shown below:

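import tempfile

import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam

# Hypothetical values: the original snippet leaves these names undefined.
VOCAB_SIZE = 20000
REVIEW_KEY = 'review'
REVIEW_WEIGHT_KEY = 'review_weight'
LABEL_KEY = 'label'


def preprocessing_fn(inputs):
    """inputs is a dict of feature tensors for one batch of the dataset."""
    documents = inputs['documents']

    tokens = tf.compat.v1.string_split(documents)
    # top_k keeps the vocabulary within VOCAB_SIZE entries.
    compute_vocab = tft.compute_and_apply_vocabulary(tokens, top_k=VOCAB_SIZE)
    # Add one for the oov bucket created by compute_and_apply_vocabulary.
    review_bow_indices, review_weight = tft.tfidf(compute_vocab,
                                                  VOCAB_SIZE + 1)
    return {
        REVIEW_KEY: review_bow_indices,
        REVIEW_WEIGHT_KEY: review_weight,
        LABEL_KEY: inputs[LABEL_KEY]
    }


# train_data and RAW_DATA_METADATA must be defined by the caller.
with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
    (transformed_train_data, transformed_metadata), transform_fn = (
        (train_data, RAW_DATA_METADATA)
        | 'AnalyzeAndTransform' >> tft_beam.AnalyzeAndTransformDataset(
            preprocessing_fn))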
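To get a feel for the two values that tft.tfidf returns (vocabulary indices and their weights), here is a rough plain-Python sketch. It assumes term frequency is the term count divided by the document length, and that the smoothed inverse document frequency is log((corpus size + 1) / (document frequency + 1)) + 1, which is my reading of the tft.tfidf documentation; please check it against the version you use. The function name tfidf_sketch is hypothetical, and the real function returns SparseTensors rather than Python lists.

```python
import math
from collections import Counter


def tfidf_sketch(docs_as_ids, smooth=True):
    """Returns, per document, (sorted unique ids, TF-IDF weights)."""
    corpus_size = len(docs_as_ids)
    # Document frequency: the number of documents in which each id occurs.
    doc_freq = Counter(i for ids in docs_as_ids for i in set(ids))
    results = []
    for ids in docs_as_ids:
        counts = Counter(ids)
        unique_ids = sorted(counts)
        weights = []
        for i in unique_ids:
            tf = counts[i] / len(ids)
            if smooth:
                # Assumed smoothed formula, see the note above.
                idf = math.log((corpus_size + 1) / (doc_freq[i] + 1)) + 1
            else:
                idf = math.log(corpus_size / doc_freq[i]) + 1
            weights.append(tf * idf)
        results.append((unique_ids, weights))
    return results


# The four documents from the question, already mapped to integer ids.
docs_as_ids = [list(range(start, start + 5)) for start in (0, 5, 10, 15)]
results = tfidf_sketch(docs_as_ids)
for unique_ids, weights in results:
    print(unique_ids, [round(w, 4) for w in weights])
```

Because every token in the question's documents occurs exactly once in exactly one document, all weights come out equal under these assumptions.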

You can refer to this link for an example of how to perform data preprocessing with TensorFlow Transform on a text dataset (sentiment analysis).

If you find this answer useful, please accept it and/or upvote it. Thanks.
