What is tensorflow.python.data.ops.dataset_ops._OptionsDataset?

Problem description

I am using the Transformer code from the TensorFlow tutorial: https://www.tensorflow.org/beta/tutorials/text/transformer

In this code, the dataset is loaded like this:

import tensorflow_datasets as tfds

examples, metadata = tfds.load('ted_hrlr_translate/pt_to_en', with_info=True,
                               as_supervised=True)
train_examples, val_examples = examples['train'], examples['validation']

When I use:

type(train_examples)

I get the following output:

tensorflow.python.data.ops.dataset_ops._OptionsDataset

Now I want to change some entries of the dataset (that is, the sentences), but I can't, because I don't understand this type.

I can iterate over it with:

for data in train_examples:
    print(data,type(data))

and each data element is of type:

<class 'tuple'>

Ultimately, I want to replace some of these tuples with my own data. Can someone tell me how to do this, or give me some details about the type tensorflow.python.data.ops.dataset_ops._OptionsDataset?

Recommended answer

tensorflow.python.data.ops.dataset_ops._OptionsDataset is just another class extending the base class tf.compat.v2.data.Dataset (DatasetV2). It holds a tf.data.Options object together with the original tf.compat.v2.data.Dataset (the Portuguese-English tuples, in your case), so it can be used like any other dataset.

(The tf.data.Options take effect when you apply streaming transformations such as tf.data.Dataset.map or tf.data.Dataset.interleave to your dataset.)
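
To see this without downloading the TED data, here is a minimal sketch, assuming TensorFlow 2.x. It wraps a small in-memory dataset with Dataset.with_options(), the public method that produces this wrapper class. Whether tfds.load reaches it through with_options() depends on your TFDS version, so treat that link as an assumption; the two sentence pairs are made up.

```python
import tensorflow as tf

# Hypothetical stand-in for examples['train']: two made-up (pt, en) string pairs.
base = tf.data.Dataset.from_tensor_slices(
    (tf.constant(["bom dia", "boa noite"]), tf.constant(["good morning", "good night"]))
)

# with_options() returns a new Dataset that wraps `base` together with the options.
options = tf.data.Options()
wrapped = base.with_options(options)

print(type(wrapped).__name__)                     # should print _OptionsDataset (internal name, may change)
print(isinstance(wrapped, tf.data.Dataset))       # True: it is an ordinary Dataset
print(wrapped.element_spec == base.element_spec)  # True: same (pt, en) tuple structure
print(wrapped.options())                          # the tf.data.Options object it carries
```

The wrapper changes nothing about the elements themselves, which is why iterating over it still gives you plain (pt, en) tuples.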

How can you view individual elements?

There are many ways, but one straightforward way is to use the iterator provided by the base class:

Since examples['train'] is an _OptionsDataset, you can iterate over it by calling a method inherited from tf.compat.v2.data.Dataset:

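# examples['train'] is the _OptionsDataset returned by tfds.load above.
iterator = examples['train'].__iter__()   # same as iter(examples['train'])
next_element = iterator.get_next()        # first (pt, en) tuple of tensors
pt = next_element[0]                      # Portuguese sentence (scalar tf.string)
en = next_element[1]                      # English sentence (scalar tf.string)
print(pt.numpy())                         # raw UTF-8 bytes
print(en.numpy())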

The output is:

b'o problema \xc3\xa9 que nunca vivi l\xc3\xa1 um \xc3\xbanico dia .'
b"except , i 've never lived one day of my life there ."
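
A shorter way to peek at the first few pairs, assuming TensorFlow 2.1 or later (where Dataset.as_numpy_iterator() exists), is to combine take() with as_numpy_iterator(). The sketch below uses a hypothetical in-memory stand-in for examples['train'] so that it runs without downloading anything.

```python
import tensorflow as tf

# Hypothetical stand-in for examples['train'] with three made-up (pt, en) pairs.
train_examples = tf.data.Dataset.from_tensor_slices(
    (tf.constant(["bom dia", "boa noite", "obrigado"]),
     tf.constant(["good morning", "good night", "thank you"]))
)

# take(n) limits the stream to n elements; as_numpy_iterator() yields (bytes, bytes) tuples.
first_two = list(train_examples.take(2).as_numpy_iterator())
for pt, en in first_two:
    print(pt.decode("utf-8"), "->", en.decode("utf-8"))
```

With the real data you would call examples['train'].take(2) instead of the stand-in.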

Replacing with your own data:

Since you haven't said what you want to substitute for the original data, I'll assume you have a CSV/TSV file of your own translations. In that case, it is useful to create a separate tf.compat.v2.data.Dataset object by reading your file with the CSV API:

tf.data.experimental.make_csv_dataset

https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/r2/tutorials/load_data/csv.ipynb
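
Here is a minimal sketch of that route, assuming TensorFlow 2.x and a tab-separated file with a header row pt/en (the file, its column names and the sentence pairs are all hypothetical). Note that make_csv_dataset yields batches of dicts keyed by column name rather than (pt, en) tuples, so the sketch unbatches and maps each row into a tuple. The new pairs are then spliced into a stand-in for examples['train'] with take(), skip() and concatenate(), replacing the elements at positions 1 and 2.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-in for examples['train'] (the real one comes from tfds.load).
original = tf.data.Dataset.from_tensor_slices((
    tf.constant(["um", "dois", "tres", "quatro"]),
    tf.constant(["one", "two", "three", "four"]),
))

# Hypothetical TSV of your own translations, with a header row "pt<TAB>en".
tsv_path = os.path.join(tempfile.mkdtemp(), "my_pairs.tsv")
with open(tsv_path, "w", encoding="utf-8") as f:
    f.write("pt\ten\n")
    f.write("bom dia\tgood morning\n")
    f.write("boa noite\tgood night\n")

# make_csv_dataset yields batches of dicts keyed by column name. Read the file
# once (num_epochs=1), keep file order (shuffle=False), then unbatch and turn
# each row dict into a (pt, en) tuple so it matches the original structure.
mine = tf.data.experimental.make_csv_dataset(
    tsv_path,
    batch_size=2,
    field_delim="\t",
    num_epochs=1,
    shuffle=False,
).unbatch().map(lambda row: (row["pt"], row["en"]))

# Replace elements 1 and 2 of `original` with the two rows from the file:
# keep element 0, insert `mine`, then resume after the replaced elements.
n_new = 2  # number of rows in the hypothetical file above
edited = original.take(1).concatenate(mine).concatenate(original.skip(1 + n_new))

for pt, en in edited.as_numpy_iterator():
    print(pt.decode("utf-8"), "->", en.decode("utf-8"))
```

If you want to replace the whole dataset instead, you can use mine directly in place of train_examples; to add your pairs without removing any, original.concatenate(mine) is enough.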
