Question
What exactly will this do?
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([0, 0, 0, 1, 1, 1, 2, 2, 2])
dataset = dataset.shuffle(buffer_size=5).repeat().batch(3)
I've noticed several related questions, but none of them answers exactly my concern. I'm confused about what shuffle(buffer_size) is doing. I understand it will take the first 5 examples [0, 0, 0, 1, 1] into memory, but what will it do next with this buffer? And how does this buffer interact with repeat() and batch()?
Answer
The way shuffle works is complicated, but you can pretend it works by first filling a buffer of size buffer_size and then, every time you ask for an element, sampling a uniformly random position in that buffer, yielding that element, and replacing it with a fresh element from the input.
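To make that mental model concrete, here is a minimal pure-Python sketch of it (my own illustration; simulated_shuffle is a hypothetical helper, not TensorFlow's actual implementation):

import random

def simulated_shuffle(source, buffer_size, seed=None):
    # Model of the buffer: fill it with the first buffer_size elements,
    # then repeatedly yield a uniformly random element and refill the
    # freed slot from the input stream.
    rng = random.Random(seed)
    it = iter(source)
    buffer = []
    for item in it:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break
    while buffer:
        i = rng.randrange(len(buffer))
        yield buffer[i]
        try:
            buffer[i] = next(it)   # replace the yielded element with a fresh one
        except StopIteration:
            buffer.pop(i)          # input exhausted: drain what remains

print(list(simulated_shuffle([0, 0, 0, 1, 1, 1, 2, 2, 2], buffer_size=5)))

Running this on the dataset from the question prints one possible random order of the nine elements; note that an element can only travel as far as the buffer lets it, which is why a small buffer_size gives only a weak shuffle.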
Batching before shuffling means you'll shuffle pre-made minibatches (so the minibatches themselves won't change, just their order), while batching after shuffling lets you change the contents of the batches themselves randomly. Similarly, repeat before shuffling means you will shuffle an infinite stream of examples (so the second epoch will have a different order than the first epoch), while repeating after shuffling means you'll always see the same examples in each epoch.
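As a concrete illustration of both orderings (the variable names are mine, and since shuffling is random the exact outputs will differ from run to run), using the dataset from the question:

import tensorflow as tf

data = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Shuffle then batch: batch contents are drawn randomly from the buffer,
# so individual batches mix different values.
shuffle_then_batch = (tf.data.Dataset.from_tensor_slices(data)
                      .shuffle(buffer_size=5)
                      .batch(3))

# Batch then shuffle: the pre-made batches [0,0,0], [1,1,1], [2,2,2]
# keep their contents; only the order of whole batches changes.
batch_then_shuffle = (tf.data.Dataset.from_tensor_slices(data)
                      .batch(3)
                      .shuffle(buffer_size=3))

print([b.numpy().tolist() for b in shuffle_then_batch])   # e.g. [[0, 1, 0], [1, 2, 0], [2, 1, 2]]
print([b.numpy().tolist() for b in batch_then_shuffle])   # e.g. [[1, 1, 1], [0, 0, 0], [2, 2, 2]]

# Repeat then shuffle: the buffer can mix elements from different epochs,
# so epoch boundaries disappear from the output stream.
repeat_then_shuffle = (tf.data.Dataset.from_tensor_slices(data)
                       .repeat(2)
                       .shuffle(buffer_size=5))

# Shuffle then repeat: one full pass finishes before the next begins,
# so every epoch still contains each example exactly once.
shuffle_then_repeat = (tf.data.Dataset.from_tensor_slices(data)
                       .shuffle(buffer_size=5)
                       .repeat(2))

print([x.numpy() for x in repeat_then_shuffle])
print([x.numpy() for x in shuffle_then_repeat])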