Why is the shape of the image tensor (?, ?, ?)?

Problem description

Here is my code,

img_path = tf.read_file(testqueue[0])    # raw JPEG bytes, not a path
my_img = tf.image.decode_jpeg(img_path)
sess.run(my_img)
print(my_img.get_shape())

The result is,

(?, ?, ?)

Why did I get this result?

Answer

To answer this question and provide some details:

tensor_name.shape returns the shape information that is available at graph-compilation time. It relies on the static properties of the tensor.

tf.decode_jpeg is registered here. While building the graph, TensorFlow runs shape propagation under the InferenceContext. Given the shape properties known from the input tensors, each operation provides a hint about what its output tensors will look like.

For example, an "rgb2gray" operation would just copy the shape of the input tensor (say [b', h', w', c']) and set the output to [b', h', w', 1]. It does not need to know the exact values of b', h', w', as it can simply copy these previous values.
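
As a rough sketch of this kind of propagation (not part of the original answer; a TF 1.x placeholder stands in for an input with unknown height and width):

import tensorflow as tf

# height and width are unknown at graph-construction time, channels are known to be 3
rgb = tf.placeholder(tf.float32, shape=[None, None, 3])
gray = tf.image.rgb_to_grayscale(rgb)

print(rgb.get_shape())   # (?, ?, 3)
print(gray.get_shape())  # (?, ?, 1) -- the shape function copied h and w and set channels to 1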

Looking specifically at tf.decode_jpeg, the operation clearly handles a channels attribute:

// read the attribute "channels" from tf.image.decode_jpeg(..., channels)
TF_RETURN_IF_ERROR(c->GetAttr("channels", &channels));
// ....
// set the tensor information "my_img.get_shape()" will have
c->set_output(0, c->MakeShape({InferenceContext::kUnknownDim,
                                 InferenceContext::kUnknownDim, channels_dim})); 

The first two dimensions are set to InferenceContext::kUnknownDim, as the operation only knows there will be a height and a width, but their specific values can vary. It makes a best guess at what the channel axis will look like: if you specify the attribute tf.decode_jpeg(..., channels=3), it can and will set the last dimension to 3.
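
For instance, the channels attribute is the only part of the static shape that decode_jpeg can pin down; a small sketch (the file name some_image.jpg is made up for illustration):

import tensorflow as tf

contents = tf.read_file('some_image.jpg')              # hypothetical file
img_default = tf.image.decode_jpeg(contents)           # channels defaults to 0
img_rgb = tf.image.decode_jpeg(contents, channels=3)

print(img_default.get_shape())  # (?, ?, ?)
print(img_rgb.get_shape())      # (?, ?, 3) -- only the channel axis is known statically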

This results in the shape (?, ?, ?), because the if-branch channels == 0 becomes active here.

On the other hand, tf.shape is defined here and ends up here. It inspects the actual tensor content here:

// get actual tensor-shape from the value itself
TensorShape shape;
OP_REQUIRES_OK(ctx, shape_op_helpers::GetRegularOrVariantShape(ctx, 0, &shape));
const int rank = shape.dims();
// write the tensor result from "tf.shape(...)"
for (int i = 0; i < rank; ++i) {
  int64 dim_size = shape.dim_size(i);
  // ...
  vec(i) = static_cast<OutType>(dim_size); // the actual size for dim "i"
}

It is as if tf.shape is saying to the operation before it:

You can tell me whatever you concluded a few minutes ago. I don't care how clever you were at that point, or how much effort you put into guessing the shape. Look, I just inspect the concrete tensor that actually has content right now, and I'm done.
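
Concretely, a sketch reusing my_img and sess from the question (the runtime values printed are just an assumed example):

static_shape = my_img.get_shape()   # TensorShape computed at graph-construction time
dynamic_shape = tf.shape(my_img)    # a tensor; it has values only when it is evaluated

print(static_shape)                 # (?, ?, ?)
print(sess.run(dynamic_shape))      # e.g. [480 640 3] for a 480x640 RGB JPEG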

Consequences

This has some important consequences:

  • tf.shape is a tensor, while tensor_name.shape is not
  • some attributes require a plain integer, so there is no way to use the tensor tf.shape for them
  • graph optimization (like XLA) can only rely on the information given in tensor_name.shape
  • if you know the shape of your images (say, a database of only 128x128x3 images), you should set the shape yourself, e.g. using tf.reshape(img, [128, 128, 3]) (see the sketch after this list)
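
A sketch of the last point, reusing my_img from the question and assuming every image in the database is 128x128x3:

# tf.reshape fails at run time if the element count does not match 128*128*3
my_img_fixed = tf.reshape(my_img, [128, 128, 3])
print(my_img_fixed.get_shape())     # (128, 128, 3)

# set_shape only merges in the static shape information, without touching the data
my_img.set_shape([128, 128, 3])
print(my_img.get_shape())           # (128, 128, 3)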

You might also be interested in how tf.image.extract_jpeg_shape is implemented.

