When training convolutional neural networks for image classification, we generally want the algorithm to learn the filters (and biases) that transform a given image into its correct label. I have a few models that I'm trying to compare in terms of model size, number of operations, accuracy, etc. However, the size of the model output by TensorFlow, concretely the model.ckpt.data file that stores the values of all the variables in the graph, is not what I expected. In fact, it seems to be three times bigger.
To go straight to the problem, I'll base my question on this Jupyter notebook. Below is the section where the variables (weights and biases) are defined:
import tensorflow as tf  # TensorFlow 1.x API (tf.random_normal)

num_classes = 10  # 10 output classes, per the comment on 'out' below

# Store layers weight & bias
weights = {
    # 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32]), dtype=tf.float32),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64]), dtype=tf.float32),
    # fully connected, 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024]), dtype=tf.float32),
    # 1024 inputs, 10 outputs (class prediction)
    'out': tf.Variable(tf.random_normal([1024, num_classes]), dtype=tf.float32)
}
biases = {
    'bc1': tf.Variable(tf.random_normal([32]), dtype=tf.float32),
    'bc2': tf.Variable(tf.random_normal([64]), dtype=tf.float32),
    'bd1': tf.Variable(tf.random_normal([1024]), dtype=tf.float32),
    'out': tf.Variable(tf.random_normal([num_classes]), dtype=tf.float32)
}
I've added a couple of lines in order to save the model at the end of the training process:
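# Save the model
# (saver and logdir are defined elsewhere in the notebook; logdir must end with a path separator)
save_path = saver.save(sess, logdir + "model.ckpt")
print("Model saved in file: %s" % save_path)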
Adding up all those variables, we would expect a model.ckpt.data file of 12.45 MB (I obtained this by counting the float elements that the model learns and converting that count to megabytes). But the saved .data file is 39.3 MB. Why is this?
I've followed the same approach with a more complex network (a variation of ResNet), and my expected .data size is also ~3x smaller than the actual .data file.
The data type of all these variables is float32.
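The estimate described above can be reproduced with a short pure-Python sketch. It assumes 10 output classes (as the "10 outputs" comment suggests) and 4 bytes per float32 value; the names `SHAPES` and `expected_bytes` are only illustrative and do not come from the notebook.

```python
from math import prod

NUM_CLASSES = 10  # assumption: taken from the "10 outputs" comment in the question

# Shapes copied from the weights and biases dictionaries in the question.
SHAPES = {
    "wc1": (5, 5, 1, 32),
    "wc2": (5, 5, 32, 64),
    "wd1": (7 * 7 * 64, 1024),
    "w_out": (1024, NUM_CLASSES),
    "bc1": (32,),
    "bc2": (64,),
    "bd1": (1024,),
    "b_out": (NUM_CLASSES,),
}

BYTES_PER_FLOAT32 = 4


def expected_bytes(shapes, bytes_per_param=BYTES_PER_FLOAT32):
    """Return (parameter count, size in bytes) for a dict of variable shapes."""
    n_params = sum(prod(shape) for shape in shapes.values())
    return n_params, n_params * bytes_per_param


n_params, n_bytes = expected_bytes(SHAPES)
print(n_params, "parameters")
print(round(n_bytes / 1024 ** 2, 2), "MiB (1 MiB = 1024 ** 2 bytes)")
```

Under these assumptions the learned variables alone come to roughly 12.5 MiB, close to the 12.45 MB figure quoted in the question.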
Traditionally, most of the model parameters are in the first fully connected layer, in this case wd1. Computing only its size yields:
7*7*128 * 1024 * 4 = 25690112
... or about 25.7 MB. Note the coefficient 4: the variable has dtype=tf.float32, i.e. 4 bytes per parameter. Other layers also contribute to the model size, but not as drastically.
As you can see, your estimate of 12.45 MB is a bit off (did you use 16 bits per parameter?). The checkpoint also stores some general information, hence the overhead of around 25%, which is still big, but not 300%.
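The per-layer arithmetic is easy to check with a small helper that takes the layer shape and the bytes per parameter as inputs, so different assumptions can be compared side by side. This is only a sketch; `fc_weight_bytes` is a hypothetical name, not a TensorFlow function.

```python
def fc_weight_bytes(n_inputs, n_outputs, bytes_per_param=4):
    """Size in bytes of a dense weight matrix of shape [n_inputs, n_outputs]."""
    return n_inputs * n_outputs * bytes_per_param


# Shape used in the calculation above (7*7*128 inputs), float32.
size_128 = fc_weight_bytes(7 * 7 * 128, 1024)
# Shape that appears in the question's code (7*7*64 inputs), float32.
size_64 = fc_weight_bytes(7 * 7 * 64, 1024)
# Same 7*7*64 shape, but with 2 bytes (16 bits) per parameter.
size_64_fp16 = fc_weight_bytes(7 * 7 * 64, 1024, bytes_per_param=2)

for label, size in [("7*7*128, float32", size_128),
                    ("7*7*64, float32", size_64),
                    ("7*7*64, 16-bit", size_64_fp16)]:
    print(label, size, "bytes")
```

Halving either the input width or the bytes per parameter halves the size of the layer, which is why both were plausible explanations for a factor-of-two mismatch.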
[Update]
As was clarified, the model in question actually has an FC1 layer of shape [7*7*64, 1024], so the size calculated above should indeed be about 12.5 MB. That made me look into the saved checkpoint more carefully.
After inspecting it, I noticed other big variables that I originally missed:
...
Variable_2 (DT_FLOAT) [3136,1024]
Variable_2/Adam (DT_FLOAT) [3136,1024]
Variable_2/Adam_1 (DT_FLOAT) [3136,1024]
...
Variable_2 is exactly wd1, but there are two more copies of the same shape, created by the Adam optimizer. These are called slots, and they hold the m and v accumulators for every trainable variable. Now the total size makes sense.
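The effect of the slots can be sketched without TensorFlow by counting each trainable variable three times: once for itself and once for each of its m and v accumulators. The sketch assumes 10 output classes and float32 throughout, and it also assumes that the optimizer keeps two extra scalar variables (beta1_power and beta2_power in TF 1.x); those extras are negligible either way.

```python
from math import prod

# Shapes of the trainable variables from the question (10 classes assumed).
TRAINABLE_SHAPES = [
    (5, 5, 1, 32), (5, 5, 32, 64), (7 * 7 * 64, 1024), (1024, 10),  # weights
    (32,), (64,), (1024,), (10,),                                    # biases
]

BYTES_PER_FLOAT32 = 4
ADAM_SLOTS_PER_VARIABLE = 2  # the m and v accumulators
ADAM_EXTRA_SCALARS = 2       # assumption: beta1_power and beta2_power

trainable_params = sum(prod(shape) for shape in TRAINABLE_SHAPES)
slot_params = ADAM_SLOTS_PER_VARIABLE * trainable_params
total_bytes = (trainable_params + slot_params + ADAM_EXTRA_SCALARS) * BYTES_PER_FLOAT32

print(f"{total_bytes / 1024 ** 2:.1f} MiB")  # units of 1024 ** 2 bytes
print(f"{total_bytes / 1e6:.1f} MB")         # units of 10 ** 6 bytes
```

The total is almost exactly three times the size of the trainable variables, which accounts for the ~3x gap reported in the question.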
You can run the following code to compute the total size of the graph variables, which comes to 37.47 MB:
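import numpy as np
import tensorflow as tf
# Bytes per variable = number of elements * size of one element (np.prod replaces the removed np.product)
var_sizes = [np.prod(list(map(int, v.shape))) * v.dtype.size
             for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)]
print(sum(var_sizes) / (1024 ** 2), 'MB')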
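So the overhead is actually pretty small: the code above divides by 1024 ** 2, and 37.47 of those units is about 39.3 million bytes, in line with the 39.3 MB file. The extra size is due to the optimizer.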