How to calculate FLOPs with tfprof in TensorFlow?

Question

How can I get the number of FLOPs from tfprof? My code is as follows:

def calculate_flops():
    # Print to stdout an analysis of the number of floating point operations in the
    # model broken down by individual operations.
    param_stats = tf.contrib.tfprof.model_analyzer.print_model_analysis(
        tf.get_default_graph(),
        tfprof_options=tf.contrib.tfprof.model_analyzer.TRAINABLE_VARS_PARAMS_STAT_OPTIONS)
    print(param_stats)

But the result shows flops = 0. How can I calculate the number of FLOPs? Could I have an example?

Recommended answer

First of all, tfprof.model_analyzer.print_model_analysis is now deprecated; according to the official documentation, tf.profiler.profile should be used instead.

Once we know the number of FLOP, we can get the FLOPS (FLOP per second) of a forward pass by measuring its run time and dividing: FLOPS = FLOP / run_time.
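
As a rough illustration of that division, the sketch below times a plain-Python matrix product and divides an assumed FLOP count of 2*m*p*q by the measured run time. The helper names naive_matmul and measure_flops_per_second are hypothetical, and the pure-Python product only stands in for a real forward pass, so the figure it prints says nothing about TensorFlow's own speed.

```python
import time

def naive_matmul(a, b):
    """Multiply two matrices given as lists of rows (stand-in for a forward pass)."""
    p, q = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(p)) for j in range(q)] for row in a]

def measure_flops_per_second(flop, fn, repeats=100):
    """Return flop / run_time, where run_time is the mean time of one call to fn."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    run_time = (time.perf_counter() - start) / repeats
    return flop / run_time

m, p, q = 25, 16, 9
a = [[1.0] * p for _ in range(m)]
b = [[1.0] * q for _ in range(p)]
flop = 2 * m * p * q  # assumed FLOP count for one [m, p] x [p, q] product
flops_per_second = measure_flops_per_second(flop, lambda: naive_matmul(a, b))
print('FLOPS =', flops_per_second)
```

Averaging over several repeats keeps timer resolution from dominating such a small workload.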

Here is a simple example.

import tensorflow as tf  # TensorFlow 1.x API

g = tf.Graph()
sess = tf.Session(graph=g)
with g.as_default():
    # Two randomly initialised matrices and their product.
    A = tf.Variable(initial_value=tf.random_normal([25, 16]))
    B = tf.Variable(initial_value=tf.random_normal([16, 9]))
    C = tf.matmul(A, B, name='output')
    sess.run(tf.global_variables_initializer())
    # Count the floating point operations of the ops in the graph.
    flops = tf.profiler.profile(g, options=tf.profiler.ProfileOptionBuilder.float_operation())
    print('FLOP = ', flops.total_float_ops)

This outputs 8288. But why do we get 8288 instead of the expected result 7200 = 2*25*16*9? The answer lies in the way the tensors A and B are initialised. Initialising with a Gaussian distribution costs some FLOP. Changing the definition of A and B to

    A = tf.Variable(initial_value=tf.zeros([25, 16]))
    B = tf.Variable(initial_value=tf.zeros([16, 9]))

gives the expected output, 7200.
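
The gap between 8288 and 7200 can be checked by hand. The sketch below assumes that each randomly initialised element costs two FLOP (one multiplication by the standard deviation and one addition of the mean). That per-element cost is an assumption not stated in this answer, but it is consistent with the count reported by the profiler.

```python
m, p, q = 25, 16, 9

matmul_flop = 2 * m * p * q   # FLOP of the [25, 16] x [16, 9] product
elements = m * p + p * q      # number of elements in A plus B: 400 + 144
init_flop = 2 * elements      # ASSUMPTION: 2 FLOP per randomly initialised element
total = matmul_flop + init_flop

print(matmul_flop, init_flop, total)  # 7200 1088 8288
```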

Usually, a network's variables are initialised with a Gaussian distribution, among other schemes. Most of the time we are not interested in the initialisation FLOP, since they are performed once during initialisation and occur during neither training nor inference. So how can one get the exact number of FLOP, disregarding the initialisation FLOP?

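Freeze the graph into a pb file. Freezing converts the variables to constants, so the initialisation ops are no longer part of the profiled graph.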

The following snippet illustrates this:

import tensorflow as tf
from tensorflow.python.framework import graph_util

def load_pb(pb):
    with tf.gfile.GFile(pb, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        return graph

# ***** (1) Create Graph *****
g = tf.Graph()
sess = tf.Session(graph=g)
with g.as_default():
    A = tf.Variable(initial_value=tf.random_normal([25, 16]))
    B = tf.Variable(initial_value=tf.random_normal([16, 9]))
    C = tf.matmul(A, B, name='output')
    sess.run(tf.global_variables_initializer())
    flops = tf.profiler.profile(g, options = tf.profiler.ProfileOptionBuilder.float_operation())
    print('FLOP before freezing', flops.total_float_ops)
# *****************************

# ***** (2) freeze graph *****
output_graph_def = graph_util.convert_variables_to_constants(sess, g.as_graph_def(), ['output'])

with tf.gfile.GFile('graph.pb', "wb") as f:
    f.write(output_graph_def.SerializeToString())
# *****************************


# ***** (3) Load frozen graph *****
g2 = load_pb('./graph.pb')
with g2.as_default():
    flops = tf.profiler.profile(g2, options = tf.profiler.ProfileOptionBuilder.float_operation())
    print('FLOP after freezing', flops.total_float_ops)

Output:

FLOP before freezing 8288
FLOP after freezing 7200

Usually, the FLOP count of a matrix multiplication is mq(2p - 1) for the product AB, where A is [m, p] and B is [p, q] (p multiplications and p - 1 additions for each of the mq output elements), but TensorFlow returns 2mpq for some reason. An issue has been opened to understand why.
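
For the shapes used in this answer, the two conventions differ by exactly m*q, because 2mpq counts one extra addition per output element. A quick check, with hypothetical helper names:

```python
def matmul_flop_exact(m, p, q):
    """p multiplications and p - 1 additions per output element."""
    return m * q * (2 * p - 1)

def matmul_flop_tf(m, p, q):
    """The 2*m*p*q convention that the profiler output above matches."""
    return 2 * m * p * q

m, p, q = 25, 16, 9
exact = matmul_flop_exact(m, p, q)
reported = matmul_flop_tf(m, p, q)
print(exact, reported, reported - exact)  # 6975 7200 225
```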


09-03 10:06