Problem Description
I am trying to profile the computation/memory usage of TensorFlow and found that tfprof is the right tool for my purpose. However, I was not able to get the FLOPS of all operators.
Here is what I did, following the tfprof tutorial, using the cifar10 tutorial in the TensorFlow repository (tensorflow/models/image/cifar10/cifar10_train.py):
run_metadata = tf.RunMetadata()
_, loss_value = sess.run([train_op, loss],
                         options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
                         run_metadata=run_metadata)

op_log = tfprof_log_pb2.OpLog()
# TODO: add op information
tf.contrib.tfprof.tfprof_logger.write_op_log(
    tf.get_default_graph(),
    log_dir="/tmp/log_dir",
    op_log=op_log,
    run_meta=run_metadata)
tf.contrib.tfprof.model_analyzer.print_model_analysis(
    tf.get_default_graph(),
    run_metadata=run_metadata,
    op_log=op_log,
    tfprof_options=tf.contrib.tfprof.model_analyzer.FLOAT_OPS_OPTIONS)
The result is:
Parsing GraphDef...
Parsing RunMetadata...
Parsing OpLog...
Preparing Views...
=========================Options=============================
-max_depth 10000
-min_bytes 0
-min_micros 0
-min_params 0
-min_float_ops 1
-device_regexes .*
-order_by float_ops
-account_type_regexes .*
-start_name_regexes .*
-trim_name_regexes
-show_name_regexes .*
-hide_name_regexes
-account_displayed_op_only true
-select float_ops
-viz false
-dump_to_file
==================Model Analysis Report======================
_TFProfRoot (0/5.23b flops)
conv2/Conv2D (3.77b/3.77b flops)
conv1/Conv2D (707.79m/707.79m flops)
gradients/local3/MatMul_grad/MatMul (226.49m/226.49m flops)
gradients/local3/MatMul_grad/MatMul_1 (226.49m/226.49m flops)
local3/MatMul (226.49m/226.49m flops)
gradients/local4/MatMul_grad/MatMul (18.87m/18.87m flops)
gradients/local4/MatMul_grad/MatMul_1 (18.87m/18.87m flops)
local4/MatMul (18.87m/18.87m flops)
conv1/BiasAdd (4.72m/4.72m flops)
conv2/BiasAdd (1.18m/1.18m flops)
gradients/softmax_linear/MatMul_grad/MatMul (491.52k/491.52k flops)
gradients/softmax_linear/MatMul_grad/MatMul_1 (491.52k/491.52k flops)
softmax_linear/MatMul (491.52k/491.52k flops)
======================End of Report==========================
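As a sanity check on the numbers above: tfprof counts a Conv2D as 2 FLOPs (one multiply plus one add) per multiply-accumulate, over every output element, scaled by the batch size. Assuming the cifar10 tutorial's defaults (batch size 128, images cropped to 24x24), a small pure-Python sketch reproduces the two Conv2D counts in the report:

```python
def conv2d_flops(batch, out_h, out_w, out_c, k_h, k_w, in_c):
    # 2 flops (one multiply + one add) per multiply-accumulate.
    return 2 * batch * out_h * out_w * out_c * (k_h * k_w * in_c)

# conv1: 24x24x3 cropped input, 5x5x3x64 kernel, SAME padding -> 24x24x64 output
print(conv2d_flops(128, 24, 24, 64, 5, 5, 3))   # 707788800, i.e. 707.79m
# conv2: 12x12x64 input (after max pooling), 5x5x64x64 kernel -> 12x12x64 output
print(conv2d_flops(128, 12, 12, 64, 5, 5, 64))  # 3774873600, i.e. 3.77b
```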
However, the result does not contain all of the ops, such as max pooling, relu, and the gradients of the conv layers. Maybe the flops stats of those ops are not defined (RegisterStatistics('flops')). Therefore, to provide runtime information, as in the tfprof tutorial, I tried to create an OpLog (see the code above).
However, I am not sure how to add op information (how can I get the entry names of the ops?). Is there any way to add ALL of the ops it contains?
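For reference, this is roughly what filling in the `# TODO: add op information` step would look like. This is only a sketch: the field names (`log_entries`, `name`, `float_ops`, `types`) follow the contrib-era tfprof_log.proto, the import path is the TF 1.x one, and the node name and flop count are hypothetical illustration values:

```python
# Entry values are plain Python; building the proto needs the TF 1.x protos.
NODE_NAME = "pool1/MaxPool"                   # hypothetical graph node name
FLOAT_OPS = 128 * 12 * 12 * 64 * (3 * 3 - 1)  # rough estimate: 8 comparisons per output element

try:
    from tensorflow.tools.tfprof import tfprof_log_pb2  # contrib-era import path

    op_log = tfprof_log_pb2.OpLog()
    entry = op_log.log_entries.add()  # one OpLogEntry per op
    entry.name = NODE_NAME            # must match the node's name in the graph
    entry.float_ops = FLOAT_OPS       # manually estimated flop count
    entry.types.append("pool_flops")  # optional tag used to group ops in reports
except ImportError:
    pass  # TF 1.x protos not available; the entry fields above still apply
```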
Or is there any other tool besides tfprof? Perhaps a profiling tool from NVIDIA?
Recommended Answer
You are right that the other ops don't report flops, because they don't have RegisterStatistics('flops') defined. You are welcome to contribute.
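For anyone who wants to try, here is a rough sketch of such a contribution, modeled on the existing flops registrations. The MaxPool count below, one comparison per extra element in the pooling window per output element, is my own estimate rather than an official definition, and the registration itself only runs inside a TF 1.x environment:

```python
def max_pool_flops(batch, out_h, out_w, channels, k_h, k_w):
    # One comparison per extra element in the pooling window, per output element.
    return batch * out_h * out_w * channels * (k_h * k_w - 1)

try:
    from tensorflow.python.framework import graph_util, ops

    @ops.RegisterStatistics("MaxPool", "flops")
    def _calc_max_pool_flops(graph, node):
        """Let tfprof report flops for MaxPool nodes."""
        out_shape = graph_util.tensor_shape_from_node_def_name(graph, node.name)
        out_shape.assert_is_fully_defined()
        ksize = node.attr["ksize"].list.i  # e.g. [1, 3, 3, 1]
        return ops.OpStats("flops",
                           out_shape.num_elements() * (ksize[1] * ksize[2] - 1))
except Exception:
    pass  # TF unavailable, or MaxPool flops already registered in newer versions
```

For the cifar10 tutorial's pool1 (3x3 window, 12x12x64 output, batch 128) this estimate would report about 9.44m flops.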
I'm not sure if NVIDIA has tools for this.