Problem description
Since many machine learning algorithms rely on matrix multiplication (or can at least be implemented with it), my plan for testing my GPU is to create matrices a and b, multiply them, and record the time the computation takes.
Here is code that generates two matrices of dimensions 300000 x 20000 and multiplies them:
import tensorflow as tf
import numpy as np

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

#a = np.array([[1, 2, 3], [4, 5, 6]])
#b = np.array([1, 2, 3])
a = np.random.rand(300000, 20000)
b = np.random.rand(300000, 20000)
print("Init complete")

# note: tf.mul (tf.multiply in TF >= 1.0) is element-wise, not a matrix product
result = tf.mul(a, b)
v = sess.run(result)
print(v)
Is this a sufficient test for comparing GPU performance? What other factors should I consider?
Here's an example of a matmul benchmark that avoids common pitfalls and matches the official 11 TFLOPS figure on a Titan X Pascal.
import os
import sys
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
import tensorflow as tf
import time

n = 8192
dtype = tf.float32
with tf.device("/gpu:0"):
    matrix1 = tf.Variable(tf.ones((n, n), dtype=dtype))
    matrix2 = tf.Variable(tf.ones((n, n), dtype=dtype))
    product = tf.matmul(matrix1, matrix2)

# avoid optimizing away redundant nodes
config = tf.ConfigProto(graph_options=tf.GraphOptions(
    optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())

iters = 10

# pre-warming
sess.run(product.op)

start = time.time()
for i in range(iters):
    sess.run(product.op)
end = time.time()

ops = n**3 + (n-1)*n**2  # n^2*(n-1) additions, n^3 multiplications
elapsed = (end - start)
rate = iters*ops/elapsed/10**9
print('\n %d x %d matmul took: %.2f sec, %.2f G ops/sec' % (n, n,
                                                            elapsed/iters,
                                                            rate,))
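The pitfalls this benchmark sidesteps are visible in the code: the operands live on the GPU as Variables instead of being fed from host memory on every run, graph optimization is dropped to L0 so the repeated product is not optimized away, a warm-up run precedes the timed loop, and the floating-point op count is computed explicitly. For n = 8192 that count is roughly 2n^3 ≈ 1.1 × 10^12 ops per multiply, so a card near the quoted 11 TFLOPS should report on the order of 11,000 G ops/sec. The snippet targets the TensorFlow 1.x API (tf.Session, tf.ConfigProto); as a rough sketch only, assuming a TensorFlow 2.x install with a visible GPU, an equivalent measurement could look like the following, with the warm-up, iteration count, and op-count formula carried over unchanged:

import time
import tensorflow as tf

n = 8192
dtype = tf.float32

# keep the operands resident on the GPU
with tf.device("/GPU:0"):
    matrix1 = tf.ones((n, n), dtype=dtype)
    matrix2 = tf.ones((n, n), dtype=dtype)

@tf.function
def matmul_step(a, b):
    return tf.matmul(a, b)

# pre-warming: triggers tracing and any lazy initialization
matmul_step(matrix1, matrix2).numpy()

iters = 10
start = time.time()
for _ in range(iters):
    result = matmul_step(matrix1, matrix2)
# .numpy() forces the asynchronously dispatched kernels to finish before the clock stops
result.numpy()
elapsed = time.time() - start

ops = n**3 + (n - 1) * n**2  # n^3 multiplications, n^2*(n-1) additions
rate = iters * ops / elapsed / 10**9
print('%d x %d matmul took: %.2f sec, %.2f G ops/sec' % (n, n, elapsed / iters, rate))

As in the original, elapsed/iters is the per-multiply latency; forcing a host read before stopping the clock matters because GPU kernels are dispatched asynchronously.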
This concludes this article on testing a GPU with TensorFlow matrix multiplication. We hope the recommended answer is helpful, and thank you for your continued support!