Problem description
In TensorFlow I register an op like so:
REGISTER_OP("RimeBSqrt")
.Input("stokes: FT")
.Input("alpha: FT")
.Input("frequency: FT")
.Input("ref_freq: FT")
.Output("b_sqrt: CT")
.Attr("FT: {float, double} = DT_FLOAT")
.Attr("CT: {complex64, complex128} = DT_COMPLEX64");
All of the above inputs are tensors, but ref_freq is a scalar or 0-D tensor. In the Compute() method of my CPU kernel I can do the following to extract the scalar:
const Tensor & in_ref_freq = context->input(3);
FT ref_freq = in_ref_freq.tensor<FT, 1>()(0);
However, the same kind of code generates a segfault in the Compute() method of my GPU kernel, because the CPU now tries to access a block of memory on the GPU device. Is there any way to intercept this scalar value before sending it to the GPU? I'd like to avoid the following extra level of memory indirection in a CUDA kernel:
template <typename FT>
__global__ void kernel(..., FT * ref_freq, ...)
{
FT value = ref_freq[0];
}
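For contrast, what I would prefer is to pass the scalar as an ordinary kernel argument once its value is known on the host, so it arrives in a register instead of requiring a global-memory load. A sketch mirroring the signature above (the `...` parameters stand in for the op's other inputs):

```cpp
// Sketch: the scalar is passed by value as a kernel argument,
// so no device-memory dereference is required.
template <typename FT>
__global__ void kernel(..., FT ref_freq, ...)
{
    FT value = ref_freq;  // already in a register
}
```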
I don't think Attr is the approach to use for ref_freq, since it is a changeable, configurable value.
- The CPU TensorFlow kernel code is here.
- The GPU TensorFlow kernel code is here.
- The Python variable-setup code is here.
Recommended answer
You can specify that one or more of the inputs to (or outputs from) a TensorFlow OpKernel are in "host memory", which allows you to access the value in the Compute() method. To do this, you would modify your REGISTER_KERNEL_BUILDER() call to add a .HostMemory("ref_freq") directive:
REGISTER_KERNEL_BUILDER(
Name("RimeBSqrt")
.Device(tensorflow::DEVICE_GPU)
.TypeConstraint<float>("FT")
.TypeConstraint<tensorflow::complex64>("CT")
.HostMemory("ref_freq"),
RimeBSqrt<tensorflow::GPUDevice, float, tensorflow::complex64>);
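With HostMemory in place, the buffer backing that input lives in ordinary CPU memory even for the GPU kernel, so Compute() can read it directly and then pass the value into the CUDA kernel launch by value. A minimal sketch (not from the original code) of the relevant lines inside the GPU kernel's Compute() method:

```cpp
// Sketch: inside the GPU OpKernel's Compute(), assuming
// .HostMemory("ref_freq") is registered as shown above.
const tensorflow::Tensor& in_ref_freq = context->input(3);
FT ref_freq = in_ref_freq.tensor<FT, 1>()(0);  // safe: host memory

// ref_freq can now be passed to the CUDA kernel by value,
// avoiding the pointer indirection described in the question.
```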