Problem description
My goal is to run a TensorFlow model in real time to control a vehicle from a learned model. Our vehicle system uses ROS (Robot Operating System), which is closely tied to OpenCV, so I receive the image of interest from ROS as an OpenCV Mat.
cv::Mat cameraImg;
I would like to create a TensorFlow Tensor directly from the data in this OpenCV matrix, to avoid the expense of copying the matrix line by line. Using the answer to this question, I have managed to get the forward pass of the network working with the following code:
// Convert to 32-bit float in place (values are not rescaled).
cameraImg.convertTo(cameraImg, CV_32FC3);
Tensor inputImg(DT_FLOAT, TensorShape({1, inputheight, inputwidth, 3}));
auto inputImageMapped = inputImg.tensor<float, 4>();
auto start = std::chrono::system_clock::now();
// Copy all the data over, swapping BGR (OpenCV) to RGB.
for (int y = 0; y < inputheight; ++y) {
    // ptr<float>(y) also handles matrices whose rows are not contiguous.
    const float* source_row = cameraImg.ptr<float>(y);
    for (int x = 0; x < inputwidth; ++x) {
        const float* source_pixel = source_row + (x * 3);
        inputImageMapped(0, y, x, 0) = source_pixel[2];
        inputImageMapped(0, y, x, 1) = source_pixel[1];
        inputImageMapped(0, y, x, 2) = source_pixel[0];
    }
}
auto end = std::chrono::system_clock::now();
However, with this method the copy into the tensor takes between 80 ms and 130 ms, while the entire forward pass (for a 10-layer convolutional network) takes only 25 ms.
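As a point of comparison for the loop above, here is a minimal sketch of the same BGR-to-RGB copy written against flat pointers instead of the 4-D indexed accessor. It still copies the data, so it is not the zero-copy solution asked for. Whether it is actually faster in a given build is an assumption that would need measuring. The function name CopyBgrToRgb is hypothetical, and the sketch assumes both buffers are densely packed (for the Mat, that cameraImg.isContinuous() returns true).

```cpp
#include <cassert>
#include <cstddef>

// Copies `pixel_count` interleaved 3-channel float pixels from `src` to `dst`,
// swapping channels 0 and 2 (BGR -> RGB). Both buffers must hold
// pixel_count * 3 floats, must be densely packed, and must not overlap.
void CopyBgrToRgb(const float* src, float* dst, std::size_t pixel_count) {
  assert(src != nullptr && dst != nullptr);
  for (std::size_t i = 0; i < pixel_count; ++i) {
    const float* s = src + i * 3;
    float* d = dst + i * 3;
    d[0] = s[2];
    d[1] = s[1];
    d[2] = s[0];
  }
}
```

In the code above, src would be reinterpret_cast<const float*>(cameraImg.data), dst would be inputImg.flat<float>().data(), and pixel_count would be inputheight * inputwidth.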
Looking at the TensorFlow documentation, there appears to be a Tensor constructor that takes an allocator. However, I have not been able to find any TensorFlow or Eigen documentation on this functionality, or on how the Eigen Map class relates to Tensors.
Does anyone have any insight into how this code can be sped up, ideally by re-using my OpenCV memory?
I have successfully implemented what @mrry suggested, and can re-use the memory allocated by OpenCV. I have opened GitHub issue 8033 requesting that this be added to the TensorFlow source tree. My method isn't that pretty, but it works.
It is still very difficult to compile an external library and link it against the libtensorflow.so library. The TensorFlow CMake build may help with this, but I have not yet tried it.
Recommended answer
The TensorFlow C API (as opposed to the C++ API) exports the TF_NewTensor() function, which lets you create a tensor from a pointer and a length. You can then pass the resulting object to the TF_Run() function.
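To make the "pointer and a length" idea concrete, here is a self-contained sketch of the ownership pattern involved: a handle that borrows caller-owned memory and calls a deallocator callback when it is destroyed. Everything in it (BorrowedTensor, NewBorrowedTensor, DeleteBorrowedTensor, ReleaseBorrowed) is a hypothetical stand-in, not part of TensorFlow. The parameter list is only modelled on the general shape of TF_NewTensor(), and the real declaration in tensorflow/c/c_api.h should be checked before writing code against it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tensor handle. This is NOT a TensorFlow type;
// it only models "pointer + length + deallocator" ownership.
struct BorrowedTensor {
  std::vector<std::int64_t> dims;
  void* data;       // caller-owned memory, never copied
  std::size_t len;  // size of the buffer in bytes
  void (*deallocator)(void* data, std::size_t len, void* arg);
  void* deallocator_arg;
};

// Wraps an existing buffer without copying it.
BorrowedTensor* NewBorrowedTensor(const std::int64_t* dims, int num_dims,
                                  void* data, std::size_t len,
                                  void (*deallocator)(void*, std::size_t, void*),
                                  void* deallocator_arg) {
  assert(dims != nullptr && num_dims > 0);
  return new BorrowedTensor{std::vector<std::int64_t>(dims, dims + num_dims),
                            data, len, deallocator, deallocator_arg};
}

// Releases the handle and notifies the buffer's owner through the callback.
void DeleteBorrowedTensor(BorrowedTensor* t) {
  if (t == nullptr) return;
  if (t->deallocator != nullptr) {
    t->deallocator(t->data, t->len, t->deallocator_arg);
  }
  delete t;
}

// Deallocator for memory owned elsewhere (for example by a cv::Mat): it frees
// nothing and only records that the buffer is no longer needed.
void ReleaseBorrowed(void* /*data*/, std::size_t /*len*/, void* arg) {
  if (arg != nullptr) *static_cast<bool*>(arg) = true;
}
```

The design point is that the buffer must outlive the handle. With an OpenCV image, that means keeping the cv::Mat alive (and not reallocating it) until the deallocator has been called.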
Currently, this is the only public API for creating a TensorFlow tensor from a pre-allocated buffer. There is no supported way to cast a TF_Tensor* to a tensorflow::Tensor, but if you look at the implementation, there is a private API with friend access that can do this. If you experiment with this and can show an appreciable speedup, we'd consider a feature request for adding it to the public API.