Problem description
I'm attempting to optimise a realtime 3D modelling application. The compute part of the application runs almost entirely on the GPU in CUDA. The application needs to solve a small (6x6) double-precision symmetric positive definite linear system Ax = b 500+ times per second. Currently this is done on the CPU with an efficient linear algebra library using Cholesky factorization. However, that requires copying the data from the GPU to the CPU and back to the GPU hundreds of times per second, plus the overhead of a kernel launch each time, etc.
How can I solve the linear system solely on the GPU, without having to move the data to the CPU at all? I've read a little about the MAGMA library, but it seems to use hybrid CPU/GPU algorithms rather than GPU-only algorithms.
I'm prepared for the fact that solving an individual linear system on the GPU is going to be a lot slower than with the existing CPU-based library. However, I want to see whether that can be made up for by removing the host-device data transfers and the kernel-launch overhead that currently occur hundreds of times per second. If there is no GPU-only LAPACK-like alternative out there, how would I go about implementing something to solve this particular 6x6 case on the GPU only? Could it be done without a huge time investment, for example with GPU BLAS libraries?
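Because the system is only 6x6 and symmetric positive definite, a hand-written Cholesky solve is short enough to run entirely on the GPU. The following is a minimal sketch under stated assumptions, not the NVIDIA code mentioned in the answer below. `cholesky_solve6`, `batched_cholesky_solve6` and the `HD` macro are hypothetical names. Each call factors A = L L^T without pivoting, which is valid because A is SPD, then performs forward and back substitution. The kernel is compiled only under nvcc (`__CUDACC__`), so the same file also builds as plain C++ for host-side testing.

```cpp
#include <cassert>
#include <math.h>

// Hypothetical macro: marks functions callable from host and device under nvcc,
// and expands to nothing when compiled as plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

constexpr int N = 6;  // system size, fixed at compile time

// Hypothetical helper: solves A x = b for one N x N SPD matrix via Cholesky
// (A = L L^T), without pivoting. A is row-major; only its lower triangle is read.
// Returns false if a non-positive pivot appears (A not numerically SPD).
HD inline bool cholesky_solve6(const double* A, const double* b, double* x) {
    double L[N][N] = {};
    // Factorization: column j of L from the already computed columns 0..j-1.
    for (int j = 0; j < N; ++j) {
        double d = A[j * N + j];
        for (int k = 0; k < j; ++k) d -= L[j][k] * L[j][k];
        if (d <= 0.0) return false;
        L[j][j] = sqrt(d);
        for (int i = j + 1; i < N; ++i) {
            double s = A[i * N + j];
            for (int k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }
    // Forward substitution: L y = b (y is stored in x).
    for (int i = 0; i < N; ++i) {
        double s = b[i];
        for (int k = 0; k < i; ++k) s -= L[i][k] * x[k];
        x[i] = s / L[i][i];
    }
    // Back substitution: L^T x = y, where (L^T)[i][k] = L[k][i].
    for (int i = N - 1; i >= 0; --i) {
        double s = x[i];
        for (int k = i + 1; k < N; ++k) s -= L[k][i] * x[k];
        x[i] = s / L[i][i];
    }
    return true;
}

#ifdef __CUDACC__
// Hypothetical kernel: one thread per system. A holds batch*N*N doubles,
// b and x hold batch*N doubles, ok holds batch flags; all in device memory.
__global__ void batched_cholesky_solve6(const double* A, const double* b,
                                        double* x, int* ok, int batch) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < batch)
        ok[t] = cholesky_solve6(A + t * N * N, b + t * N, x + t * N) ? 1 : 0;
}
#endif
```

With hypothetical device buffers `dA`, `db`, `dx` and `dok` already in GPU memory, a launch such as `batched_cholesky_solve6<<<(batch + 127) / 128, 128>>>(dA, db, dx, dok, batch);` keeps the data on the GPU. Because `cholesky_solve6` is also a device function, it can be called directly from inside an existing kernel, which avoids a separate launch altogether.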
Recommended answer
NVIDIA posted code for a batched Ax=b solver to the registered developer website last fall. This code works for generic matrices and should work well enough for your needs, provided you can expand the symmetric matrices to full matrices (which presumably should not be an issue for a 6x6 matrix). Because the code performs pivoting, which is unnecessary for positive definite matrices, it is not optimal for your case. However, the code is under a BSD license, so you may be able to modify it for your purposes.
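Expanding a symmetric matrix into full storage, so that a general (pivoting) solver can consume it, is a simple mirror copy. This sketch assumes the symmetric input is stored as a packed lower triangle, row by row (21 values for n = 6). That layout is a hypothetical choice, so adapt the indexing to however the application actually stores A. `expand_packed_lower` and `HD` are illustrative names, not part of NVIDIA's code.

```cpp
#include <cassert>

// Hypothetical macro: host/device qualifiers under nvcc, nothing in plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper. Assumed input layout: lower triangle packed row by row,
// (0,0), (1,0), (1,1), (2,0), ... so entry (i, j) with j <= i sits at i*(i+1)/2 + j.
// Writes the full n x n symmetric matrix into `full`.
HD inline void expand_packed_lower(const double* packed, double* full, int n) {
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= i; ++j) {
            double v = packed[i * (i + 1) / 2 + j];
            full[i * n + j] = v;  // lower triangle (and diagonal)
            full[j * n + i] = v;  // mirrored into the upper triangle
        }
    }
}
```

Because the expanded matrix is symmetric, the result is the same whether the consuming solver expects row-major or column-major storage.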
NVIDIA's standard developer website is experiencing some issues at the moment. Here is how you can download the batched solver code at this time:
(1) Go to http://www.nvidia.com/content/cuda/cuda-toolkit.html
(2) If you have an existing NVdeveloper account (e.g. via partners.nvidia.com) click on the green "Login to nvdeveloper" link on the right half of the screen. Otherwise click on "Join nvdeveloper" to apply for a new account; requests for new accounts are typically approved within one business day.
(3) Log in at the prompt with your email address and password.
(4) There is a section on the right hand side titled "Newest Downloads". The fifth item from the top is "Batched Solver". Click on that and it will bring you to the download page for the code.
(5) Click on the "download" link, then click "Accept" to accept the license terms. Your download should start.