Linear regression in NumPy with very large matrices: how to save memory?

Problem description

So I have these ginormous matrices X and Y. X and Y both have 100 million rows, and X has 10 columns. I'm trying to implement linear regression with these matrices, and I need the quantity (X^T*X)^-1 * X^T * Y. How can I compute this as space-efficiently as possible?

Right now I have

X = readMatrix("fileX.txt")
Y = readMatrix("fileY.txt")
return (X.getT() * X).getI() * X.getT() * Y

How many matrices are being stored in memory here? Are more than two matrices being stored at once? Is there a better way to do it?

I have about 1.5 GB of memory for this project. I can probably stretch it to 2 or 2.5 if I close every other program. Ideally the process would run in a short amount of time also, but the memory bound is more strict.
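
For a sense of scale, the short calculation below estimates the raw size of each array involved. It assumes 64-bit floats and a single-column Y; neither is stated in the question, and the constant names are made up for illustration.

```python
import numpy as np

N_ROWS = 100_000_000    # rows in X and Y (from the question)
N_COLS_X = 10           # columns in X (from the question)
N_COLS_Y = 1            # assumption: Y is a single column
ITEM_BYTES = np.dtype(np.float64).itemsize  # assumption: float64 storage

# Raw array sizes in bytes, ignoring any Python or file-parsing overhead.
x_bytes = N_ROWS * N_COLS_X * ITEM_BYTES
y_bytes = N_ROWS * N_COLS_Y * ITEM_BYTES
xtx_bytes = N_COLS_X * N_COLS_X * ITEM_BYTES   # X^T X is 10 x 10
xty_bytes = N_COLS_X * N_COLS_Y * ITEM_BYTES   # X^T Y is 10 x 1

for name, size in [("X", x_bytes), ("Y", y_bytes),
                   ("X^T X", xtx_bytes), ("X^T Y", xty_bytes)]:
    print(f"{name}: {size:,} bytes")
```

Under these assumptions X alone needs about 8 GB, well over the 1.5 to 2.5 GB budget, while the product X^T X needs only 800 bytes. The inputs are the problem, not the result.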

The other approach I've tried is saving the intermediate steps of the calculation as text files and reloading them after every step. But that is very slow.
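
If intermediate results do have to go to disk, a binary format is usually much faster to write and reload than text. Here is a minimal sketch using NumPy's `.npy` format with memory mapping; the helper names are hypothetical, not part of the question.

```python
import numpy as np

def save_intermediate(path, array):
    """Write an array to disk in NumPy's binary .npy format."""
    # Pass a path ending in ".npy"; np.save appends that suffix otherwise.
    np.save(path, array)

def load_intermediate(path):
    """Open a .npy file as a read-only memory map.

    The data stays on disk and is paged in as slices are accessed,
    so the whole array does not have to fit in RAM at once.
    """
    return np.load(path, mmap_mode="r")
```

A call such as `load_intermediate("step1.npy")[:1000]` (file name illustrative) would then touch only the part of the file that backs those rows.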

Recommended answer
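
One possible approach, sketched here under stated assumptions rather than as a definitive implementation: X^T X and X^T Y are both sums over blocks of rows, so they can be accumulated one chunk at a time without ever holding all of X in memory. The sketch assumes the files contain whitespace-separated numbers with one row per line, that both files have the same number of lines, and that Y is a single column. The function names `iter_chunks` and `fit_from_chunks` are made up for illustration, and `np.linalg.solve` is swapped in for the explicit inverse used in the question.

```python
import itertools
import numpy as np

def iter_chunks(x_path, y_path, chunk_rows=100_000):
    """Yield (x, y) blocks of at most chunk_rows rows from two text files.

    Assumes whitespace-separated numbers, one row per line, and the same
    number of lines in both files.
    """
    with open(x_path) as fx, open(y_path) as fy:
        while True:
            x_lines = list(itertools.islice(fx, chunk_rows))
            if not x_lines:
                break
            y_lines = list(itertools.islice(fy, chunk_rows))
            x = np.loadtxt(x_lines, ndmin=2)  # shape (rows, n_cols)
            y = np.loadtxt(y_lines, ndmin=1)  # shape (rows,)
            yield x, y

def fit_from_chunks(chunks, n_cols=10):
    """Accumulate X^T X and X^T Y block by block, then solve for the coefficients."""
    xtx = np.zeros((n_cols, n_cols))
    xty = np.zeros(n_cols)
    for x, y in chunks:
        xtx += x.T @ x   # small n_cols x n_cols update
        xty += x.T @ y   # length n_cols update
    # Solve (X^T X) beta = X^T Y instead of forming the inverse explicitly.
    return np.linalg.solve(xtx, xty)
```

With the file names from the question, usage would look like `beta = fit_from_chunks(iter_chunks("fileX.txt", "fileY.txt"))`. Only one chunk plus the two small accumulators are in memory at any time, so `chunk_rows` controls the memory footprint. Note that the normal equations can be numerically poor when the columns of X are nearly collinear, so this is worth checking on real data.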