Problem description
I have an existing 1D NumPy array that was not randomly generated. For this example, we will use a generated one.
import numpy as np
arr1 = np.random.uniform(0, 100, 1_000)
I need a second array whose correlation with it is 0.3:
arr2 = '?'
print(np.corrcoef(arr1, arr2))
Out[1]: 0.3
Recommended answer
I've adapted this answer by whuber on stats.SE to NumPy. The idea is to generate a second array, noise, at random, and then compute the residuals of a least-squares linear regression of noise on arr1. The residuals necessarily have a correlation of 0 with arr1, and of course arr1 has a correlation of 1 with itself, so a suitable linear combination a*arr1 + b*residuals can be made to have any desired correlation with arr1.
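To see why such a combination exists, here is a short derivation sketch. The notation is introduced here for illustration and is not part of the original answer: x stands for arr1, r for the residuals, y for the combination, and \sigma_x, \sigma_r for their standard deviations. It assumes only that the residuals are uncorrelated with arr1, and it uses the same coefficients a and b as the function below.

```latex
\begin{aligned}
y &= a\,x + b\,r, \qquad \operatorname{Cov}(x, r) = 0,\\
\operatorname{Cov}(x, y) &= a\,\sigma_x^2,\\
\operatorname{Var}(y) &= a^2\sigma_x^2 + b^2\sigma_r^2.\\[4pt]
\text{With } a &= p\,\sigma_r,\quad b = \sqrt{1 - p^2}\,\sigma_x:\\
\operatorname{Var}(y) &= p^2\sigma_r^2\sigma_x^2 + (1 - p^2)\,\sigma_x^2\sigma_r^2 = \sigma_x^2\sigma_r^2,\\
\operatorname{corr}(x, y) &= \frac{\operatorname{Cov}(x, y)}{\sigma_x\sqrt{\operatorname{Var}(y)}}
  = \frac{p\,\sigma_r\,\sigma_x^2}{\sigma_x \cdot \sigma_x\sigma_r} = p.
\end{aligned}
```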
import numpy as np

def generate_with_corrcoef(arr1, p):
    n = len(arr1)

    # generate noise
    noise = np.random.uniform(0, 1, n)

    # least squares linear regression for noise = m*arr1 + c
    # (rcond=None selects the current default and avoids a FutureWarning on older NumPy)
    m, c = np.linalg.lstsq(np.vstack([arr1, np.ones(n)]).T, noise, rcond=None)[0]

    # residuals have 0 correlation with arr1
    residuals = noise - (m*arr1 + c)

    # the right linear combination a*arr1 + b*residuals
    a = p * np.std(residuals)
    b = (1 - p**2)**0.5 * np.std(arr1)
    arr2 = a*arr1 + b*residuals

    # return a scaled/shifted result to have the same mean/sd as arr1
    # this doesn't change the correlation coefficient
    return np.mean(arr1) + (arr2 - np.mean(arr2)) * np.std(arr1) / np.std(arr2)
The last line rescales the result so that its mean and standard deviation match those of arr1. However, arr1 and arr2 will not be identically distributed.
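The following is a minimal sketch for checking these claims numerically. The function is repeated (with the same logic) only so the block runs on its own; the seed and the printed summary statistics are arbitrary choices made for this illustration, not part of the original answer.

```python
import numpy as np

def generate_with_corrcoef(arr1, p):
    # Same approach as the answer's function, repeated so this block is self-contained.
    n = len(arr1)
    noise = np.random.uniform(0, 1, n)
    design = np.vstack([arr1, np.ones(n)]).T
    m, c = np.linalg.lstsq(design, noise, rcond=None)[0]
    residuals = noise - (m * arr1 + c)
    a = p * np.std(residuals)
    b = (1 - p**2) ** 0.5 * np.std(arr1)
    arr2 = a * arr1 + b * residuals
    return np.mean(arr1) + (arr2 - np.mean(arr2)) * np.std(arr1) / np.std(arr2)

np.random.seed(0)  # arbitrary seed, used only to make the run repeatable
arr1 = np.random.uniform(0, 100, 1000)
arr2 = generate_with_corrcoef(arr1, 0.3)

# Correlation, mean and standard deviation should match the targets.
corr = np.corrcoef(arr1, arr2)[0, 1]
print("correlation:", round(corr, 6))
print("means:", arr1.mean(), arr2.mean())
print("stds: ", arr1.std(), arr2.std())

# The value ranges show that the two distributions differ.
print("arr1 range:", arr1.min(), arr1.max())
print("arr2 range:", arr2.min(), arr2.max())
```

In a run like this, arr2 will usually contain values outside the interval [0, 100) that arr1 is drawn from, which is one visible sign that the two arrays share a mean and standard deviation without sharing a distribution.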
Usage:
>>> arr1 = np.random.uniform(0, 100, 1000)
>>> arr2 = generate_with_corrcoef(arr1, 0.3)
>>> np.corrcoef(arr1, arr2)
array([[1. , 0.3],
       [0.3, 1. ]])