计算二维联合概率分布

要获得CDF，我相信您可以将cumsum应用于PDF的每个轴.I have many points inside a square. I want to partition the square in many small rectangles and check how many points fall in each rectangle, i.e. I want to compute the joint probability distribution of the points. I am reporting a couple of common sense approaches, using loops and not very efficient:% DataN = 1e5; % number of pointsxy = rand(N, 2); % coordinates of pointsxy(randi(2*N, 100, 1)) = 0; % add some points on one sidexy(randi(2*N, 100, 1)) = 1; % add some points on the other sidexy(randi(N, 100, 1), :) = 0; % add some points on one cornerxy(randi(N, 100, 1), :) = 1; % add some points on one cornerinds= unique(randi(N, 100, 1)); xy(inds, :) = repmat([0 1], numel(inds), 1); % add some points on one cornerinds= unique(randi(N, 100, 1)); xy(inds, :) = repmat([1 0], numel(inds), 1); % add some points on one corner% Intervals for rectanglesK1 = ceil(sqrt(N/5)); % number of intervals along xK2 = K1; % number of intervals along yint_x = [0:(1 / K1):1, 1+eps]; % intervals along xint_y = [0:(1 / K2):1, 1+eps]; % intervals along y% First approachticcount_cells = zeros(K1 + 1, K2 + 1);for k1 = 1:K1+1 inds1 = (xy(:, 1) >= int_x(k1)) & (xy(:, 1) < int_x(k1 + 1)); for k2 = 1:K2+1 inds2 = (xy(:, 2) >= int_y(k2)) & (xy(:, 2) < int_y(k2 + 1)); count_cells(k1, k2) = sum(inds1 .* inds2); endendtoc% Elapsed time is 46.090677 seconds.% Second approachticcount_again = zeros(K1 + 2, K2 + 2);for k1 = 1:K1+1 inds1 = (xy(:, 1) >= int_x(k1)); for k2 = 1:K2+1 inds2 = (xy(:, 2) >= int_y(k2)); count_again(k1, k2) = sum(inds1 .* inds2); endendcount_again_fix = diff(diff(count_again')');toc% Elapsed time is 22.903767 seconds.% Check: the two solutions are equivalentall(count_cells(:) == count_again_fix(:))How can I do it more efficiently in terms of time, memory, and possibly avoiding loops?EDIT --> I have just found this as well, it's the best solution found so far:ticcount_cells_hist = hist3(xy, 'Edges', {int_x int_y});count_cells_hist(end, :) = []; count_cells_hist(:, end) = [];tocall(count_cells(:) == count_cells_hist(:))% Elapsed time is 0.245298 seconds.but it requires the Statistics Toolbox.EDIT --> Testing solution suggested by chappjcticxcomps = single(bsxfun(@ge,xy(:,1),int_x));ycomps = single(bsxfun(@ge,xy(:,2),int_y));count_again = xcomps.' * ycomps; %' 143x143 = 143x1e5 * 1e5x143count_again_fix = diff(diff(count_again')');toc% Elapsed time is 0.737546 seconds.all(count_cells(:) == count_again_fix(:)) 解决方案 Improving on code in questionYour loops (and the nested dot product) can be eliminated with bsxfun and matrix multiplication as follows:xcomps = bsxfun(@ge,xy(:,1),int_x);ycomps = bsxfun(@ge,xy(:,2),int_y);count_again = double(xcomps).'*double(ycomps); %' 143x143 = 143x1e5 * 1e5x143count_again_fix = diff(diff(count_again')');The multiplication step accomplishes the AND and summation done in sum(inds1 .* inds2), but without looping over the density matrix. EDIT: If you use single instead of double, execution time is nearly halved, but be sure to convert your answer to double or whatever is required for the rest of the code. On my computer this takes around 0.5 sec.Note: With rot90(count_again/size(xy,1),2) you have a CDF, and in rot90(count_again_fix/size(xy,1),2) you have a PDF.Using accumarrayAnother approach is to use accumarray to make the joint histogram after we bin the data.Starting with int_x, int_y, K1, xy, etc.:% take (0,1) data onto [1 K1], following A.Dondas approach for easy comparisonii = floor(xy(:,1)*(K1-eps))+1; ii(ii<1) = 1; ii(ii>K1) = K1;jj = floor(xy(:,2)*(K1-eps))+1; jj(jj<1) = 1; jj(jj>K1) = K1;% create the histogram and normalizeH = accumarray([ii jj],ones(1,size(ii,1)));PDF = H / size(xy,1); % for probabilities summing to 1On my computer, this takes around 0.01 sec.The output is the same as A.Donda's converted from sparse to full (full(H)). Although, as he A.Donda pointed out, it is correct to have the dimensions be K1xK1, rather than the size of count_again_fix in the OPs code that was K1+1xK1+1.To get the CDF, I believe you can simply apply cumsum to each axis of the PDF. 这篇关于计算二维联合概率分布的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！