问题是:
给定n个点(在2d中)的x和y坐标,找到p点(在n中
给定点),使得从其他(n-1)点到
p是最小值。
这一点通常被称为Geometric Median。有没有什么有效的算法来解决这个问题,除了天真的算法?

最佳答案

我曾经用simulated annealing为一个本地在线法官解决过类似的问题。这也是官方的解决方案,该计划得到了AC。
唯一的区别是我必须找到的点不必是N给定点的一部分。
这是我的C++代码,N可以和50000一样大。该程序在0.1s中的2GHz Pentium 4上执行。

// header files for IO functions and math
#include <cstdio>
#include <cmath>

// the maximul value n can take
const int maxn = 50001;

// given a point (x, y) on a grid, we can find its left/right/up/down neighbors
// by using these constants: (x + dx[0], y + dy[0]) = upper neighbor etc.
const int dx[] = {-1, 0, 1, 0};
const int dy[] = {0, 1, 0, -1};

// controls the precision - this should give you an answer accurate to 3 decimals
const double eps = 0.001;

// input and output files
FILE *in = fopen("adapost2.in","r"), *out = fopen("adapost2.out","w");

// stores a point in 2d space
struct punct
{
    double x, y;
};

// how many points are in the input file
int n;

// stores the points in the input file
punct a[maxn];

// stores the answer to the question
double x, y;

// finds the sum of (euclidean) distances from each input point to (x, y)
double dist(double x, double y)
{
    double ret = 0;

    for ( int i = 1; i <= n; ++i )
    {
        double dx = a[i].x - x;
        double dy = a[i].y - y;

        ret += sqrt(dx*dx + dy*dy); // classical distance formula
    }

    return ret;
}

// reads the input
void read()
{
    fscanf(in, "%d", &n); // read n from the first

    // read n points next, one on each line
    for ( int i = 1; i <= n; ++i )
        fscanf(in, "%lf %lf", &a[i].x, &a[i].y), // reads a point
        x += a[i].x,
        y += a[i].y; // we add the x and y at first, because we will start by approximating the answer as the center of gravity

    // divide by the number of points (n) to get the center of gravity
    x /= n;
    y /= n;
}

// implements the solving algorithm
void go()
{
    // start by finding the sum of distances to the center of gravity
    double d = dist(x, y);

    // our step value, chosen by experimentation
    double step = 100.0;

    // done is used to keep track of updates: if none of the neighbors of the current
    // point that are *step* steps away improve the solution, then *step* is too big
    // and we need to look closer to the current point, so we must half *step*.
    int done = 0;

    // while we still need a more precise answer
    while ( step > eps )
    {
        done = 0;
        for ( int i = 0; i < 4; ++i )
        {
            // check the neighbors in all 4 directions.
            double nx = (double)x + step*dx[i];
            double ny = (double)y + step*dy[i];

            // find the sum of distances to each neighbor
            double t = dist(nx, ny);

            // if a neighbor offers a better sum of distances
            if ( t < d )
            {
                update the current minimum
                d = t;
                x = nx;
                y = ny;

                // an improvement has been made, so
                // don't half step in the next iteration, because we might need
                // to jump the same amount again
                done = 1;
                break;
            }
        }

        // half the step size, because no update has been made, so we might have
        // jumped too much, and now we need to head back some.
        if ( !done )
            step /= 2;
    }
}

int main()
{
    read();
    go();

    // print the answer with 4 decimal points
    fprintf(out, "%.4lf %.4lf\n", x, y);

    return 0;
}

然后我认为从列表中选择一个最接近此算法返回的(x, y)是正确的。
这个算法利用了维基百科中关于几何中值的一段话:
然而,很容易计算出一个近似值。
使用迭代过程的几何中值,其中每个步骤
产生更精确的近似值。这种类型的过程可以是
由到采样点的距离之和
是一个凸函数,因为到每个采样点的距离是
凸函数与凸函数之和保持凸性。因此,
减少每一步距离之和的过程不能得到
陷入局部最优。
这种类型的一种常见方法称为
weiszfeld的算法在endre-weiszfeld的工作之后,[4]是一种形式
迭代加权最小二乘法。这个算法定义了一个集合
重量与距离成反比
当前对样本的估计,并创建一个新的估计
根据这些权重的样本加权平均值。那个
是,
上面的第一段解释了为什么这样做:因为我们试图优化的函数没有任何局部最小值,所以可以通过迭代改进来贪婪地找到最小值。
把这看作是一种二进制搜索。首先,你对结果进行近似。一个好的近似将是重心,我的代码在读取输入时计算。然后,看看相邻的点是否能给你一个更好的解决方案。在这种情况下,如果某个点与当前点的距离小于cc>,则该点被视为相邻的。如果更好,那么放弃当前的点是可以的,因为正如我所说的,这不会使您陷入局部最小值,因为您试图最小化的函数的性质。
在此之后,你一半的步长,就像二进制搜索一样,一直持续到你认为是一个足够好的近似值(由step常量控制)。
因此算法的复杂性取决于你希望结果的准确性。

09-25 21:24