I have a list of DataPoint objects, for example:

List<DataPoint> newpoints=new List<DataPoint>();


where DataPoint is a class made up of nine double features, A through I, and

newpoints.Count is 100000 (i.e., each of the 100000 points consists of nine double features, A to I).


I need to normalize the list newpoints using Min-Max normalization with a scale range of 0 to 1.
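
For reference, Min-Max scaling of a value x to a target range [a, b] is usually written as below, where min and max are taken over all values of a single feature. With a = 0 and b = 1 the scale factor and the offset drop out, leaving the second form:

```latex
x' = \frac{x - \min}{\max - \min}\,(b - a) + a
\qquad\text{and, for } a = 0,\ b = 1:\qquad
x' = \frac{x - \min}{\max - \min}
```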

So far I have done the following:


Each DataPoint feature is copied into a one-dimensional array. For example, the code for feature A:

for (int i = 0; i < newpoints.Count; i++)
{
    array_A[i] = newpoints[i].A;
}
// and so on for all nine double features

I then applied Min-Max normalization. For example, the code for feature A:

normilized_featureA= (((array_A[i] - array_A.Min()) * (1 - 0)) /
                  (array_A.Max() - array_A.Min()))+0;



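This works, but it takes too long (3 minutes 45 seconds).

How can I apply Min-Max normalization in C# with LINQ to cut the run time down to a few seconds?
I found the Stack Overflow question "How to normalize a list of int values", but my problem is: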

double valueMax = list.Max(); // I need Max point for feature A  for all 100000
double valueMin = list.Min(); //I need Min point for feature A  for all 100000


and so on for all nine features.
Any help is much appreciated.

Best answer

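As an alternative to modeling the 9 features as double properties on the DataPoint class, you could model each data point as an array of 9 doubles. The benefit is that you can do all 9 calculations in one go with LINQ: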

var newpoints = new List<double[]>
{
    new []{1.23, 2.34, 3.45, 4.56, 5.67, 6.78, 7.89, 8.90, 9.12},
    new []{2.34, 3.45, 4.56, 5.67, 6.78, 7.89, 8.90, 9.12, 12.23},
    new []{3.45, 4.56, 5.67, 6.78, 7.89, 8.90, 9.12, 12.23, 13.34},
    new []{4.56, 5.67, 6.78, 7.89, 8.90, 9.12, 12.23, 13.34, 15.32}
};

var featureStats = newpoints
// We make the assumption that all 9 features are present on each row.
.First()
// 2 Anon Projections - first to determine min / max as a function of column
.Select((np, idx) => new
{
   Idx = idx,
   Max = newpoints.Max(x => x[idx]),
   Min = newpoints.Min(x => x[idx])
})
// Second to add in the dynamic Range
.Select(x => new {
  x.Idx,
  x.Max,
  x.Min,
  Range = x.Max - x.Min
})
// Back to array for O(1) lookups.
.ToArray();

// Do the normalization for the columns, for each row.
var normalizedFeatures = newpoints
   .Select(np => np.Select(
      (i, idx) => (i - featureStats[idx].Min) / featureStats[idx].Range));

foreach(var datapoint in normalizedFeatures)
{
  Console.WriteLine(string.Join(",", datapoint.Select(x => x.ToString("0.00"))));
}


Result:

0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
0.33,0.33,0.33,0.33,0.34,0.47,0.23,0.05,0.50
0.67,0.67,0.67,0.67,0.69,0.91,0.28,0.75,0.68
1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00
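
If you would rather keep the DataPoint class, the slowdown in the question most likely comes from calling array_A.Min() and array_A.Max() for every element, since each call rescans all 100000 values. Below is a minimal sketch that evaluates them once per feature. The MinMax class and Normalize method are hypothetical names introduced here, and mapping a constant column to 0 is an assumption, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original question or answer.
public static class MinMax
{
    // Scales one feature column to the range [0, 1].
    // Min and Max are evaluated once, outside the loop.
    public static double[] Normalize(IReadOnlyList<double> values)
    {
        if (values.Count == 0)
        {
            return new double[0];
        }

        double min = values.Min();
        double max = values.Max();
        double range = max - min;

        var result = new double[values.Count];
        for (int i = 0; i < values.Count; i++)
        {
            // Assumption: a constant column (range == 0) maps to 0
            // rather than producing NaN from a 0 / 0 division.
            result[i] = range == 0 ? 0.0 : (values[i] - min) / range;
        }
        return result;
    }
}
```

For feature A this could be called as MinMax.Normalize(newpoints.Select(p => p.A).ToArray()), and likewise for B through I. Each feature then costs a few passes over the data instead of several passes per element.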

Regarding "c# - Min-Max data point normalization", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27666412/
