database - If I have a large number of coordinates, how do I extract the y values corresponding to a particular x value?

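I have three data sets compiled into one large data set.
data1 has x values ranging from 0 to 47 (ordered), with many y values attached to each x value (each with a small error). In total there are about 100000 y values.
data2 and data3 are similar, but with x values of 48-80 and 80-95 respectively.
The final goal is to produce a standard deviation for each x value (so 96 in total), based on the large number of y values. So I think I should first extract the y values for each x value from these data sets, and then compute the standard deviation of each group in the usual way.
In Mathematica, I have tried using the Select and Part functions, but without success.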

Best answer

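Statistically, it would be better to supply a prediction interval along with the predicted value of y.
There is a video about this here:
Intervals (for the Mean Response and a Single Response) in Simple Linear Regression
Illustrating with some example data, stored here as a QR code.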

qrimage = Import["https://i.stack.imgur.com/s7Ul7.png"];

data = Uncompress@BarcodeRecognize@qrimage;

ListPlot[data, Frame -> True, Axes -> None]


Setting the 68% and 95% confidence levels (1σ and 2σ)

cl = Map[Function[σ, 2 (CDF[NormalDistribution[0, 1], σ] - 0.5)], {1, 2}];

(* trying a quadratic fit, which is still a linear model in its coefficients *)
lm = LinearModelFit[data, {1, a, a^2}, a];
bands = lm["SinglePredictionBands", ConfidenceLevel -> #] & /@ cl;

(* x value for an observation outside of the sample observations *)
x0 = 50;

(* Predicted value of y *)
y0 = lm[x0]

39.8094

(* Least-squares regression of Y on X *)
Normal[lm]

26.4425 - 0.00702613 a + 0.0054873 a^2

(* Prediction intervals (68% and 95%) for y0 given x0 *)
b1 = bands /. a -> x0;

(* R^2 goodness of fit *)
lm["RSquared"]

0.886419

b2 = {bands, {Normal[lm]}};

(* Prediction intervals plotted over the data range *)
Show[
 Plot[b2, {a, 0, 100}, PlotRange -> {{0, 100}, Automatic}, Filling -> {1 -> {2}}],
 ListPlot[data],
 ListPlot[{{x0, lm[x0]}}, PlotStyle -> Red],
 Graphics[{Red, Line[{{x0, Min[b1]}, {x0, Max[b1]}}]}],
 Frame -> True, Axes -> None]

Row[{"For x0 = ", x0, ", y0 = ", y0,
  " with 95% prediction interval ", y0, " ± ", y0 - Min[b1]}]

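For x0 = 50, y0 = 39.8094 with 95% prediction interval 39.8094 ± 12.1118
Addressing your requirement:
The final goal is to produce a standard deviation for each x value (so 96 in total), based on the large number of y values.
The best measure to use is probably the standard error, which can be obtained via
lm["SinglePredictionConfidenceIntervalTable"] and lm["SinglePredictionErrors"]
These provide the "standard errors for the predicted response of single observations". If an x has multiple y values, there will still be just one standard error for each x value.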
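As a rough sketch of how those standard errors could be reduced to one value per x, the following uses hypothetical synthetic data (pts, fit, se and seByX are illustrative names, not part of the example above). It assumes the data is a flat list of {x, y} pairs with repeated x values.

```mathematica
(* Hypothetical data with repeated x values; NOT the QR-code data used above *)
SeedRandom[2];
pts = Flatten[
   Table[{x, 26 + 0.005 x^2 + RandomVariate[NormalDistribution[0, 5]]},
    {x, 0, 95, 5}, {10}], 1];

(* Same quadratic model form as in the answer *)
fit = LinearModelFit[pts, {1, a, a^2}, a];

(* One standard error per observation, in the same order as pts *)
se = fit["SinglePredictionErrors"];

(* Collapse to one entry per distinct x by keeping the first value in each group *)
seByX = KeySort@GroupBy[Transpose[{pts[[All, 1]], se}], First -> Last, First];
```

Here seByX is an association from each distinct x to its standard error, so seByX[50] looks up the value at x = 50.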
Reference: https://reference.wolfram.com/language/ref/LinearModelFit.html (Details and Options)
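If the plain per-x sample standard deviation from the question is still wanted, one possible approach is to group the pairs by x rather than using Select or Part. This is a minimal sketch on made-up data; sample, sdByX and ysAt50 are hypothetical names, and it assumes the combined data set is a flat list of {x, y} pairs with exact integer x values.

```mathematica
(* Hypothetical stand-in for the combined data set: 96 x values, 20 y values each *)
SeedRandom[1];
sample = Flatten[
   Table[{x, 26 + 0.005 x^2 + RandomVariate[NormalDistribution[0, 5]]},
    {x, 0, 95}, {20}], 1];

(* Group the pairs by x, keep only the y values, reduce each group to its standard deviation *)
sdByX = KeySort@GroupBy[sample, First -> Last, StandardDeviation];

(* Standard deviation of the y values recorded at x = 50 *)
sdByX[50]

(* The raw y values for a single x, if they are needed on their own *)
ysAt50 = Cases[sample, {50, y_} :> y];
```

If the x values in the real data are machine reals (for example 50. rather than 50), the keys and the pattern in Cases would need to match that form.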