Problem description
I came up with a custom interpolation method for my problem and I'd like to ask whether there are any risks in using it. I am not a math or programming expert, which is why I'd like some feedback :)
Story:
I was searching for a good curve-fit method for my data when I came up with an idea to interpolate the data.
I am mixing paints together and making reflectance measurements with a spectrophotometer once the film is dry. I would like to calculate the proportions of white and colored paint required to reach a certain lightness, regardless of any hue shift (e.g. black + white paint gives a bluish grey) or chroma loss (e.g. orange + white gives a "pastel" yellowish orange).
I checked whether the Beer-Lambert law applies, but it does not. Pigment mixing behaves in a more complicated fashion than dye dilution. So I wanted to fit a curve to my data points (the process is explained here: Interpolation for color-mixing).
The first step was to build a calibration curve. I tested the following ratios of colored to white paint:
ratios = 1, 1/2., 1/4., 1/8., 1/16., 1/32., 1/64., 0
This is the plot of my carefully prepared samples, measured with a spectrophotometer. The blue curve represents the full color (ratio = 1), the red curve represents the white paint (ratio = 0), and the black curves represent the mixed samples:
As a second step, I wanted to derive from this data a function that would compute a spectral curve for any ratio between 0 and 1. I tested several curve-fitting (fitting an exponential function) and interpolation (quadratic, cubic) methods, but the results were of poor quality.
For example, this is my reflectance data at 380 nm for all the color samples:
This is the result of scipy.optimize.curve_fit using the function:
import numpy as np
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
popt, pcov = curve_fit(func, x, y)
Then I came up with this idea: the logarithm of the spectral data gives a closer match to a straight line, and the logarithm of the logarithm of the data is almost a straight line, as demonstrated by this code and graph:
import numpy as np
import matplotlib.pyplot as plt
reflectance_at_380nm = 5.319, 13.3875, 24.866, 35.958, 47.1105, 56.2255, 65.232, 83.9295
ratios = 1, 1/2., 1/4., 1/8., 1/16., 1/32., 1/64., 0
linear_approx = np.log(np.log(reflectance_at_380nm))
plt.plot(ratios, linear_approx)
plt.show()
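One way to put a number on "almost a straight line" is to compare the Pearson correlation between the ratios and the raw, log, and log(log) reflectance values. The following is only a sketch using the 380 nm data above; the helper name `linearity` is made up for illustration, and correlation is just one possible measure of straightness.

```python
import numpy as np

reflectance_at_380nm = np.array([5.319, 13.3875, 24.866, 35.958,
                                 47.1105, 56.2255, 65.232, 83.9295])
ratios = np.array([1, 1/2., 1/4., 1/8., 1/16., 1/32., 1/64., 0])

def linearity(y):
    """Pearson correlation between the ratios and y.

    Hypothetical helper: values closer to -1 or +1 mean the points
    lie closer to a straight line.
    """
    return np.corrcoef(ratios, y)[0, 1]

r_raw = linearity(reflectance_at_380nm)
r_log = linearity(np.log(reflectance_at_380nm))
r_loglog = linearity(np.log(np.log(reflectance_at_380nm)))

print(f"raw: {r_raw:.4f}  log: {r_log:.4f}  log(log): {r_loglog:.4f}")
```

A high correlation does not mean the transformed data is exactly linear everywhere. It is worth looking at the samples with very little colored paint separately, since that is where the reflectance changes fastest.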
What I did then was to interpolate the linear approximation and then convert the data back to the original scale. This gave me a very nice interpolation of my data, much better than what I got before:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
reflectance_at_380nm = 5.319, 13.3875, 24.866, 35.958, 47.1105, 56.2255, 65.232, 83.9295
ratios = 1, 1/2., 1/4., 1/8., 1/16., 1/32., 1/64., 0
linear_approx = np.log(np.log(reflectance_at_380nm))
xnew = np.arange(100)/100.
# scipy.interpolate.spline is no longer available in recent SciPy releases;
# a linear interp1d does the same job as order=1
cs = interp1d(ratios, linear_approx, kind='linear')(xnew)
cs = np.exp(np.exp(cs))  # undo the double logarithm
plt.plot(xnew, cs)
plt.plot(ratios, reflectance_at_380nm, 'ro')  # measured points
plt.show()
So my question for the experts is: how good is this interpolation method, and what are the risks of using it? Can it lead to wrong results?
Also: can this method be improved, or does it already exist, and if so, what is it called?
Thank you for reading
This looks similar to the kernel method that is used for fitting regression lines or finding decision boundaries in classification problems.
The idea behind the kernel trick is that the data is transformed into another space (often a higher-dimensional one) where it is linearly separable (for classification) or can be fitted with a straight line (for regression). After the curve fitting is done, the inverse transformation can be applied. In your case, the successive logarithms (log(log(x))) seem to be the transformation and the successive exponentials (exp(exp(x))) seem to be the inverse transformation.
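The transform and inverse-transform pair can be sketched as two small functions. The names `to_loglog` and `from_loglog` are made up for illustration. The sketch also makes one limitation visible: log(log(r)) is only defined for r > 1, so it works for reflectance expressed in percent and above 1 %, but it would fail for very dark samples or for reflectance expressed as a fraction between 0 and 1.

```python
import numpy as np

def to_loglog(reflectance_percent):
    """Forward transform: log(log(r)). Hypothetical helper name."""
    r = np.asarray(reflectance_percent, dtype=float)
    if np.any(r <= 1.0):
        # log(r) <= 0 here, so the outer logarithm is undefined
        raise ValueError("log(log(r)) is only defined for r > 1")
    return np.log(np.log(r))

def from_loglog(t):
    """Inverse transform: exp(exp(t)). Hypothetical helper name."""
    return np.exp(np.exp(np.asarray(t, dtype=float)))

values = np.array([5.319, 13.3875, 83.9295])
roundtrip = from_loglog(to_loglog(values))
print(roundtrip)
```

Because the result depends on the units of the input, the same data rescaled (for example divided by 100) would not give the same interpolated curve, which is something to keep in mind when reusing the method.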
I am not sure whether there is a kernel that does exactly this, but the intuition is similar. Here is a Medium article explaining the kernel trick for classification with SVMs: https://medium.com/@zxr.nju/what-is-the-kernel-trick-why-is-it-important-98a98db0961d
Since this is a widely used approach in machine learning, I doubt it will lead to wrong results if the fit is done properly (neither under-fit nor over-fit), and that needs to be judged by statistical testing.
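One simple way to test this with the data from the question is a leave-one-out check: drop each interior sample in turn, interpolate it from the remaining ones, and compare the prediction with the measured value. This is only a sketch of that idea, not a full statistical test. The function names are made up for illustration, and `np.interp` stands in for the linear interpolation step.

```python
import numpy as np

ratios = np.array([1, 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 0])
reflectance = np.array([5.319, 13.3875, 24.866, 35.958,
                        47.1105, 56.2255, 65.232, 83.9295])

# np.interp expects increasing x values, so sort by ratio first
order = np.argsort(ratios)
x, y = ratios[order], reflectance[order]

def loglog_interp(x_query, x_known, y_known):
    """Interpolate linearly in log(log(y)) space, then transform back."""
    t = np.log(np.log(y_known))
    return np.exp(np.exp(np.interp(x_query, x_known, t)))

def plain_interp(x_query, x_known, y_known):
    """Plain linear interpolation on the raw values, for comparison."""
    return np.interp(x_query, x_known, y_known)

def leave_one_out_errors(interp_fn):
    """Prediction error for each interior point when it is left out."""
    errors = []
    for i in range(1, len(x) - 1):  # endpoints would need extrapolation
        keep = np.arange(len(x)) != i
        errors.append(interp_fn(x[i], x[keep], y[keep]) - y[i])
    return np.array(errors)

err_loglog = leave_one_out_errors(loglog_interp)
err_linear = leave_one_out_errors(plain_interp)
print("max abs error, log(log) interpolation:", np.abs(err_loglog).max())
print("max abs error, plain interpolation:   ", np.abs(err_linear).max())
```

With only eight samples at a single wavelength, this gives a rough indication at best. Repeating the check at other wavelengths and with independently prepared samples would say more about how far the method can be trusted.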