




I trained a model on data from 500 devices to predict their performance. I then applied the trained model to a test set from another 500 devices, and the predictions were pretty good. Now my executives want me to prove that this model will work well on one million devices, not only on 500. Obviously we don't have data for one million devices. And if the model is not reliable, they want me to find out how much training data is required to make reliable predictions on one million devices. How should I deal with these executives, who don't have a background in statistical analysis and modeling? Any suggestions? Thanks



I have suggested that @cep write up his comment as an answer, including the variance and bias calculations. In any case it could be added


While there may be Dilbert managers out there somewhere, I have seen few of them myself. More often, managers get to their positions through hard work. They are likely to be rusty, but the abilities are likely still there.


In this case, whether or not they have a "background in statistical analysis and modeling", they are applying common sense.

The first thing you might do is provide the proper context and terminology. @cel has mentioned some of it: providing concrete values for:

  • assumptions
    • What assumptions are you making about the data?
    • What basis is there for extrapolating from the limited data?
    • Why should the extrapolated results be trusted to apply to the remaining 99.9% of devices, for which there is no data?
  • basic descriptive statistics
  • your prior on the data distribution, with a justification for why it was chosen
  • which models/approaches were considered, and why
  • which model you actually selected, and why
  • how you arrived at the hyperparameters
  • how you trained the model
  • statistics on goodness of fit and error rates
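
To make the last item concrete, here is a minimal sketch of two numbers you could put in front of the executives: a bootstrap confidence interval for the mean test error, and a learning curve showing how the test error changes as the training set grows. Everything in it is an assumption for illustration: the data are synthetic (one feature, a linear relationship with Gaussian noise), and `make_devices`, `fit_line`, `learning_curve` and `bootstrap_ci` are hypothetical helper names, not anything from the original model. The interval only says something about the million devices if the 1,000 devices you have data for are a representative sample of them, which is exactly the assumption you need to state and defend.

```python
import random
import statistics


def make_devices(n, rng):
    """Synthetic stand-in for real device data: (feature, performance) pairs."""
    devices = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        devices.append((x, 2.0 * x + 1.0 + rng.gauss(0, 1.0)))
    return devices


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def learning_curve(train, test, sizes):
    """Mean absolute test error after training on the first n devices."""
    curve = []
    for n in sizes:
        xs, ys = zip(*train[:n])
        slope, intercept = fit_line(xs, ys)
        mae = statistics.fmean(abs(y - (slope * x + intercept)) for x, y in test)
        curve.append((n, mae))
    return curve


def bootstrap_ci(errors, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean per-device error."""
    rng = random.Random(seed)
    n = len(errors)
    # Resample the per-device errors with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(errors, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    rng = random.Random(42)
    train, test = make_devices(500, rng), make_devices(500, rng)

    # If the error has flattened out well before 500, more data of the
    # same kind is unlikely to help much.
    for n, mae in learning_curve(train, test, [25, 50, 100, 250, 500]):
        print(f"train size {n:4d}: test MAE {mae:.3f}")

    slope, intercept = fit_line(*zip(*train))
    errors = [abs(y - (slope * x + intercept)) for x, y in test]
    lower, upper = bootstrap_ci(errors)
    print(f"mean test error: {statistics.fmean(errors):.3f} "
          f"(95% bootstrap CI {lower:.3f} to {upper:.3f})")
```

With real data you would replace `make_devices` and `fit_line` with your own loading code and model. The shape of the learning curve is what answers the "how much training data do we need" question, and the width of the interval is what answers "how sure are we about the error".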

