Are there best practices or well-known methods for publishing or announcing (via metadata, for example) which data has been loaded and verified and is currently available for reporting in a data warehouse?
I've seen several in-house systems for doing this - some pretty fragile.
Are there some well-known concepts or good search terms I could look for?
I'm not sure what you're looking for here. What exactly are the users waiting for?
If they are waiting for the system to be available again after a well-defined and consistent daily ETL process runs, then it's easy to send an email, re-enable your reporting application, update a status icon on your intranet site, and so on.
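As a rough illustration of the "status icon" idea, here is a minimal sketch in which the last step of the daily ETL job writes a small JSON status file that an intranet page or reporting tool could poll. The file layout and the names `publish_status`, `read_status`, `available` and `business_date` are made up for this example and don't come from any particular product.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def publish_status(status_path: Path, available: bool, business_date: str) -> dict:
    """Atomically write a small status document that other tools can poll."""
    status = {
        "available": available,
        "business_date": business_date,  # the day the loaded data covers
        "published_at": datetime.now(timezone.utc).isoformat(),
    }
    # Write to a temp file in the same directory and then rename it,
    # so a reader never sees a half-written status file.
    fd, tmp_name = tempfile.mkstemp(dir=status_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(status, tmp)
    os.replace(tmp_name, status_path)
    return status


def read_status(status_path: Path) -> dict:
    """Return the last published status, or 'not available' if none exists yet."""
    if not status_path.exists():
        return {"available": False, "business_date": None, "published_at": None}
    return json.loads(status_path.read_text(encoding="utf-8"))
```

The ETL job would call `publish_status(path, False, ...)` when it starts and `publish_status(path, True, ...)` once the load has been checked. Sending an email or re-enabling the reporting application could hang off the same step.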
On the other hand, if they are waiting for a very specific data set ("is the Q4 sales data for the widget division in the south-east Asia region available yet?") then things are much more difficult because everyone is interested in something different. It's not even really a technical decision because knowing when source data is complete and correct is a business question that may have a different answer for each source system or data set. In our environment, daily reports are fully automated but monthly or yearly ones are not, mostly because there are often inconsistent events or processes that mean we still need a human being to confirm that the reports can be run.
I'm sure you could use metadata to build some kind of dashboard that shows when certain data was loaded, but it would be extremely specific to your situation and your users, so I don't know if there's any general solution or pattern. I imagine it would depend heavily on your business processes, reporting schema (for the metadata) and reporting tools.
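To make the metadata idea a bit more concrete, here is a minimal sketch of a load-status table using Python's built-in `sqlite3` module. Everything about it is an assumption for illustration: the `load_status` table, its columns, and the helpers `record_load`, `sign_off` and `is_available` are hypothetical, and a real warehouse would keep this in its own database and shape it around its own schema. The `verified_by` column stands in for the human confirmation that monthly or yearly data sets may need.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical metadata table: one row per data set and reporting period.
SCHEMA = """
CREATE TABLE IF NOT EXISTS load_status (
    dataset     TEXT NOT NULL,
    period      TEXT NOT NULL,
    loaded_at   TEXT NOT NULL,
    row_count   INTEGER NOT NULL,
    verified_by TEXT,
    PRIMARY KEY (dataset, period)
)
"""


def record_load(conn, dataset, period, row_count):
    """Called by the ETL job after a load; a reload clears any earlier sign-off."""
    conn.execute(
        "INSERT OR REPLACE INTO load_status "
        "(dataset, period, loaded_at, row_count, verified_by) "
        "VALUES (?, ?, ?, ?, NULL)",
        (dataset, period, datetime.now(timezone.utc).isoformat(), row_count),
    )
    conn.commit()


def sign_off(conn, dataset, period, verified_by):
    """Record that a person confirmed the data; returns False if nothing is loaded."""
    cur = conn.execute(
        "UPDATE load_status SET verified_by = ? WHERE dataset = ? AND period = ?",
        (verified_by, dataset, period),
    )
    conn.commit()
    return cur.rowcount == 1


def is_available(conn, dataset, period, needs_sign_off):
    """True if the data is loaded and, where required, confirmed by a person."""
    row = conn.execute(
        "SELECT verified_by FROM load_status WHERE dataset = ? AND period = ?",
        (dataset, period),
    ).fetchone()
    if row is None:
        return False
    return row[0] is not None if needs_sign_off else True
```

A dashboard or report launcher could then query this table instead of asking around. Which data sets need a sign-off and who is allowed to give it are still business decisions that the table only records.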