Problem description

I am working through a particular type of code testing that is rather nettlesome and could be automated, yet I'm not sure of the best practices. Before describing the problem, I want to make clear that I'm looking for the appropriate terminology and concepts, so that I can read more about how to implement it. Suggestions on best practices are welcome, certainly, but my goal is specific: what is this kind of approach called?

In the simplest case, I have two programs that take in a bunch of data, produce a variety of intermediate objects, and then return a final result. When tested end-to-end, the final results differ, hence the need to find out where the differences occur. Unfortunately, even intermediate results may differ, but not always in a significant way (i.e. some discrepancies are tolerable). The final wrinkle is that intermediate objects may not necessarily have the same names between the two programs, and the two sets of intermediate objects may not fully overlap (e.g. one program may have more intermediate objects than the other). Thus, I can't assume there is a one-to-one relationship between the objects created in the two programs.

The approach that I'm thinking of taking to automate this comparison of objects is as follows (it's roughly inspired by frequency counts in text corpora):


  1. For each program, A and B: create a list of the objects created throughout execution, which may be indexed in a very simple manner, such as a001, a002, a003, a004, ... and similarly for B (b001, ...).
  2. Let Na be the number of unique object names encountered in A, and Nb the corresponding number for B.
  3. Create two tables, TableA and TableB, with Na and Nb columns, respectively. Entries will record a value for each object at each trigger (i.e. for each row, defined next).
  4. For each assignment in A, the simplest approach is to capture the hash value of all of the Na items; of course, one can use LOCF (last observation carried forward) for those items that don't change, and any as-yet unobserved objects are simply given a NULL entry. Repeat this for B.
  5. Match entries in TableA and TableB via their hash values. Ideally, objects will arrive into the "vocabulary" in approximately the same order, so that order and hash value will allow one to identify the sequences of values.
  6. Find discrepancies between A and B by locating, for each matched pair of objects, the point at which their sequences of hash values diverge.
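
The steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not an established tool: the names `Recorder`, `fingerprint`, `match_objects` and `first_divergence` are hypothetical, values are assumed to be picklable with a stable serialized form, and the matching rule (pair objects whose first hash agrees, in order of arrival) is just one simple reading of step 5.

```python
import hashlib
import pickle


def fingerprint(value):
    """Hash a value via its pickle bytes (assumes the value is picklable)."""
    return hashlib.sha256(pickle.dumps(value)).hexdigest()[:12]


class Recorder:
    """Records one row of hashes per assignment (steps 1 to 4)."""

    def __init__(self):
        self.names = []  # object names in order of first appearance
        self.rows = []   # each row maps name -> hash at that trigger

    def assign(self, name, value):
        if name not in self.names:
            self.names.append(name)
        # LOCF: copy the previous row, then overwrite the entry that changed.
        # Names not yet seen are simply absent (row.get(name) gives None).
        row = dict(self.rows[-1]) if self.rows else {}
        row[name] = fingerprint(value)
        self.rows.append(row)

    def sequences(self):
        """Per-object sequence of hashes, with consecutive repeats collapsed."""
        seqs = {name: [] for name in self.names}
        for row in self.rows:
            for name, digest in row.items():
                if not seqs[name] or seqs[name][-1] != digest:
                    seqs[name].append(digest)
        return seqs


def match_objects(rec_a, rec_b):
    """Step 5: pair names whose first hash agrees, in order of arrival."""
    seq_a, seq_b = rec_a.sequences(), rec_b.sequences()
    pairs, used = [], set()
    for name_a in rec_a.names:
        for name_b in rec_b.names:
            if name_b not in used and seq_a[name_a][0] == seq_b[name_b][0]:
                pairs.append((name_a, name_b))
                used.add(name_b)
                break
    return pairs


def first_divergence(rec_a, rec_b):
    """Step 6: for each matched pair, the index where the sequences diverge."""
    seq_a, seq_b = rec_a.sequences(), rec_b.sequences()
    report = {}
    for name_a, name_b in match_objects(rec_a, rec_b):
        sa, sb = seq_a[name_a], seq_b[name_b]
        if sa != sb:
            common = 0
            while common < min(len(sa), len(sb)) and sa[common] == sb[common]:
                common += 1
            report[(name_a, name_b)] = common
    return report


a, b = Recorder(), Recorder()
a.assign("x", [1, 2, 3]); a.assign("total", 6); a.assign("total", 7)
b.assign("data", [1, 2, 3]); b.assign("tmp", "extra")
b.assign("s", 6); b.assign("s", 8)
print(match_objects(a, b))     # [('x', 'data'), ('total', 's')]
print(first_divergence(a, b))  # {('total', 's'): 1}
```

Here program B has an extra object (`tmp`) and different names, yet `total` and `s` are paired because they start from the same value, and the report shows that they part ways at their second value.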

Now, this is a simple approach and could work wonderfully if the data were simple, atomic, and not susceptible to numerical precision issues. However, I believe that numerical precision may cause hash values to diverge, though the impact is insignificant if the discrepancies are approximately at the machine tolerance level.
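
One way to keep tolerable floating-point noise from changing the hash is to quantize values before hashing, or to keep the raw values and compare them with a tolerance. The sketch below is only an illustration: `quantize`, `tolerant_hash` and `close_enough` are hypothetical helpers, and the choice of 9 significant digits is an arbitrary assumption. Quantizing is also not foolproof, since two values that straddle a rounding boundary still hash differently, which is why the direct comparison is shown as well.

```python
import hashlib
import math


def quantize(x, sig_digits=9):
    """Round a float to a fixed number of significant digits."""
    if x == 0:
        return 0.0  # also folds -0.0 into 0.0
    if not math.isfinite(x):
        return x
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig_digits - 1 - exponent)


def tolerant_hash(values, sig_digits=9):
    """Hash a sequence of floats after quantizing each one."""
    text = ",".join(repr(quantize(v, sig_digits)) for v in values)
    return hashlib.sha256(text.encode()).hexdigest()[:12]


def close_enough(xs, ys, rel_tol=1e-9, abs_tol=0.0):
    """Direct comparison, for when raw values are stored instead of hashes."""
    return len(xs) == len(ys) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(xs, ys)
    )


print(0.1 + 0.2 == 0.3)                                     # False
print(tolerant_hash([0.1 + 0.2]) == tolerant_hash([0.3]))   # True
print(close_enough([0.1 + 0.2], [0.3]))                     # True
```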

First: What is a name for such types of testing methods and concepts? An answer need not describe the method above, but should reflect the class of methods for comparing objects from two (or more) different programs.

Second: What standard methods exist for what I describe in steps 3 and 4? For instance, the "value" need not only be a hash: one might also store the sizes of the objects. After all, two objects cannot be the same if they are massively different in size.
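
As an illustration of storing more than a hash, each table entry could hold a small record with the type, a size and the digest, so that grossly different objects are ruled out cheaply before any hashes are compared. This is a sketch under assumptions: `Fingerprint`, `describe` and `could_match` are hypothetical names, "size" is taken to be the length of the pickled form, and the factor-of-two threshold is arbitrary.

```python
import hashlib
import pickle
from typing import NamedTuple


class Fingerprint(NamedTuple):
    type_name: str
    size: int    # length of the serialized form, in bytes
    digest: str


def describe(value):
    """Build the record stored in a table cell for one observed value."""
    payload = pickle.dumps(value)
    digest = hashlib.sha256(payload).hexdigest()[:12]
    return Fingerprint(type(value).__name__, len(payload), digest)


def could_match(fa, fb, size_ratio=2.0):
    """Cheap screen: same type, and sizes within a factor of size_ratio."""
    if fa.type_name != fb.type_name:
        return False
    small, large = sorted((fa.size, fb.size))
    return large <= size_ratio * small


short, long_ = describe(list(range(10))), describe(list(range(1000)))
print(could_match(short, long_))   # False: sizes differ far too much
near_a, near_b = describe([1, 2, 3]), describe([1, 2, 4])
print(could_match(near_a, near_b), near_a.digest == near_b.digest)  # True False
```

The screen only says that two objects might correspond; the digest (or a tolerance-based comparison of the values) still decides whether they agree.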

In practice, I tend to compare a small number of items, but I suspect that when automated this need not involve a lot of input from the user.

Edit 1: This paper is related in terms of comparing the execution traces; it mentions "code comparison", which is related to my interest, though I'm more concerned with the data (i.e. objects) than with the actual code that produces the objects. I've just skimmed it, but will review it more carefully for methodology. More importantly, this suggests that comparing code traces may be extended to comparing data traces. This paper analyzes some comparisons of code traces, albeit in a wholly unrelated area of security testing.

Perhaps data-tracing and stack-trace methods are related. Checkpointing is slightly related, but its typical use (i.e. saving all of the state) is overkill.

Edit 2: Other related concepts include differential program analysis and monitoring of remote systems (e.g. space probes) where one attempts to reproduce the calculations using a local implementation, usually a clone (think of a HAL-9000 compared to its earth-bound clones). I've looked down the routes of unit testing, reverse engineering, various kinds of forensics, and whatnot. In the development phase, one could ensure agreement with unit tests, but this doesn't seem to be useful for instrumented analyses. For reverse engineering, the goal can be code & data agreement, but methods for assessing fidelity of re-engineered code don't seem particularly easy to find. Forensics on a per-program basis are very easily found, but comparisons between programs don't seem to be that common.

Recommended answer

(Making this answer community wiki, because dataflow programming and reactive programming are not my areas of expertise.)

The area of data flow programming appears to be related, and thus debugging of data flow programs may be helpful. This paper from 1981 gives several useful high level ideas. Although it's hard to translate these to immediately applicable code, it does suggest a method I'd overlooked: when approaching a program as a dataflow, one can either statically or dynamically identify where changes in input values cause changes in other values in the intermediate processing or in the output (not just changes in execution, if one were to examine control flow).
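
The dynamic variant of that idea can be illustrated with a toy dataflow graph: evaluate it on two inputs and list the nodes whose values changed. This is only a sketch of the general notion, not the method of the 1981 paper; `evaluate`, `affected_nodes` and the example graph are hypothetical.

```python
def evaluate(graph, inputs):
    """Evaluate a graph given as {node: (function, [dependency names])}."""
    values = dict(inputs)

    def compute(node):
        if node not in values:
            func, deps = graph[node]
            values[node] = func(*(compute(d) for d in deps))
        return values[node]

    for node in graph:
        compute(node)
    return values


def affected_nodes(graph, inputs_a, inputs_b):
    """Nodes whose computed value differs between the two runs."""
    va, vb = evaluate(graph, inputs_a), evaluate(graph, inputs_b)
    return sorted(node for node in graph if va[node] != vb[node])


graph = {
    "total": (lambda xs: sum(xs), ["xs"]),
    "count": (lambda xs: len(xs), ["xs"]),
    "mean": (lambda t, c: t / c, ["total", "count"]),
    "scaled": (lambda m, k: m * k, ["mean", "k"]),
}

base = {"xs": [1, 2, 3], "k": 2}
print(affected_nodes(graph, base, {"xs": [1, 2, 6], "k": 2}))
# ['mean', 'scaled', 'total']
print(affected_nodes(graph, base, {"xs": [1, 2, 3], "k": 5}))
# ['scaled']
```

Changing one element of `xs` leaves `count` untouched but propagates through `total`, `mean` and `scaled`, whereas changing `k` affects only `scaled`.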

Although dataflow programming is often related to parallel or distributed computing, it seems to dovetail with Reactive Programming, which is how the monitoring of objects (e.g. the hashing) can be implemented.
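
A reactive flavour of the monitoring might look like the following: a namespace that notifies subscribers whenever a name is bound, with one subscriber appending a hash to a log. This is a hand-rolled observer sketch rather than the API of any reactive library; `WatchedState` and `make_hash_logger` are hypothetical names, and values are assumed to be picklable.

```python
import hashlib
import pickle


class WatchedState:
    """A namespace that notifies subscribers whenever a name is (re)bound."""

    def __init__(self):
        self._values = {}
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def __setitem__(self, name, value):
        self._values[name] = value
        for callback in self._subscribers:
            callback(name, value)

    def __getitem__(self, name):
        return self._values[name]


def make_hash_logger(log):
    """Return a subscriber that appends (name, digest) to the given list."""
    def on_change(name, value):
        digest = hashlib.sha256(pickle.dumps(value)).hexdigest()[:12]
        log.append((name, digest))
    return on_change


log = []
state = WatchedState()
state.subscribe(make_hash_logger(log))
state["raw"] = [3, 1, 2]
state["sorted"] = sorted(state["raw"])
state["raw"] = [3, 1, 2]  # rebinding to an equal value logs the same digest
for name, digest in log:
    print(name, digest)
```

Note that only rebinding through `state[...] = ...` is observed; mutating a stored list in place would go unnoticed in this sketch.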

This answer is far from adequate, hence the CW tag, as it doesn't really name the debugging method that I described. Perhaps this is a form of debugging for the reactive programming paradigm.

[Also note: although this answer is CW, if anyone has a far better answer in relation to dataflow or reactive programming, please feel free to post a separate answer and I will remove this one.]

Note 1: Henrik Nilsson and Peter Fritzson have a number of papers on debugging for lazy functional languages, which are somewhat related: the debugging goal is to assess values, not the execution of code. This paper seems to have several good ideas, and their work partially inspired this paper on a debugger for a reactive programming language called Lustre. These references don't answer the original question, but may be of interest to anyone facing this same challenge, albeit in a different programming context.
