I am trying to track human heads in CCTV footage. I currently compare colour histograms and LBP histograms to measure the affinity between bounding boxes. However, these are sometimes not enough.
I was reading through a paper in which a dispersion metric is described, but I still cannot fully understand it. For example, I cannot work out what p_{i,j} refers to in the equation. Can someone kindly and clearly explain how I can compute the dispersion between bounding boxes in separate frames, please?
This paper tackles the tracking problem using a background model, as most CCTV tracking methods do. The BG model produces a foreground mask, and the p_ij you mention relates to this mask after some morphological processing. Specifically, they try to separate foreground blobs into components, based on thresholds on the allowed 'gaps' (holes) in the FG mask. The end result of this procedure is a set of binary masks, one for each hypothesized object. These masks are then used for tracking by enforcing spatial and temporal consistency. In my opinion, this is an old-fashioned way of processing video sequences, only relevant if you're limited in processing power and the scenes are not crowded.
To answer your question: if O is the mask of one of the hypothesized objects, then p_ij is the binary pixel at location (i,j) within that mask. Thus (c_x, c_y) is the center of mass of the binary shape, and the dispersion is simply the average distance of the shape's pixels from that center of mass (it is larger for larger objects). This enforces scale consistency in tracking, but only in a very weak manner. You can do much better if you have a calibrated camera.
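To make the description concrete, here is a minimal NumPy sketch of how the center of mass and dispersion could be computed from one binary object mask. It assumes the mask is the foreground mask cropped to your bounding box, and that "distance" means plain Euclidean distance (the paper may use a squared or normalised variant, so check its equation). The names `mask_dispersion` and `dispersion_affinity` are hypothetical, and the ratio-based affinity is just one simple way to compare two dispersions, not something taken from the paper.

```python
import numpy as np


def mask_dispersion(mask):
    """Return (c_x, c_y, dispersion) for a binary object mask.

    mask[i, j] plays the role of p_ij: nonzero where the pixel belongs
    to the object, zero elsewhere.
    """
    mask = np.asarray(mask).astype(bool)
    rows, cols = np.nonzero(mask)  # coordinates of all pixels with p_ij = 1
    if rows.size == 0:
        raise ValueError("mask has no foreground pixels")
    c_y = rows.mean()  # center of mass, vertical coordinate
    c_x = cols.mean()  # center of mass, horizontal coordinate
    # Assumption: Euclidean distance of each object pixel from the center.
    dist = np.hypot(cols - c_x, rows - c_y)
    return c_x, c_y, dist.mean()


def dispersion_affinity(d_a, d_b):
    """Hypothetical similarity in [0, 1]: ratio of smaller to larger dispersion."""
    larger = max(d_a, d_b)
    if larger == 0:
        return 1.0  # both shapes are single pixels
    return min(d_a, d_b) / larger


# Two toy masks standing in for detections in two different frames.
small = np.zeros((10, 10), dtype=np.uint8)
small[2:5, 3:6] = 1  # 3x3 blob
large = np.zeros((10, 10), dtype=np.uint8)
large[1:6, 2:7] = 1  # 5x5 blob

_, _, d_small = mask_dispersion(small)
_, _, d_large = mask_dispersion(large)
print(d_small, d_large, dispersion_affinity(d_small, d_large))
```

An affinity near 1 means the two shapes have a similar spread, which is the weak scale-consistency cue mentioned above; it could be combined with your existing colour and LBP histogram scores rather than used on its own.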