


I often get confused about the meaning of the term "descriptor" in the context of image features. Is a descriptor the description of the local neighborhood of a point (e.g. a float vector), or is it the algorithm that outputs the description? And what exactly is the output of a feature extractor?


I have been asking myself this question for a long time, and the only explanation I have come up with is that a descriptor is both: the algorithm and the description. A feature detector is used to detect distinctive points. A feature extractor, however, then does not seem to make any sense.


So, is a feature descriptor the description or the algorithm that produces the description?



A feature detector is an algorithm which takes an image and outputs locations (i.e. pixel coordinates) of significant areas in your image. An example of this is a corner detector, which outputs the locations of corners in your image but does not tell you any other information about the features detected.
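To make the "locations only" point concrete, here is a minimal sketch of a Harris-style corner detector in plain Python. It is a toy under stated assumptions, not how any particular library implements it: the function names (`make_square_image`, `detect_corners`), the 3x3 window, the constant `k=0.05` and the relative threshold are all choices made for this illustration.

```python
def make_square_image(size, lo, hi):
    """Synthetic test image: a bright square on a dark background."""
    return [[1.0 if lo <= x <= hi and lo <= y <= hi else 0.0
             for x in range(size)] for y in range(size)]


def detect_corners(image, k=0.05, rel_threshold=0.5):
    """Toy Harris-style detector: returns (x, y) locations and nothing else."""
    h, w = len(image), len(image[0])

    # Central-difference gradients; border pixels keep a zero gradient.
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0

    # Harris response from the structure tensor summed over a 3x3 window.
    response = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            for v in range(y - 1, y + 2):
                for u in range(x - 1, x + 2):
                    sxx += gx[v][u] * gx[v][u]
                    syy += gy[v][u] * gy[v][u]
                    sxy += gx[v][u] * gy[v][u]
            response[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2

    # Keep strict local maxima whose response is a large fraction of the peak.
    peak = max(max(row) for row in response)
    corners = []
    if peak <= 0:
        return corners
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            r = response[y][x]
            if r <= rel_threshold * peak:
                continue
            neighbours = [response[v][u]
                          for v in (y - 1, y, y + 1)
                          for u in (x - 1, x, x + 1)
                          if (u, v) != (x, y)]
            if all(r > n for n in neighbours):
                corners.append((x, y))
    return corners


image = make_square_image(20, 6, 13)
corners = detect_corners(image)
print(corners)
```

For this synthetic square the detector reports the four corner pixels of the square. Note that the output is only a list of coordinates: nothing in it says what each corner looks like.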

A feature descriptor is an algorithm which takes an image and outputs feature descriptors/feature vectors. Feature descriptors encode interesting information into a series of numbers and act as a sort of numerical "fingerprint" that can be used to differentiate one feature from another. Ideally this information would be invariant under image transformation, so we can find the feature again even if the image is transformed in some way. An example would be SIFT, which encodes information about the local neighbourhood image gradients into the numbers of the feature vector. Other examples you can read about are HOG and SURF.
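As a rough illustration of the "numerical fingerprint" idea, the sketch below builds a small histogram of gradient orientations around a given point. It is only loosely inspired by gradient-based descriptors such as SIFT and is not an implementation of any of them; the function name `describe`, the patch radius and the 8 orientation bins are assumptions made for this example.

```python
import math


def make_square_image(size, lo, hi):
    """Synthetic test image: a bright square on a dark background."""
    return [[1.0 if lo <= x <= hi and lo <= y <= hi else 0.0
             for x in range(size)] for y in range(size)]


def describe(image, x, y, radius=3, bins=8):
    """Toy descriptor: L2-normalised histogram of gradient orientations in
    the patch around (x, y), each vote weighted by gradient magnitude."""
    h, w = len(image), len(image[0])
    hist = [0.0] * bins
    for v in range(max(1, y - radius), min(h - 1, y + radius + 1)):
        for u in range(max(1, x - radius), min(w - 1, x + radius + 1)):
            gx = (image[v][u + 1] - image[v][u - 1]) / 2.0
            gy = (image[v + 1][u] - image[v - 1][u]) / 2.0
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.atan2(gy, gx)  # in [-pi, pi]
            b = int((angle + math.pi) / (2 * math.pi) * bins) % bins
            hist[b] += magnitude
    norm = math.sqrt(sum(value * value for value in hist))
    return [value / norm for value in hist] if norm > 0 else hist


def distance(p, q):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


image_a = make_square_image(20, 6, 13)
image_b = make_square_image(24, 8, 15)   # same square, shifted by 2 pixels

d_a = describe(image_a, 6, 6)            # top-left corner in image_a
d_b = describe(image_b, 8, 8)            # the same corner in image_b
d_other = describe(image_a, 13, 13)      # bottom-right corner in image_a

print(distance(d_a, d_b), distance(d_a, d_other))
```

In this sketch the function `describe` is the "descriptor" in the algorithm sense, and the list of floats it returns is the "descriptor" in the feature-vector sense. The same corner yields the same vector after the image is shifted, while a different corner yields a clearly different one. Invariance to rotation or scaling would need additional steps that this toy leaves out.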


When it comes to feature detectors, the "location" might also include a number describing the size or scale of the feature. This is because things that look like corners when "zoomed in" may not look like corners when "zoomed out", and so specifying scale information is important. So instead of just using an (x,y) pair as a location in "image space", you might have a triple (x,y,scale) as location in "scale space".
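The sketch below illustrates what an (x,y,scale) triple can look like in practice. It uses the scale-normalised Laplacian of Gaussian, which is one common way to pick a characteristic scale, evaluated by brute force at a single point. The `Keypoint` type, the function names and the candidate sigma values are assumptions made for this example, not the API of any library.

```python
import math
from typing import NamedTuple


class Keypoint(NamedTuple):
    x: int
    y: int
    scale: float  # sigma of the best-responding filter, in pixels


def make_disc_image(size, cx, cy, radius):
    """Synthetic test image: a bright disc on a dark background."""
    return [[1.0 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0.0
             for x in range(size)] for y in range(size)]


def normalized_log_response(image, x, y, sigma):
    """Scale-normalised Laplacian-of-Gaussian response at (x, y),
    computed as a direct sum over all non-zero pixels."""
    total = 0.0
    for v, row in enumerate(image):
        for u, value in enumerate(row):
            if value == 0.0:
                continue
            r2 = (u - x) ** 2 + (v - y) ** 2
            # sigma^2 times the Laplacian of a 2D Gaussian.
            kernel = (r2 - 2 * sigma ** 2) / (2 * math.pi * sigma ** 4)
            total += value * kernel * math.exp(-r2 / (2 * sigma ** 2))
    return total


def assign_scale(image, x, y, sigmas=(1, 2, 3, 4, 6, 8)):
    """Attach to (x, y) the candidate sigma with the strongest response."""
    best = max(sigmas,
               key=lambda s: abs(normalized_log_response(image, x, y, s)))
    return Keypoint(x, y, float(best))


small = assign_scale(make_disc_image(33, 16, 16, 4), 16, 16)
large = assign_scale(make_disc_image(33, 16, 16, 8), 16, 16)
print(small, large)
```

Both keypoints share the same (x,y) location, but the disc that is twice as large ends up with a scale that is twice as large. A descriptor can then use that scale to decide how big a neighbourhood to describe.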


