We previously introduced a commonly used feature in machine vision: LBP (Local Binary Patterns).
LBP is robust to monotonic changes in illumination and is widely used in texture analysis and texture recognition.
However, LBP only handles a single two-dimensional image. For videos or image sequences, how can LBP be used to extract features and capture motion information? Today, we introduce a feature called LBP-TOP, proposed by Guoying Zhao et al. from the University of Oulu, Finland. It was originally proposed for dynamic texture recognition and is now widely used in video-based facial expression recognition.
LBP-TOP extends LBP from two-dimensional to three-dimensional space. The name stands for Local Binary Patterns from Three Orthogonal Planes. A single image has only two axes, X and Y, whereas a video or image sequence also has a time axis T. These three axes define three mutually orthogonal planes: XY, XT, and YT. See the following diagram:
Image from reference [1]
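To make the three planes concrete, here is a minimal sketch that slices them out of a toy video with NumPy. The array layout `(T, H, W)`, the toy sizes, and the variable names are assumptions chosen for illustration and do not come from reference [1].

```python
import numpy as np

# Hypothetical toy video: T frames, each with H rows (Y) and W columns (X).
T, H, W = 5, 4, 6
video = np.arange(T * H * W).reshape(T, H, W)

# XY plane: one ordinary frame at a fixed time t = 2.
xy_plane = video[2, :, :]   # shape (H, W)

# XT plane: a fixed row y = 1 followed along the time axis.
xt_plane = video[:, 1, :]   # shape (T, W)

# YT plane: a fixed column x = 3 followed along the time axis.
yt_plane = video[:, :, 3]   # shape (T, H)

print(xy_plane.shape, xt_plane.shape, yt_plane.shape)
```

All three slices pass through the same voxel `video[2, 1, 3]`, which is what it means for the three planes to intersect orthogonally at a pixel.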
Given an image sequence, we obtain a texture map in each of the three orthogonal planes: XY is the frame we normally see, XT is the texture formed by following one row along the time axis, and YT is the texture formed by following one column along the time axis. Simply put, we extract LBP features from each of the three planes and concatenate them to form LBP-TOP. See the diagram below:
Image from reference [1]
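The "extract from three planes, then concatenate" idea can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a `(T, H, W)` grayscale array, uses the basic 3×3 LBP with a plain 256-bin histogram instead of circular sampling and uniform encoding, and skips border pixels. The function names `lbp_codes`, `plane_histogram`, and `lbp_top` are made up for this example.

```python
import numpy as np

def lbp_codes(plane):
    """Basic 8-neighbour, radius-1 LBP codes for the interior pixels of a 2-D array."""
    h, w = plane.shape
    center = plane[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), visited clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = plane[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes

def plane_histogram(slices, bins=256):
    """Accumulate LBP codes over all slices of one plane type and normalise."""
    hist = np.zeros(bins, dtype=np.float64)
    for s in slices:
        hist += np.bincount(lbp_codes(s).ravel(), minlength=bins)
    return hist / hist.sum()

def lbp_top(video):
    """Concatenate the XY, XT and YT histograms of a (T, H, W) volume."""
    t, h, w = video.shape
    xy = plane_histogram(video[k, :, :] for k in range(t))
    xt = plane_histogram(video[:, y, :] for y in range(h))
    yt = plane_histogram(video[:, :, x] for x in range(w))
    return np.concatenate([xy, xt, yt])

# Usage on random data standing in for a short clip (each axis needs >= 3 samples).
rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(6, 8, 10))
features = lbp_top(clip)
print(features.shape)  # three 256-bin histograms side by side
```

Each of the three histograms is normalised separately so that the plane with the most slices does not dominate the concatenated vector. That is a design choice of this sketch.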
In summary, compared with LBP, LBP-TOP considers not only the texture of the XY plane but also that of the XT and YT planes. The XT and YT planes record how the texture changes over time, so they carry the important dynamic information.
As we know from the earlier LBP post, extracting LBP with uniform pattern encoding (8 sampling points) produces a 59-bin histogram. LBP-TOP extracts LBP from three orthogonal planes and therefore produces a 59 × 3 = 177-dimensional vector, tripling the feature dimension. In practice, we often divide the images into 4 × 4 blocks, each producing its own 59 × 3 vector, so the final feature dimension is 4 × 4 × 59 × 3 = 2832. LBP-TOP is therefore a high-dimensional feature.
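The numbers 59 and 2832 can be checked with a few lines of Python. The sketch below assumes 8 sampling points and the usual definition of a uniform pattern (at most two 0/1 transitions when the bits are read circularly). The helper name `transitions` is made up for this example.

```python
def transitions(code, bits=8):
    """Count 0/1 changes when the bit pattern is read circularly."""
    # Rotate right by one bit, then XOR: a set bit marks a change between neighbours.
    rotated = (code >> 1) | ((code & 1) << (bits - 1))
    return bin(code ^ rotated).count("1")

# Uniform patterns have at most two circular transitions.
uniform = [c for c in range(256) if transitions(c) <= 2]

# Each uniform pattern gets its own bin; all non-uniform patterns share one bin.
bins_per_plane = len(uniform) + 1

# 4 x 4 blocks, each contributing one histogram per plane (XY, XT, YT).
feature_dim = 4 * 4 * bins_per_plane * 3

print(len(uniform), bins_per_plane, feature_dim)  # 58 59 2832
```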
The LBP-TOP source code can be downloaded from the University of Oulu's official website.
Statement: All images used in this post are from reference [1] and are for learning and exchange purposes only. Commercial use is strictly prohibited. If you reprint or cite them, please credit reference [1] as the source.