
Working principle of robot vision navigation and positioning system

2026-04-06 07:36:54 · #1

There are several families of localization technology; here we are concerned only with vision-based ones. By the number and type of "eyes" they use, vision systems can be categorized as monocular, binocular (stereo), multi-camera, and RGB-D; the latter three can recover depth from images. Estimating a robot's motion from these cameras is known as visual odometry (VO), which may be monocular or stereo. Wikipedia describes it as follows: in robotics and computer vision, visual odometry is the process of determining a robot's position and orientation by analyzing the associated sequence of camera images.

Today, thanks to the rapid development of digital image processing and computer vision, more and more researchers use cameras as the perception sensors of fully autonomous mobile robots. This is mainly because traditional ultrasonic or infrared sensors carry limited information and have poor robustness, shortcomings a vision system can overcome. The real world is three-dimensional, while the image projected onto the camera sensor (CCD/CMOS) is two-dimensional; the ultimate goal of visual processing is to extract the relevant three-dimensional information from the perceived two-dimensional image.

Basic system components: a CCD camera, an image-acquisition (PCI) card, a PC and its peripherals, etc.

CCD/CMOS

A CCD consists of rows of silicon imaging elements: photosensitive cells and charge-transfer devices arranged on a common substrate. By sequentially transferring the accumulated charge, it reads out the video signal of the pixels in time-division order. Area-array CCD sensors, for example, acquire images at resolutions ranging from 32×32 up to 1024×1024 pixels.

Video digital signal processor

Image signals are generally two-dimensional. An image typically consists of 512×512 pixels (sometimes 256×256 or 1024×1024), each with 256 gray levels for monochrome, or 3×8 bits per pixel for color (red, green, blue), giving roughly 16.7 million possible colors. A single image therefore occupies 256 KB (grayscale) or 768 KB (color). To carry out the sensing, preprocessing, segmentation, description, recognition, and interpretation stages of visual processing, the mathematical operations performed in the early stages can be summarized as follows:

(1) Point processing is often used for contrast enhancement, density nonlinearity correction, thresholding, and pseudo-color processing. Each pixel's input value is mapped independently to an output value through a fixed relationship; for example, a logarithmic transformation expands contrast in dark areas.
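A minimal numpy sketch of such a point operation, using the logarithmic transform mentioned above (the scaling constant c here is chosen to map the input maximum to 255, one common convention):

```python
import numpy as np

def log_transform(img, c=None):
    """Pointwise log transform: expands contrast in dark regions.

    Each input pixel is mapped independently through s = c * log(1 + r),
    so dark values (small r) are spread over a wider output range.
    """
    img = img.astype(np.float64)
    if c is None:
        c = 255.0 / np.log(1.0 + img.max())  # scale output back to [0, 255]
    return (c * np.log1p(img)).astype(np.uint8)

# A dark 8-bit "image": values 0..15 get stretched toward the full range.
dark = np.arange(16, dtype=np.uint8).reshape(4, 4)
out = log_transform(dark)
```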

(2) Two-dimensional convolution is often used for image smoothing, sharpening, contour enhancement, spatial filtering, and template-matching computations. Convolving the entire image with an M×M kernel requires M² multiplications and (M² − 1) additions per output pixel. Since an image generally contains many pixels, even a small kernel entails a very large number of multiply-add operations and memory accesses.
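The cost structure above can be seen in a direct (unoptimized) implementation; a minimal numpy sketch with zero padding:

```python
import numpy as np

def convolve2d(image, kernel):
    """Direct 2D convolution ('same' output size, zero padding).

    For an M x M kernel, each output pixel costs M^2 multiplications
    and M^2 - 1 additions, as noted in the text.
    """
    M = kernel.shape[0]
    pad = M // 2
    padded = np.pad(image.astype(np.float64), pad)
    k = np.flipud(np.fliplr(kernel))  # true convolution flips the kernel
    out = np.zeros_like(image, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + M, j:j + M] * k)
    return out

# 3x3 box filter spreads an impulse evenly over its neighborhood.
img = np.zeros((5, 5)); img[2, 2] = 9.0
smooth = convolve2d(img, np.ones((3, 3)) / 9.0)
```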

(3) Commonly used two-dimensional orthogonal transforms include the FFT and the Walsh, Haar, and Karhunen-Loève (KL) transforms. They are applied to image enhancement, restoration, two-dimensional filtering, data compression, etc.
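For example, the 2D FFT with orthonormal scaling is a unitary transform: it preserves image energy (Parseval's relation) and is exactly invertible, which is what makes it usable for filtering and compression. A small numpy sketch:

```python
import numpy as np

# 2D FFT as an orthogonal (unitary, with 'ortho' normalization) transform.
img = np.outer(np.hanning(8), np.hanning(8))
F = np.fft.fft2(img, norm="ortho")

# Parseval: the transform preserves total energy.
energy_spatial = np.sum(img ** 2)
energy_freq = np.sum(np.abs(F) ** 2)

# The inverse transform reconstructs the image exactly (to rounding error).
rec = np.fft.ifft2(F, norm="ortho").real
```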

(4) Coordinate transformation is often used for image scaling, rotation, translation, registration, geometric correction and image reconstruction from photographic values.
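A minimal sketch of such a coordinate transformation, applying a rotate-scale-translate affine matrix to point coordinates (the helper names are illustrative, not from any particular library):

```python
import numpy as np

def make_affine(angle_deg, scale, tx, ty):
    """2x3 affine matrix: rotate by angle, scale, then translate."""
    a = np.deg2rad(angle_deg)
    c, s = scale * np.cos(a), scale * np.sin(a)
    return np.array([[c, -s, tx],
                     [s,  c, ty]])

def apply_affine(A, pts):
    """Apply a 2x3 affine transform to an (N, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
    return pts_h @ A.T

A = make_affine(90, 1.0, 0.0, 0.0)          # pure 90-degree rotation
moved = apply_affine(A, np.array([[1.0, 0.0]]))
```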

(5) Statistical calculations, such as computing the gray-level histogram, mean, and covariance matrix. These are required when performing histogram equalization, area calculation, classification, and the KL transform.
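These statistics can be computed directly in numpy; the two-feature covariance below (gray level paired with column index) is an illustrative setup of the kind that might precede a KL transform, not a standard recipe:

```python
import numpy as np

img = np.array([[0, 0, 1, 2],
                [1, 2, 2, 3],
                [3, 3, 3, 3]], dtype=np.uint8)

# Gray-level histogram over levels 0..3.
hist = np.bincount(img.ravel(), minlength=4)

# Mean gray level.
mean = img.mean()

# Covariance matrix of per-pixel feature vectors (here 2 features:
# gray level and column index).
features = np.stack([img.ravel(),
                     np.tile(np.arange(4), 3)]).astype(np.float64)
cov = np.cov(features)
```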

Working principle of visual navigation and positioning system

Simply put, it involves optical processing of the robot's surrounding environment. First, a camera is used to collect image information, which is then compressed and fed back to a learning subsystem composed of neural networks and statistical methods. The learning subsystem then links the collected image information with the robot's actual position to complete the robot's autonomous navigation and positioning function.

1) Camera calibration algorithm:

Traditional camera calibration methods mainly include the Faugeras calibration method, the Tsai two-step method, the direct linear transformation (DLT) method, Zhang Zhengyou's planar calibration method, and Weng's iterative method. Self-calibration methods include those based on the Kruppa equations, hierarchical stepwise self-calibration, methods based on the absolute quadric, and Pollefeys' modulus-constraint method. Active-vision calibration methods include Ma Songde's three-orthogonal-translation method, Li Hua's planar orthogonal calibration method, and Hartley's rotation-based intrinsic-parameter calibration method.
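What all these methods ultimately recover is the camera's intrinsic matrix K (focal lengths and principal point) and lens distortion. As a minimal illustration of the pinhole projection model that K parameterizes, here is a numpy sketch; the values of fx, fy, cx, cy are hypothetical, as a real system would obtain them from calibration:

```python
import numpy as np

# Intrinsic matrix K with illustrative values: focal lengths (fx, fy)
# in pixels and principal point (cx, cy). Calibration methods such as
# Zhang's planar method estimate exactly these parameters.
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

def project(K, X):
    """Project a 3D point X (camera coordinates) to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]

# A point 2 m straight ahead lands exactly on the principal point.
u, v = project(K, np.array([0.0, 0.0, 2.0]))
```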

2) Machine Vision and Image Processing:

a. Preprocessing: grayscale conversion, noise reduction, filtering, binarization, edge detection.

b. Feature extraction: mapping from feature space to parameter space. Algorithms include the Hough transform, SIFT, and SURF.

c. Image segmentation: e.g., segmentation in RGB or HSI color space.

d. Image description and recognition.
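A minimal numpy sketch of the preprocessing steps above (grayscale conversion with standard luminance weights, fixed-threshold binarization, and Sobel edge detection); in practice a library such as OpenCV would supply these:

```python
import numpy as np

def to_gray(rgb):
    """Grayscale conversion with standard luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def binarize(gray, thresh):
    """Fixed-threshold binarization."""
    return (gray > thresh).astype(np.uint8)

def sobel_edges(gray):
    """Gradient magnitude via 3x3 Sobel kernels (zero padding)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    g = np.pad(gray.astype(float), 1)
    gx = np.zeros_like(gray, float)
    gy = np.zeros_like(gray, float)
    for i in range(gray.shape[0]):
        for j in range(gray.shape[1]):
            win = g[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(win * kx)
            gy[i, j] = np.sum(win * kx.T)
    return np.hypot(gx, gy)

# A vertical step edge: the response concentrates along the boundary.
step = np.zeros((5, 6)); step[:, 3:] = 255.0
mag = sobel_edges(step)
```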

3) Localization algorithms: filter-based localization algorithms mainly include the Kalman filter (KF), the sparse extended information filter (SEIF), the particle filter (PF), the extended Kalman filter (EKF), the unscented Kalman filter (UKF), etc.
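As a minimal illustration of the predict-update idea shared by these filters, here is a one-dimensional Kalman filter estimating a stationary position from noisy measurements (all noise parameters below are illustrative):

```python
import numpy as np

def kalman_1d(zs, q=1e-4, r=0.25):
    """Minimal 1D Kalman filter with a constant-state model.

    Predict: the state stays put, uncertainty grows by process noise q.
    Update:  blend the prediction with measurement z (noise variance r)
             using the Kalman gain k.
    """
    x, p = 0.0, 1.0          # initial estimate and variance
    for z in zs:
        p = p + q            # predict
        k = p / (p + r)      # Kalman gain
        x = x + k * (z - x)  # update with measurement
        p = (1 - k) * p
    return x, p

rng = np.random.default_rng(0)
true_pos = 5.0
measurements = true_pos + rng.normal(0.0, 0.5, size=200)
est, var = kalman_1d(measurements)
```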

Alternatively, monocular vision can be fused with odometry. Odometry readings serve as auxiliary information, and triangulation is used to compute the coordinates of feature points in the current robot coordinate system; this 3D computation is performed with a delay of one time step. From the 3D coordinates of the feature points in the current camera frame and their world coordinates on the map, the camera pose in the world coordinate system is then estimated. This reduces sensor cost, eliminates accumulated odometry error, and makes the localization results more accurate. Furthermore, compared with the extrinsic calibration between the two cameras required in stereo vision, this method only requires calibrating the camera's intrinsic parameters, improving system efficiency.
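The triangulation step described here can be sketched with the standard linear (DLT) method. The setup below is purely illustrative: an identity intrinsic matrix and a second camera pose displaced by an odometry-style translation:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2 are 3x4 projection matrices (K [R | t]); x1, x2 are the
    image coordinates of the same feature in the two views.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # dehomogenize

K = np.eye(3)
# First view at the origin; second view translated 1 unit along x,
# e.g. as reported by odometry between the two time steps.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X_est = triangulate(P1, P2, x1, x2)
```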

Basic process of localization algorithm:

The algorithm process is simple and can be easily implemented using OpenCV.

Input

The video stream acquired by the camera (mainly grayscale images; in stereo VO, images can be either color or grayscale), with the frames at times t and t+1 recorded as It and It+1. The camera's intrinsic parameters are obtained once through camera calibration (e.g., with MATLAB or OpenCV) and treated as fixed quantities.

Output

The camera's position and orientation (pose) for each frame.

Basic process

● Acquire images It and It+1

● Perform distortion correction on the acquired image

● Feature detection is performed on image It using the FAST algorithm, and these features are then tracked to image It+1 using the KLT algorithm. If any tracked features are lost or the number of features falls below a certain threshold, feature detection is performed again.

● Estimate the essential matrix of two images using a 5-point algorithm with RANSAC.

● Recover the rotation R and translation t by decomposing the essential matrix.

● Estimate the scale of the translation to obtain the final rotation matrix and translation vector.
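The pipeline above specifies the 5-point algorithm inside RANSAC. As a self-contained illustration of the epipolar geometry that step relies on, here is the simpler classical 8-point estimate of the essential matrix, run on synthetic noise-free correspondences (all point and motion values below are made up; a real system would use OpenCV's findEssentialMat/recoverPose):

```python
import numpy as np

def essential_8pt(x1, x2):
    """Essential matrix from >= 8 normalized correspondences (8-point).

    x1, x2 are (N, 2) points in normalized image coordinates (pixel
    coordinates premultiplied by K^-1). Each correspondence gives one
    linear constraint on the 9 entries of E; the solution is the SVD
    null vector, followed by enforcing E's two-equal-singular-values
    structure.
    """
    N = len(x1)
    A = np.zeros((N, 9))
    for i in range(N):
        u1, v1 = x1[i]; u2, v2 = x2[i]
        A[i] = [u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1]
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt

# Synthetic motion: small rotation about y plus translation; noise-free
# 3D points in front of the first camera.
rng = np.random.default_rng(1)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(12, 3))
a = 0.1
R = np.array([[np.cos(a), 0, np.sin(a)],
              [0, 1, 0],
              [-np.sin(a), 0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.0])
x1 = X[:, :2] / X[:, 2:]
Xc2 = X @ R.T + t
x2 = Xc2[:, :2] / Xc2[:, 2:]
E = essential_8pt(x1, x2)

# The epipolar constraint x2^T E x1 = 0 should hold for every pair.
h1 = np.hstack([x1, np.ones((12, 1))])
h2 = np.hstack([x2, np.ones((12, 1))])
residuals = np.abs(np.einsum("ij,jk,ik->i", h2, E, h1))
```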

