One of the fundamental tasks of machine vision is to acquire images from a camera and compute the geometric information of objects in three-dimensional space, thereby reconstructing and recognizing those objects. The relationship between the three-dimensional position of a point on an object's surface and its corresponding point in the image is determined by the camera's geometric imaging model; the parameters of this model are the camera parameters. In most cases these parameters must be obtained through experiment and computation, a process called camera calibration. Calibration determines the camera's geometric and optical parameters as well as its pose relative to the world coordinate system. Calibration accuracy directly affects the accuracy of the machine vision system as a whole, so subsequent work can proceed only after the camera has been calibrated successfully; improving calibration accuracy remains an important focus of current research.
1. Camera perspective projection model
A camera projects a 3D scene onto its 2D image plane through an imaging lens. This projection can be described by an imaging transformation (i.e., a camera imaging model). Camera imaging models are divided into linear and nonlinear models; the pinhole camera model is a linear model. This paper discusses the transformation relationships, under this model, between a spatial point and its image projection in the various coordinate systems. Figure 1 shows the relationship between the three levels of coordinate systems in the pinhole model: (Xw, Yw, Zw) is the world coordinate system, (x, y, z) is the camera coordinate system, XfOfYf is the image coordinate system measured in pixels, and XOY is the image coordinate system measured in millimeters.
The transformation between a point's coordinates in the millimeter-based image coordinate system and its coordinates in the pixel-based image coordinate system is as follows:
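In the standard pinhole model this transformation takes the following form, with dX and dY denoting the physical dimensions of one pixel and (u0, v0) the principal point in pixel coordinates:

```latex
\begin{bmatrix} X_f \\ Y_f \\ 1 \end{bmatrix}
=
\begin{bmatrix}
\frac{1}{dX} & 0 & u_0 \\
0 & \frac{1}{dY} & v_0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
\tag{1}
```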
The transformation between a spatial point's coordinates in the world coordinate system and its coordinates in the camera coordinate system is as follows:
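In homogeneous coordinates this rigid-body transformation takes the standard form:

```latex
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
=
\begin{bmatrix}
R & t \\
0^{T} & 1
\end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
= M_2
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
\tag{2}
```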
where R is a 3×3 orthonormal rotation matrix, t is a three-dimensional translation vector, and M2 is the resulting 4×4 transformation matrix.
Because the pinhole imaging model has the following relationship:
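For a camera with effective focal length f, the perspective projection of a camera-frame point (x, y, z) onto the millimeter image plane is the standard pinhole relation:

```latex
X = \frac{f\,x}{z}, \qquad Y = \frac{f\,y}{z}
```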
Therefore, substituting (1) and (2) into the homogeneous-coordinate matrix form of this relationship, we obtain:
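Combining the three relations gives the standard 3×4 projection from homogeneous world coordinates to homogeneous pixel coordinates:

```latex
z
\begin{bmatrix} X_f \\ Y_f \\ 1 \end{bmatrix}
=
\begin{bmatrix}
\frac{f}{dX} & 0 & u_0 & 0 \\
0 & \frac{f}{dY} & v_0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
M_2
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
= M_1 M_2
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
```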
where M1 represents the camera's intrinsic parameters and M2 represents the camera's extrinsic parameters. Determining these parameters for a specific camera is called camera calibration.
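As a numerical sketch of this model, the following projects a world point through M1 and M2. The intrinsic values (fx, fy, u0, v0) and the extrinsic pose are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical intrinsics: focal lengths in pixels and principal point
fx, fy, u0, v0 = 800.0, 800.0, 320.0, 240.0

# Intrinsic matrix M1 (3x4), as in the perspective projection model
M1 = np.array([[fx, 0.0, u0, 0.0],
               [0.0, fy, v0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])

# Extrinsic matrix M2 (4x4): rotation R (identity here) and translation t
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])   # camera 5 units in front of the world origin
M2 = np.eye(4)
M2[:3, :3] = R
M2[:3, 3] = t

def project(Xw):
    """Project a 3D world point to pixel coordinates via M1 @ M2."""
    Xw_h = np.append(Xw, 1.0)   # homogeneous world point
    uvw = M1 @ M2 @ Xw_h        # equals z * [Xf, Yf, 1]
    return uvw[:2] / uvw[2]     # divide out the depth z

print(project(np.array([0.0, 0.0, 0.0])))  # world origin -> principal point, [320. 240.]
```

Note how the depth z reappears as the third homogeneous component, which is exactly the scale factor on the left-hand side of the projection equation.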
2. Calibration Classification
In general, camera calibration can be divided into two main categories: traditional camera calibration methods and camera self-calibration methods. The basic method of traditional camera calibration involves processing images of a specific calibration reference object under a given camera model, and then using a series of mathematical transformation formulas to calculate and optimize the internal and external parameters of the camera model. However, this method is difficult to implement when the scene is unknown and the camera is moving arbitrarily. In the early 1990s, Faugeras, Luong, Maybank, and others first proposed the camera self-calibration method. This self-calibration method utilizes the constraint relationships between the camera's own parameters for calibration, and is independent of the scene and camera motion, thus offering greater flexibility.
2.1 Camera Calibration Based on 3D Stereo Target
Camera calibration based on a 3D target places a 3D target in front of the camera, with each vertex of the small squares on the target serving as a feature point; the position of each feature point relative to the world coordinate system must be precisely determined during fabrication. The equations relating the three-dimensional world coordinates to the two-dimensional image coordinates are nonlinear in the camera's intrinsic and extrinsic parameters. However, if the nonlinear lens distortion is ignored and the elements of the perspective transformation matrix are treated as unknowns, then, given a set of 3D control points and their corresponding image points, the direct linear transformation (DLT) method can solve for each element of the perspective transformation matrix. The camera's intrinsic and extrinsic parameters can therefore be calculated from the world and image coordinates of the target's feature points.
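The DLT step described above can be sketched as follows. This is a minimal, noise-free implementation; the function name and point format are illustrative, and real calibration code would add coordinate normalization and distortion handling:

```python
import numpy as np

def dlt_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 perspective projection matrix from at least six
    3D-2D point correspondences by direct linear transformation (DLT).

    Each correspondence (X, Y, Z) -> (u, v) contributes two linear
    equations in the 12 unknown matrix elements."""
    A = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # The solution (up to scale) is the right singular vector of A
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)
```

Because the matrix is recovered only up to scale, the intrinsic and extrinsic parameters are subsequently extracted by decomposing it, as the text describes. The control points must not all lie in one plane, or the system becomes degenerate.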
2.2 Camera Calibration Based on 2D Planar Target
This method, also known as Zhang Zhengyou's calibration method, is a flexible approach suitable for a wide range of applications. It requires the camera to capture images of a planar target from two or more different positions; both the camera and the 2D planar target may move freely, and the intrinsic parameters are assumed to remain constant. Taking the target plane as Z = 0 in the world coordinate system, an initial solution for the camera parameters is computed from the linear model and then refined by maximum-likelihood nonlinear optimization. After an objective function accounting for lens distortion is derived, the required intrinsic and extrinsic parameters can be computed. The method is robust and needs no expensive precision calibration block, making it highly practical. However, because it assumes that straight lines on the template remain straight after perspective projection, image-processing errors are introduced when estimating the linear intrinsic and extrinsic parameters, so the error is relatively large when wide-angle lens distortion is significant.
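The linear core of this method is estimating, for each view, the homography that maps points on the Z = 0 target plane to the image. A minimal DLT sketch of that estimation step follows; the function name is illustrative, and it omits the normalization and distortion handling a full implementation would need:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography mapping planar target points
    (their X, Y coordinates on the Z = 0 plane) to image points,
    from at least four correspondences, via DLT and SVD."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # fix the arbitrary scale so that H[2,2] = 1
```

In Zhang's method, each such homography H = K [r1 r2 t] (up to scale) contributes two constraints on the intrinsic matrix, which is why at least two views of the plane are required.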
2.3 Camera Calibration Based on Radial Constraints
Tsai (1986) proposed a two-step calibration method based on a radial constraint. Its core is to first solve an overdetermined system of linear equations by least squares under the RAC (radial alignment constraint) condition, obtaining the camera's extrinsic parameters other than tz (the translation along the optical axis), and then to solve for the remaining camera parameters under two cases: with and without lens distortion. Tsai's method is highly accurate and well suited to precision measurement, but it places high demands on the equipment and is not suitable for simple calibration tasks; its accuracy comes at the cost of increased equipment precision and complexity.
3. Camera self-calibration method
A camera self-calibration method is a method that calibrates the camera without relying on a calibration reference, but only by utilizing the correspondence between images of the surrounding environment during the camera's movement. Currently available self-calibration techniques can be broadly categorized into several types, including active vision-based camera self-calibration techniques, camera self-calibration methods that directly solve the Kruppa equations, hierarchical stepwise calibration methods, and self-calibration methods based on quadratic surfaces.
3.1 Self-calibration method based on active vision
An active vision system is one in which the camera is mounted on a precisely controllable platform whose motion parameters can be read accurately by a computer. The system only needs to drive the camera through specific motions to acquire multiple images, and then uses these images together with the known motion parameters to determine the camera's intrinsic and extrinsic parameters. A representative method is the linear method proposed by Ma Songde based on two sets of three mutually orthogonal translations. Later, Yang Changjiang, Li Hua, and others proposed improved schemes that linearly calibrate the camera parameters from four and five sets of planar orthogonal motions respectively, using the epipole information in the images. This class of self-calibration methods has simple algorithms and yields linear solutions, but its drawback is the requirement for a precisely controllable camera motion platform.
3.2 Self-calibration method based on Kruppa equations
The self-calibration method proposed by Faugeras, Luong, Maybank, and others is based directly on solving the Kruppa equations, which are derived from the image of the absolute conic and the epipolar transformation. A Kruppa-equation-based method does not require projective reconstruction of the image sequence; instead, it establishes equations between pairs of images. This makes it more advantageous than hierarchical stepwise methods in situations where it is difficult to place all images in a consistent projective framework. The trade-off is that consistency of the plane at infinity across the projective spaces defined by the different image pairs cannot be guaranteed. When the image sequence is long, the method may become unstable, and its robustness depends on the initial values provided.
3.3 Layered stepwise calibration method
In recent years, hierarchical stepwise calibration has become a focus of self-calibration research and has gradually replaced direct solution of the Kruppa equations in practical applications. The method first performs a projective reconstruction of the image sequence, then applies constraints through the absolute conic (or absolute quadric), and finally determines the affine parameters (i.e., the equation of the plane at infinity) and the camera intrinsic parameters. Its characteristic is that, starting from the projective calibration, a projective alignment is performed with one image as the reference, reducing the number of unknowns, after which all unknowns are solved simultaneously by a nonlinear optimization algorithm. The drawbacks are that the initial values for the nonlinear optimization can only be estimated and its convergence is not guaranteed, and that, because the projective reconstruction always takes one image as the reference, different choices of reference image lead to different calibration results.
3.4 Self-calibration method based on quadratic surface
Triggs was the first to introduce the absolute quadric into self-calibration research. This method is essentially equivalent to the Kruppa-equation approach, since both exploit the invariance of the absolute conic under Euclidean transformations. However, when multiple images are available and a consistent projective reconstruction can be obtained, the absolute-quadric method is preferable: the absolute quadric encodes all the information of both the plane at infinity and the absolute conic, and it is computed from a projective reconstruction of all the images. The method therefore guarantees consistency of the plane at infinity across all images.
4. Conclusion
Traditional camera calibration requires a calibration reference object, and to improve computational accuracy the nonlinear distortion correction parameters must also be determined. Camera self-calibration offers greater flexibility and practicality than traditional methods. After more than a decade of dedicated research, the theoretical problems have largely been resolved; current work focuses on improving the robustness of calibration algorithms and on applying the theory effectively to real-world vision problems. To enhance robustness, hierarchical stepwise self-calibration methods should be used more widely, with the self-calibration results further refined by optimization.