The "Made in China 2025" initiative was proposed against the backdrop of "Internet Plus," and one of its core components is robot intelligence. Vision technology serves as the eyes and brain of a machine, and machine vision is what will make robot intelligence a reality. Against this background, VisionDragon pioneered the "robotic arm+" concept in the domestic machine vision field: by combining a robotic arm with 3D vision, it has realized 3D vision guidance for robotic arms, helping the robot and robotic-arm industry add intelligent functions and thereby meet the requirements of "Made in China 2025."
Composition of a robot 3D vision guidance solution
The robot 3D vision guidance solution mainly consists of a 3D image acquisition solution, a 3D image processing solution, a hand-eye calibration solution, and a robot control solution.
Since robot control technology varies from brand to brand, this article sets robot control aside for now and focuses on the 3D image acquisition, 3D image processing, and hand-eye calibration solutions.
3D Image Acquisition Solution
3D image acquisition solutions are divided into two types: Eye-in-Hand and Eye-to-Hand.
Eye-in-Hand method: A 3D camera is mounted on the robotic arm, and the arm drives the camera to scan the object being measured along a preset trajectory. The object being measured must be within the field of view (FOV) and measurement range (MR) of the 3D camera.
Eye-to-Hand method: The 3D camera is mounted on a gantry near the robotic arm, and the gantry drives the 3D camera to scan the object being measured. The object being measured must be within the field of view (FOV) and measurement range (MR) of the 3D camera.
In addition to these two installation methods, 3D acquisition solutions are also divided by imaging principle into passive-light binocular, active-light binocular, laser triangulation, structured light, and TOF acquisition methods.
Passive light binocular stereo vision
Passive-light binocular vision consists of two area- or line-scan cameras and a light source. Both cameras image the same region, and the height of the object is calculated from the disparity map. The two cameras can capture the object simultaneously from different angles, or a single camera can capture it from different angles at different times.
The advantage of passive-light binocular vision is that it eliminates the need for relative movement between the camera and the object, allowing a wide field of view. The disadvantage is that when the surface of the object has poor contrast, features cannot be matched, so no 3D information can be obtained. Furthermore, passive-light binocular vision demands high recognition accuracy from each individual camera; otherwise a single camera's error is amplified in the binocular system. Because of these limitations, passive-light binocular vision sees relatively limited use in industrial applications.
To broaden the field of view and eliminate blind spots, passive light binocular technology can also be extended to passive light multi-view technology.
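The depth computation behind binocular vision can be sketched with the standard pinhole relation: depth is focal length times baseline divided by disparity. The function and numbers below are illustrative assumptions, not parameters from the article.

```python
# Sketch: converting stereo disparity to depth via the pinhole camera model.
# Z = f * B / d, where f is the focal length in pixels, B is the baseline
# between the two cameras in meters, and d is the disparity in pixels.

def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Return depth in meters for one matched pixel pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# A point seen with 40 px disparity by cameras with f = 800 px, B = 0.1 m:
depth = disparity_to_depth(40.0, 800.0, 0.1)  # 2.0 m
```

The relation also shows why single-camera error is amplified: a small matching error in either image shifts the disparity, and depth varies inversely with it.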
Active light binocular stereo vision
To compensate for the limitations of passive binocular vision, engineers project textured light onto the object under test as an aid, for example a random-dot pattern.
The principle of active-light binocular vision is the same as the passive case: it relies on the disparity between the two cameras. Because the active light source adds texture to the object being measured, this acquisition method is more broadly applicable.
Laser triangulation principle
This type of vision technology mainly comprises a 2D camera, a lens, a laser, and a calibration algorithm. It uses the deformation of the laser line captured by the 2D camera, together with trigonometric formulas, to obtain the height information of the object being measured.
The installation geometry of laser triangulation is an important influencing factor. The currently common arrangement shines the laser line directly onto the object being measured and images it with the 2D camera at a certain angle to the laser (the measurement angle).
Both the resolution of the 2D camera and the measurement angle at which it is installed affect the Z-axis resolution. A higher camera resolution yields a higher Z-axis resolution but also produces more redundant data, which slows scanning. A larger measurement angle likewise yields a higher Z-axis resolution, but at the cost of a larger blind zone. When building a 3D vision system, therefore, the actual conditions of the object being measured must be weighed to select an appropriate camera and installation geometry.
Besides the 2D camera and the measurement angle, the laser beam quality is also a major factor affecting measurement accuracy. Selecting a laser with a non-Gaussian beam and good uniformity is crucial for improving measurement accuracy.
The characteristics of laser triangulation are: it acquires X and Z information simultaneously, while relative motion between the camera and the object provides the Y direction, making it suitable for close-range, small-field-of-view, high-speed, high-precision measurement.
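The angle trade-off described above can be made concrete with a small sketch. It assumes one common setup (laser perpendicular to the surface, camera viewing at the measurement angle theta), under which a height step h shifts the imaged laser line by h·sin(theta) in calibrated world units; the numbers are illustrative.

```python
import math

# Sketch of the laser-triangulation height relation, assuming the laser is
# perpendicular to the surface and the camera views at angle theta to it.
# Observed line shift = h * sin(theta), so height is recovered by division.

def height_from_shift(shift_mm: float, theta_deg: float) -> float:
    """Recover object height from the observed laser-line shift (world units)."""
    return shift_mm / math.sin(math.radians(theta_deg))

def z_resolution(pixel_size_mm: float, theta_deg: float) -> float:
    """Smallest resolvable height step for a one-pixel line shift.
    A larger measurement angle gives a finer Z resolution, as the text notes."""
    return pixel_size_mm / math.sin(math.radians(theta_deg))

h = height_from_shift(0.5, 30.0)  # 0.5 mm shift at 30 degrees -> 1.0 mm height
```

Evaluating `z_resolution` at two angles shows the trade-off numerically: the same pixel pitch resolves a smaller height step at 60° than at 30°, while the steeper view enlarges the occluded blind zone.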
Structured light principle
A structured-light 3D camera consists of a camera and a projector. The projector projects a sequence of coded stripe patterns; after the camera captures them, the 3D information of the object is computed from the decoded stripes. To eliminate blind spots, structured-light 3D cameras are typically built with two cameras and one projector.
Its characteristics are: the camera and the object being measured must be relatively stationary, it has high precision, but the acquisition time is relatively long.
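One widely used coding scheme for the projected stripes is the Gray code: each camera pixel records one bit per pattern (lit or dark), and the bit sequence decodes to a projector column, from which depth follows by triangulation. The decoder below is a minimal sketch; the pattern count and bit ordering are illustrative assumptions.

```python
# Sketch: decoding a Gray-code stripe sequence of the kind a structured-light
# projector emits. Each observed pattern contributes one bit (1 = lit,
# 0 = dark); the decoded integer identifies the projector column.

def gray_to_binary(bits):
    """Convert a Gray-code bit sequence (MSB first) to an integer index."""
    value = bits[0]
    out = value
    for b in bits[1:]:
        value ^= b          # Gray decode: XOR the running bit with the next bit
        out = (out << 1) | value
    return out

# A pixel that observed the stripe sequence 1, 1, 0 maps to projector column 4:
col = gray_to_binary([1, 1, 0])
```

Needing one captured image per bit is also why acquisition time is comparatively long: the camera and object must stay still for the whole pattern sequence, exactly as the text notes.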
TOF principle
A camera based on the Time-of-Flight (TOF) principle determines the distance to an object from the flight time of light. It suits 3D image acquisition over a wide field of view and long distances at low cost, where high precision is not required.
Its characteristics include: fast detection speed, large field of view, long working distance, and low price, but low accuracy and susceptibility to ambient light interference. Therefore, it is generally used indoors.
3D Image Processing Solutions
Currently, robot 3D vision is widely used in automated welding, cutting, assembly, grasping, and palletizing. These applications generally require image processing to identify the pose of objects or the 3D coordinates of object edges. Therefore, 3D image processing technology must solve two main problems: object recognition and edge-contour extraction.
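As a toy illustration of the edge-contour problem, one simple approach flags pixels in a depth image where the depth jumps sharply between neighbors. This is a minimal sketch under assumed data; production systems typically use full point-cloud segmentation or model matching instead.

```python
# Sketch: extracting edge contours from a depth image by flagging depth
# discontinuities larger than a threshold (threshold value is an assumption).

def depth_edges(depth, jump=5.0):
    """Return a same-sized boolean mask, True where depth jumps past `jump`."""
    h, w = len(depth), len(depth[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # compare right and down neighbors
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and abs(depth[y][x] - depth[ny][nx]) > jump:
                    mask[y][x] = True
    return mask

# A flat plane with one raised block: edges appear along the block's border.
depth = [[0, 0,  0,  0],
         [0, 10, 10, 0],
         [0, 10, 10, 0],
         [0, 0,  0,  0]]
edges = depth_edges(depth)
```

The resulting mask gives the 2D pixel locations of the contour; mapping them back through the camera model yields the 3D edge coordinates that welding or cutting paths follow.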
Hand-Eye Calibration Solution
Hand-eye calibration solves the problem of transforming image coordinates into robot coordinates. The calibration procedure differs slightly between the eye-in-hand and eye-to-hand configurations, but in both cases the goal is to transform the object's pose relative to the camera into its pose relative to the tool coordinate system.
Eye-in-hand: the transformation between the object coordinate system and the camera coordinate system is obtained from the image. Then, using the hand-eye calibration result, that is, the transformation between the camera coordinate system and the tool coordinate system, the transformation between the object coordinate system and the tool coordinate system is obtained. This yields pose information the robot can recognize for guidance.
Eye-to-hand: the transformation between the object coordinate system and the camera coordinate system is obtained from the image. Then, using the hand-eye calibration result, that is, the transformation between the camera coordinate system and the base coordinate system, together with the transformation between the base coordinate system and the tool coordinate system (which the robot can compute internally), the transformation between the object coordinate system and the tool coordinate system is obtained. The pose information the robot can recognize is then extracted for guidance.
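The eye-to-hand chain described above can be sketched as a product of homogeneous transforms: T_tool_obj = T_tool_base · T_base_cam · T_cam_obj. The matrices below are pure translations with illustrative values; real calibration results would also carry rotations.

```python
# Sketch: composing the eye-to-hand transform chain with 4x4 homogeneous
# matrices. Frame names and numeric values are illustrative assumptions.

def matmul4(a, b):
    """Multiply two 4x4 homogeneous transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    """Pure-translation homogeneous transform (identity rotation)."""
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

T_tool_base = translation(0.0, 0.0, -0.5)  # from the robot controller
T_base_cam  = translation(1.0, 0.0, 1.2)   # from hand-eye calibration
T_cam_obj   = translation(0.1, 0.2, 0.8)   # from the processed 3D image

# Object pose in the tool frame: chain the three transforms in order.
T_tool_obj = matmul4(matmul4(T_tool_base, T_base_cam), T_cam_obj)
# The last column of T_tool_obj holds the object position in the tool frame.
```

In the eye-in-hand case the chain is shorter, T_tool_obj = T_tool_cam · T_cam_obj, since calibration there relates the camera directly to the tool frame.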