For obstacle avoidance, a mobile robot needs real-time information about the obstacles around it, including their size, shape, and position, which it obtains from sensors. Many sensor types are used for obstacle avoidance, each with its own operating principle and characteristics. The most common today are visual sensors, laser sensors, infrared sensors, and ultrasonic sensors. Below, I will briefly introduce the basic working principles of each.
Ultrasonic
The basic principle of an ultrasonic sensor is to measure the time of flight of an ultrasonic pulse and convert it to distance with the formula d = vt/2, where d is the distance, v is the speed of sound, and t is the round-trip time of flight. Because the speed of sound in air depends on temperature and humidity, accurate measurements need to compensate for these environmental factors.
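As a minimal sketch of this calculation (the linear temperature model for the speed of sound is a standard approximation for dry air; humidity is ignored here):

```python
def sound_speed(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s); about 343 m/s at 20 degrees C."""
    return 331.3 + 0.606 * temp_c

def ultrasonic_distance(tof_s: float, temp_c: float = 20.0) -> float:
    """Distance from the round-trip time of flight: d = v * t / 2."""
    return sound_speed(temp_c) * tof_s / 2.0

# A ~17.5 ms round trip at 20 degrees C corresponds to roughly 3 m.
print(f"{ultrasonic_distance(0.0175):.2f} m")
```

Note that ignoring the temperature dependence and always assuming 20 °C introduces an error of roughly 0.18% per degree, which matters for precise ranging.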
The image above illustrates an ultrasonic sensor signal. A wave packet of ultrasonic pulses at a frequency of tens of kHz is generated by a piezoelectric or electrostatic transmitter. The system detects the returning echo once it exceeds a threshold and computes the distance from the measured time of flight. Ultrasonic sensors generally have a short effective range, typically a few meters, with a minimum detection blind zone of tens of millimeters. Thanks to their low cost, simple implementation, and mature technology, ultrasonic sensors are widely used on mobile robots. However, they also have some drawbacks, as shown in the image below.
Because the sound propagates in a cone, the measured value is not the distance to a single point but the distance to the nearest object anywhere within that cone angle.
In addition, the measurement cycle of ultrasound is relatively long: the round trip to an object about 3 meters away takes roughly 20 ms. Furthermore, different materials reflect or absorb sound waves differently, and multiple ultrasonic sensors can interfere with each other. All of these factors need to be considered in practical applications.
Infrared
Most infrared ranging methods use the principle of triangulation. An infrared emitter emits an infrared beam at a certain angle. When the beam encounters an object, it reflects back. By detecting the reflected light and using the geometric triangulation relationships of the structure, the distance D to the object can be calculated.
When the distance D is small enough, the offset L in the diagram above becomes quite large. If it exceeds the extent of the CCD, the sensor cannot see the object even though it is very close. When D is large, L becomes very small and the measurement accuracy deteriorates. For this reason, common infrared sensors have a shorter maximum range than ultrasonic sensors, and long-range models also have a minimum-distance limit. In addition, infrared sensors cannot measure the distance to transparent or near-black objects. Compared with ultrasonic sensors, however, infrared sensors offer a higher measurement bandwidth, i.e. a faster update rate.
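The triangulation geometry reduces to similar triangles: with an emitter-to-lens baseline s and a lens of focal length f, a spot offset L on the CCD corresponds to distance D = s·f/L. A small sketch (the variable names and values are illustrative, not taken from a specific sensor):

```python
def ir_distance(baseline_m: float, focal_m: float, offset_m: float) -> float:
    """Similar triangles: D / baseline = focal / offset  =>  D = baseline * focal / offset."""
    if offset_m <= 0:
        raise ValueError("spot offset must be positive (within the CCD)")
    return baseline_m * focal_m / offset_m

# 2 cm baseline, 1 cm focal length, 0.5 mm spot offset -> 0.4 m
print(ir_distance(0.02, 0.01, 0.0005))
```

Because L varies as 1/D, a fixed pixel resolution on the CCD translates into rapidly worsening range resolution as D grows, which is exactly why these sensors are short-range devices.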
Laser
Common lidar systems are time-of-flight (ToF) based, measuring distance by the time the laser beam travels (d=ct/2), similar to the ultrasonic ranging formula mentioned earlier. Here, d is the distance, c is the speed of light, and t is the time interval between transmission and reception. A lidar system consists of a transmitter and a receiver. The transmitter illuminates the target with a laser beam, and the receiver receives the reflected light. Mechanical lidar systems include a mechanism with a mirror. The rotation of the mirror allows the laser beam to cover a plane, enabling the measurement of distance information on that plane.
There are different ways to measure the time of flight. One is to use pulsed lasers and, as with the ultrasonic method above, time the round trip directly; but because light travels so much faster than sound, this requires very high-precision timing electronics and is therefore expensive. Another is to emit a frequency-modulated continuous wave (FMCW) and recover the travel time from the beat frequency between the emitted and received signals.
A relatively simple approach is to measure the phase shift of the reflected light. The sensor emits amplitude-modulated light at a known frequency and measures the phase shift between the emitted and reflected signals, as shown in Figure 1 above. The wavelength of the modulation is λ = c/f, where c is the speed of light and f is the modulation frequency. After measuring the phase shift θ between the emitted and reflected beams, the distance is d = λθ/(4π), as shown in Figure 2 above.
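A sketch of the phase-shift computation (the 5 MHz modulation frequency below is just an illustrative value):

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def phase_shift_distance(theta_rad: float, mod_freq_hz: float) -> float:
    """d = lambda * theta / (4 * pi), with modulation wavelength lambda = c / f."""
    wavelength = C / mod_freq_hz
    return wavelength * theta_rad / (4.0 * math.pi)

# At f = 5 MHz the modulation wavelength is ~60 m; a phase shift of pi
# then corresponds to a quarter wavelength, i.e. ~15 m.
print(round(phase_shift_distance(math.pi, 5e6), 1))
```

Because phase wraps around at 2π, the distance is only unambiguous up to λ/2; lowering the modulation frequency extends the unambiguous range at the cost of resolution.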
LiDAR can measure distances of tens or even hundreds of meters, with high angular resolution, typically a few tenths of a degree, and high ranging accuracy. However, the uncertainty of the measured distance is inversely proportional to the square of the received signal amplitude, so measurements of dark or distant objects are less accurate than those of bright, nearby objects. Furthermore, lidar is ineffective for transparent materials such as glass. Additionally, due to its complex structure and high component costs, lidar is also very expensive.
Some low-end lidar systems use triangulation for ranging. However, this limits their range, typically to a few meters, and their accuracy is relatively low. But they are still quite effective for SLAM in low-speed indoor environments or for obstacle avoidance in outdoor environments.
Visual
There are many commonly used computer vision solutions, such as binocular vision and depth cameras based on time of flight (ToF) or structured light. Depth cameras can acquire an RGB image and a depth map simultaneously. However, whether based on ToF or structured light, their performance degrades in bright outdoor lighting, because both rely on actively emitted light.
Take structured-light depth cameras as an example: the projector emits a pseudo-random but fixed speckle pattern. When the speckles land on objects at different distances, the camera captures them at different image positions. The offset of each captured speckle from a calibrated reference pattern is computed, and from parameters such as the camera geometry and sensor size, the distance from the object to the camera can be determined. However, our E-Patrol robot operates primarily outdoors, where active light sources are heavily affected by sunlight. A passive approach based on binocular vision is therefore more suitable, and that is the solution we adopted.
Binocular vision ranging is essentially a triangulation method. Because the two cameras are positioned differently, much like our two eyes, they see objects from different viewpoints: the same point P is imaged at different pixel positions by the two cameras, and triangulation can then be used to recover its distance. Unlike structured light, where the projected points are actively emitted and known in advance, binocular algorithms typically match points using image features such as SIFT or SURF, which yields only a sparse depth map.
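For rectified cameras the triangulation reduces to a simple relation: with focal length f (in pixels) and baseline b, a disparity of d pixels corresponds to depth Z = f·b/d. A minimal sketch (the parameter values are illustrative):

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from disparity for a rectified stereo pair: Z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# focal length 700 px, 12 cm baseline, 40 px disparity -> 2.1 m
print(stereo_depth(700.0, 0.12, 40.0))
```

As with infrared triangulation, disparity shrinks as 1/Z, so depth resolution degrades quadratically with distance.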
For effective obstacle avoidance, sparse maps are insufficient; we need dense point clouds containing depth information for the entire scene. Dense matching algorithms can be broadly categorized into two types: local algorithms and global algorithms. Local algorithms use local pixel information to calculate depth, while global algorithms utilize all information within the image. Generally, local algorithms are faster, but global algorithms offer higher accuracy.
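As a toy illustration of the local approach, here is a 1-D sum-of-absolute-differences (SAD) block matcher along a single rectified scanline; real local methods (e.g. OpenCV's StereoBM) add cost aggregation, subpixel refinement, and left-right consistency checks:

```python
import numpy as np

def sad_disparity(left, right, window=2, max_disp=8):
    """For each pixel on a rectified scanline, pick the disparity whose
    window minimizes the sum of absolute differences (SAD)."""
    n = len(left)
    disp = np.zeros(n, dtype=int)
    for x in range(window, n - window):
        costs = []
        for d in range(min(max_disp, x - window) + 1):
            patch_l = left[x - window:x + window + 1]
            patch_r = right[x - d - window:x - d + window + 1]
            costs.append(np.abs(patch_l - patch_r).sum())
        disp[x] = int(np.argmin(costs))
    return disp
```

Extending this to 2-D windows over a whole image gives a basic local dense matcher; global methods instead minimize an energy defined over the full image, which is why they are slower but more accurate.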
Both categories include many concrete algorithm implementations. From their output we can estimate depth over the entire scene, which lets us identify traversable areas and obstacles in the map. The result is similar to the 3D point cloud produced by a lidar, but with much richer information. Compared with lidar, visual detection is far cheaper, but its drawbacks are also obvious: lower measurement accuracy and much higher computational cost. The accuracy gap is relative, though, and is perfectly adequate for practical applications; moreover, our current algorithm runs in real time on our NVIDIA TK1 and TX1 platforms.
In practical applications, what we read from the camera is a continuous stream of video frames. We can also use these frames to estimate the motion of target objects in the scene, build motion models for them, and estimate and predict their motion direction and speed. This is very useful for our actual walking and obstacle avoidance planning.
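A minimal sketch of such a prediction, using a constant-velocity model over two observed frames (a real system would use a tracker such as a Kalman filter; the positions and timings here are made up for illustration):

```python
def estimate_velocity(p_prev, p_curr, dt):
    """Finite-difference velocity estimate between two frames."""
    return tuple((c - p) / dt for p, c in zip(p_prev, p_curr))

def predict_position(p_curr, vel, horizon):
    """Constant-velocity extrapolation: p' = p + v * horizon."""
    return tuple(p + v * horizon for p, v in zip(p_curr, vel))

# Object observed at (1.0, 2.0) m and then (1.1, 2.0) m, 0.1 s apart:
v = estimate_velocity((1.0, 2.0), (1.1, 2.0), 0.1)  # ~(1.0, 0.0) m/s
print(predict_position((1.1, 2.0), v, 0.5))          # position ~0.5 s ahead
```

Feeding such predictions into the planner lets the robot avoid where a moving obstacle will be, not just where it is now.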
The above are some of the most common sensor types, each with its own advantages and disadvantages. In practice, a combination of different sensors is generally used so that the robot can correctly perceive obstacle information across a range of applications and environmental conditions. The obstacle-avoidance solution on our company's E-Patrol robot primarily uses binocular vision, supplemented by several other sensors, so that obstacles in the full 360-degree space around the robot can be detected reliably, guaranteeing the robot's walking safety.