
How do autonomous vehicles correctly identify roads?

2026-04-06 03:51:56 · #1

Sensor hardware: The "tentacles" for sensing the world

Sensor hardware is the cornerstone of autonomous driving road recognition. Current mainstream sensors include cameras, LiDAR, millimeter-wave radar, inertial measurement units (IMUs), and high-precision global navigation satellite systems (GNSS). Cameras, like human eyes, acquire high-resolution optical images and, with advanced image processing technology, can identify lane lines, traffic signs, and traffic lights. Different types of cameras, such as forward-looking, rear-looking, and surround-view cameras, collectively provide vehicles with a 360-degree field of view, enabling them to fully perceive their surroundings. LiDAR, by emitting laser beams and measuring the time delay of reflected light, acquires three-dimensional point cloud data of the environment, accurately depicting the geometry of surrounding objects and road surfaces, clearly "seeing" road outlines and obstacles even at night or in low-light conditions. Millimeter-wave radar uses electromagnetic waves in the millimeter-wave band to detect target objects. Its advantage lies in its ability to stably detect obstacles and road edges even in adverse weather conditions such as heavy rain, fog, and dust storms, and it can measure the distance, speed, and angle of target objects in real time. The IMU is responsible for measuring the vehicle's acceleration and angular velocity, providing the vehicle with accurate attitude information, while GNSS determines the vehicle's absolute position.

However, no sensor is perfect. Cameras are prone to glare in strong light, leading to image distortion and affecting recognition performance; LiDAR suffers significant degradation in point cloud quality and accuracy in heavy rain or fog; and millimeter-wave radar has relatively low angular resolution, limiting its ability to perceive object details. To overcome these limitations and achieve comprehensive and accurate perception of the road environment, autonomous driving systems commonly employ multi-sensor fusion technology. This technology precisely aligns and fuses data collected by different types of sensors in both time and space. First, spatial calibration of the sensors is required, including calibrating the intrinsic and extrinsic parameters of cameras, the extrinsic parameters between LiDAR and the vehicle coordinate system, and the alignment relationship between millimeter-wave radar and other sensors. Only after precise calibration can the data from each sensor be seamlessly stitched together in the same coordinate system, providing a reliable data foundation for subsequent perception algorithms. Through multi-sensor fusion, autonomous vehicles can fully leverage the strengths of each sensor and compensate for their weaknesses, thereby constructing a more accurate and comprehensive model of the road environment.
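The spatial calibration described above can be illustrated with a minimal sketch of projecting a LiDAR-frame point into a camera image. The extrinsic parameters (rotation R, translation t) and intrinsics (fx, fy, cx, cy) below are made-up example values, not calibration results from any real system.

```python
# Project a LiDAR point into a camera image using hypothetical
# extrinsic (R, t) and pinhole intrinsic (fx, fy, cx, cy) calibration.

def mat_vec(R, v):
    """Multiply a 3x3 rotation matrix (nested lists) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def lidar_to_pixel(p_lidar, R, t, fx, fy, cx, cy):
    """Transform a LiDAR-frame point into the camera frame, then
    project it with a pinhole model. Returns (u, v) pixel coordinates,
    or None if the point lies behind the image plane."""
    x, y, z = (c + d for c, d in zip(mat_vec(R, p_lidar), t))
    if z <= 0:                      # behind the camera: not visible
        return None
    return (fx * x / z + cx, fy * y / z + cy)

# Identity extrinsics and illustrative intrinsics (assumed values).
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0.0, 0.0, 0.0]
u, v = lidar_to_pixel([2.0, -1.0, 10.0], R, t, fx=800, fy=800, cx=640, cy=360)
print(round(u), round(v))  # prints: 800 280
```

Only after R and t are known from calibration can LiDAR returns be overlaid on camera pixels like this, which is what lets a fusion stack confirm that a visual detection and a point-cloud cluster are the same object.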

Perception algorithms: The "intelligent brain" for understanding the world

Perception algorithms are the core of road recognition, giving autonomous vehicles the ability to "understand" the road information they perceive. Taking cameras as an example, common road recognition subtasks include lane line detection, semantic and instance segmentation, and traffic sign and traffic light recognition.

Lane detection is crucial for keeping vehicles in the correct lane. Before detection, the images captured by the camera need preprocessing, including distortion removal, color space conversion, and edge detection to highlight lane line features. Deep learning-based convolutional neural networks (CNNs) are then widely used for lane detection. A typical method is lane semantic segmentation using fully convolutional networks (FCNs), which classify each pixel in the image and accurately segment the pixel regions belonging to lane lines. To obtain the true position of the lane lines in the vehicle coordinate system, inverse perspective mapping (IPM) is used to geometrically correct the segmentation results, projecting them onto the ground plane. There are also traditional methods based on the Hough transform or curve fitting, which extract lane line positions by detecting lines or curves on image edges. However, these methods demand high image quality and are prone to false positives or false negatives in poor lighting or when lane lines are blurred. In contrast, deep learning-based end-to-end lane detection models, such as SCNN and the ENet-based LaneNet, adapt better to complex scenarios and generalize more strongly, but they require a large amount of labeled data for training and place higher demands on computing resources.
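The IPM step can be sketched as applying a 3x3 homography to segmented lane pixels to recover metric ground-plane coordinates. In practice the homography is derived offline from camera calibration; the matrix below is a fabricated example (a simple scale plus offset) purely for illustration.

```python
# Map segmented lane-line pixels to vehicle-frame ground coordinates
# through a homography H. The H values here are made up for the demo.

def apply_homography(H, u, v):
    """Map a pixel (u, v) to ground coordinates (x, y) via H."""
    x, y, w = (H[i][0] * u + H[i][1] * v + H[i][2] for i in range(3))
    return x / w, y / w

H = [[0.02,  0.00, -12.8],   # x: lateral offset (metres) from pixel column
     [0.00, -0.05,  36.0],   # y: longitudinal distance from pixel row
     [0.00,  0.00,   1.0]]

# Pixels flagged as lane line by the segmentation network (fabricated).
lane_pixels = [(640, 700), (660, 600), (680, 500)]
ground = [apply_homography(H, u, v) for u, v in lane_pixels]
```

Once the pixels are in metric ground coordinates, a polynomial can be fitted to them to obtain the lane's lateral offset and curvature in the vehicle frame.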

LiDAR plays a crucial role in the 3D perception of road surfaces and obstacles. LiDAR rapidly samples the surrounding space through rotational or solid-state scanning, generating a series of 3D point clouds. The point cloud data first undergoes preprocessing steps such as filtering, downsampling, and clustering to remove noise points and redundant information. Subsequently, semantic segmentation is performed using deep learning architectures such as PointNet or graph convolutional networks (GCNs) to distinguish different categories in the point cloud, such as roads, curbs, vehicles, pedestrians, and trees. In road recognition, ground segmentation is a critical step, aiming to distinguish drivable areas from non-drivable areas. Methods such as elevation-threshold filtering, RANSAC plane fitting, or deep learning-based ground detection models can separate road-surface points from the rest. Next, obstacle detection and clustering are performed on the remaining point cloud, grouping obstacle points into distinct instance objects that feed subsequent target tracking and path planning. To extract road edge information, the point cloud of the area ahead of the vehicle can be analyzed: by finding the transition points where the ground meets protruding objects, a road edge curve can be fitted locally. For complex urban scenes, special features such as intersections and turning ramps must also be identified; by clustering and curve-fitting the point cloud density distribution projected onto a bird's-eye view, the geometric relationships between multiple lanes can be extracted.
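The simplest of the ground segmentation methods above, elevation thresholding, can be sketched in a few lines. The point cloud and threshold values below are fabricated; real systems would use RANSAC plane fitting or a learned model to handle sloped or uneven roads.

```python
# Toy ground segmentation: points whose height is within a tolerance
# of the estimated ground level are labelled road surface; the rest
# become obstacle candidates for clustering. All values are fabricated.

def segment_ground(points, ground_z=0.0, tol=0.15):
    """Split (x, y, z) points into ground and obstacle sets."""
    ground, obstacles = [], []
    for p in points:
        (ground if abs(p[2] - ground_z) <= tol else obstacles).append(p)
    return ground, obstacles

cloud = [
    (5.0,  0.0,  0.02),   # road surface
    (6.0,  1.0, -0.05),   # road surface
    (7.0, -2.0,  0.90),   # curb-height object
    (8.0,  0.5,  1.60),   # vehicle-height object
]
ground, obstacles = segment_ground(cloud)
```

A flat-ground assumption like this breaks down on ramps and crowned roads, which is exactly why production stacks prefer plane fitting or learned ground models.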

While millimeter-wave radar has low point cloud resolution, it possesses unique advantages in dynamic obstacle detection. By transmitting electromagnetic waves and measuring the Doppler shift and delay of the echo signals, it can directly calculate the distance, angle, and radial velocity of target objects. In high-speed or highway scenarios, millimeter-wave radar can reliably detect moving targets such as vehicles and motorcycles at long range (typically beyond 150 meters), providing early warnings for road identification. In practice, the point cloud output from millimeter-wave radar is often fused with the LiDAR point cloud to balance accuracy and real-time performance. For example, when a vehicle is about to enter a curve or traffic ahead brakes suddenly, the radar's rapid warning can trigger an emergency braking decision early, while LiDAR handles map-level detailed modeling and extraction of surrounding environment contours. To fuse data across sensors, methods such as the Kalman filter, extended Kalman filter (EKF), and unscented Kalman filter (UKF) are commonly used to estimate the state of multi-source information. Through a state-space model, the detection results from cameras and LiDAR are continuously corrected, yielding more stable and reliable road information.
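The Kalman filtering mentioned above can be sketched in one dimension: tracking a lead vehicle's range from noisy radar measurements under a constant-velocity model. The process and measurement noise values, and the measurement sequence, are illustrative rather than taken from any real sensor.

```python
# Minimal 1-D Kalman filter for radar range tracking, assuming a
# constant-velocity model. q and r are illustrative noise settings.

def kalman_track(measurements, dt=0.1, q=0.5, r=4.0):
    """Return filtered (range, range-rate) estimates per measurement."""
    x = [measurements[0], 0.0]        # state: [range, range-rate]
    P = [[10.0, 0.0], [0.0, 10.0]]    # state covariance
    out = []
    for z in measurements[1:]:
        # Predict: x' = F x with F = [[1, dt], [0, 1]]; P' = F P F^T + qI
        x = [x[0] + dt * x[1], x[1]]
        P = [[P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1], P[1][1] + q]]
        # Update with a measurement z of the range component (H = [1, 0]).
        S = P[0][0] + r
        K = [P[0][0] / S, P[1][0] / S]
        y = z - x[0]
        x = [x[0] + K[0] * y, x[1] + K[1] * y]
        P = [[(1 - K[0]) * P[0][0], (1 - K[0]) * P[0][1]],
             [P[1][0] - K[1] * P[0][0], P[1][1] - K[1] * P[0][1]]]
        out.append(tuple(x))
    return out

# A lead car closing from 100 m, measured with noise (fabricated data).
ranges = [100.0, 99.2, 98.1, 96.9, 96.2, 95.0, 94.1, 92.8]
estimates = kalman_track(ranges)
```

Note that the filter recovers a negative range-rate (the car is closing) even though the radar here only reports range; an EKF or UKF extends the same predict-update cycle to the nonlinear measurement models of real radar and camera fusion.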

High-definition maps and precise positioning: The "compass" for determining the vehicle's location

Beyond perception algorithms, high-definition maps (HD maps) and precise positioning are crucial for ensuring accurate road recognition. HD maps contain rich and accurate geographic information, such as lane centerlines, dividing lines, road curvature, slope, intersection entrances, traffic signs, and traffic light locations, with an accuracy down to the centimeter level. After perceiving surrounding environmental elements, the autonomous driving system needs to match real-time perception data with the HD map to correct key information such as the vehicle's current lane, curve radius, and road topology.
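Matching real-time perception against the HD map can be reduced, in its simplest form, to asking which mapped lane centerline passes closest to the estimated vehicle position. The lane polylines below are fabricated; real HD maps store far richer geometry and topology.

```python
# Toy HD-map lane matching: pick the lane whose centreline polyline
# passes closest to the vehicle's estimated position. Data is made up.
import math

def nearest_lane(position, lanes):
    """Return the id of the lane centreline closest to `position`."""
    best_id, best_d = None, float("inf")
    for lane_id, polyline in lanes.items():
        for px, py in polyline:
            d = math.hypot(px - position[0], py - position[1])
            if d < best_d:
                best_id, best_d = lane_id, d
    return best_id

lanes = {
    "lane_1": [(0.0, 0.0), (0.0, 10.0), (0.0, 20.0)],
    "lane_2": [(3.5, 0.0), (3.5, 10.0), (3.5, 20.0)],  # 3.5 m lane width
}
print(nearest_lane((3.2, 12.0), lanes))  # prints: lane_2
```

Because adjacent lanes are only about 3.5 m apart, this kind of matching is exactly why the lateral positioning error must be held to roughly a tenth of the lane width.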

Positioning typically combines visual odometry (VO), LiDAR odometry (e.g., LOAM), inertial navigation systems (INS), and GNSS. The vehicle's onboard IMU provides high-frequency acceleration and angular velocity data, which is combined with the absolute position output by GNSS; through tightly coupled or loosely coupled state estimation, a preliminary vehicle pose can be obtained. Simultaneously, cameras or LiDAR scan the surrounding environment, extracting matchable features such as building corners, road signs, and lane lines, and matching them against a pre-built HD map. For example, visual positioning based on image features or ICP-based point cloud registration can further correct GNSS/INS positioning errors, keeping the vehicle's lateral and longitudinal errors in the map coordinate system within roughly 10 centimeters. Only with high-precision positioning can the system accurately determine the vehicle's lane and road geometry, providing a solid basis for subsequent route planning and decision-making.
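The loosely coupled idea can be sketched in one dimension: the vehicle dead-reckons its position from IMU/odometry-derived velocity and blends in each GNSS fix with a fixed gain. The velocity, gain, and noisy fix sequence below are illustrative values, not tuned parameters.

```python
# Toy loosely-coupled GNSS/INS fusion in 1-D: predict position from an
# assumed velocity, then nudge toward each noisy GNSS fix. All values
# here are fabricated for illustration.

def fuse(gnss_fixes, velocity=10.0, dt=0.1, gain=0.3):
    """Blend dead-reckoned position with noisy GNSS fixes."""
    pos = gnss_fixes[0]
    track = [pos]
    for fix in gnss_fixes[1:]:
        pos += velocity * dt            # IMU/odometry prediction step
        pos += gain * (fix - pos)       # correction toward the GNSS fix
        track.append(pos)
    return track

# Ground truth advances 1 m per step; fixes carry ~0.5 m of noise.
fixes = [0.0, 1.4, 1.8, 3.2, 4.1, 4.8, 6.3, 7.1]
track = fuse(fixes)
```

The fused track is smoother than the raw fixes yet free of unbounded dead-reckoning drift; a tightly coupled system achieves the same effect by fusing raw satellite and inertial measurements inside one filter instead of blending finished position solutions.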

Model training and validation: a continuously optimized "learning process"

Deep learning models for road semantic recognition require the collection and annotation of a large amount of high-quality data, and continuous iterative training in diverse scenarios to improve the robustness of the model under extreme conditions such as complex weather, changes in lighting, and road damage.

During the data acquisition phase, to ensure the trained model can adapt to various real-world road conditions, the collected data needs to cover multiple scenarios, including daytime, nighttime, rainy, foggy, and snowy conditions. Data annotation requires a professional annotation team to accurately label different lane marking styles (such as solid lines, dashed lines, and double yellow lines) and to meticulously classify traffic signs (such as speed-limit, no-entry, go-straight, and directional signs). To enhance the model's generalization ability, researchers also employ data augmentation techniques, performing operations such as rotation, translation, color perturbation, and random occlusion on image data; for point cloud data, random downsampling, point cloud noise injection, and local geometric deformation can be applied. During the training phase, multi-task loss functions such as cross-entropy loss or Dice loss are often used to jointly optimize semantic segmentation and instance segmentation tasks. Furthermore, considering the stringent real-time requirements of autonomous driving systems in actual deployment, the model must be compressed through pruning, quantization, and knowledge distillation to a scale that can run in real time on in-vehicle computing units (such as NVIDIA DRIVE, Mobileye EyeQ, or Tesla's FSD computer), while ensuring that inference meets the real-time recognition requirement of less than 10 milliseconds.
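The point-cloud augmentations mentioned above (random downsampling and noise injection) can be sketched as follows; the cloud, keep ratio, and noise level are fabricated example values.

```python
# Toy point-cloud augmentation: randomly drop points, then jitter the
# survivors with Gaussian noise. Parameters are illustrative only.
import random

def augment(points, keep_ratio=0.8, noise_std=0.02, seed=42):
    """Randomly downsample, then add Gaussian noise to each coordinate."""
    rng = random.Random(seed)            # seeded for reproducibility
    kept = [p for p in points if rng.random() < keep_ratio]
    return [tuple(c + rng.gauss(0.0, noise_std) for c in p) for p in kept]

# A fabricated 100-point strip of near-ground returns.
cloud = [(float(i), 0.0, 0.1) for i in range(100)]
aug = augment(cloud)
```

Seeding the generator makes each augmented epoch reproducible for debugging; in real pipelines the seed would vary per sample so the model sees a different perturbation every pass.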

Road recognition for autonomous vehicles is a complex and sophisticated systems engineering project, involving key aspects such as sensor hardware, perception algorithms, high-definition maps and precise positioning, and model training and validation. Only when these components work in close concert can autonomous vehicles accurately identify roads in complex and ever-changing environments, achieving safe and efficient autonomous driving. With continuous technological advancement, we have every reason to believe that road recognition technology for autonomous vehicles will continue to improve, bringing even greater changes to future intelligent transportation.
