Introduction
With the rapid development of China's automobile industry, the number of motor vehicles has increased year by year, and the harm that road traffic accidents inflict on human life and property has become increasingly prominent. The World Health Organization's "Global Status Report on Road Safety 2013" reports that approximately 1.24 million people die in road traffic accidents worldwide each year, making road traffic injury the eighth leading cause of death globally.
To improve road traffic safety, numerous research institutions and automotive companies at home and abroad have invested heavily in the research and development of automotive safety systems. The scope of this research has evolved from early mechanical and electronic devices to today's focus, Advanced Driver Assistance Systems (ADAS).
ADAS uses a variety of on-board sensors, such as ultrasonic sensors, vision sensors, radar, and GPS, to perceive the vehicle's own status and changes in the environment during driving. Based on the collected vehicle and environmental data, the system identifies traffic scenarios, predicts traffic events, and provides corresponding driving suggestions and emergency measures, assisting drivers in making decisions, avoiding traffic accidents, and reducing the damage accidents cause.
In actual driving, drivers obtain the vast majority of their information visually: road conditions, traffic signs, markings and signals, obstacles, and so on. Studies show that approximately 90% of a driver's environmental information comes from vision. Effectively using visual sensors to understand the road environment is therefore a practical route to vehicle intelligence. Vision-based driver assistance systems, covering traffic sign detection, road detection, pedestrian detection, and obstacle detection, can reduce driver workload, improve driving safety, and reduce traffic accidents.
Driver assistance systems draw on a large amount of visual data when providing decision-making suggestions to drivers, and visual images offer several distinct advantages:
Visual images contain a large amount of information, such as the distance, shape, texture, and color of objects within the visible range.
The acquisition of visual information is non-contact, does not damage the road surface and the surrounding environment, and does not require large-scale reconstruction of existing road facilities.
A single acquisition of a visual image can simultaneously perform multiple tasks such as road detection, traffic sign detection, and obstacle detection.
Acquiring visual information causes no interference between vehicles.
In conclusion, intelligent vehicle machine vision technology has broad application prospects in intelligent transportation, automotive safety assistance driving, and autonomous driving.
1. Applications of Machine Vision in Advanced Driver Assistance Systems
Currently, visual sensors and machine vision technology are widely used in various advanced driver assistance systems. Among them, the perception of the driving environment is a crucial component of advanced driver assistance systems based on machine vision.
Perception of the driving environment relies primarily on vision technology to sense road information, traffic conditions, and driver status while the vehicle is in motion, providing the foundational data needed for decision-making by driver assistance systems. Specifically:
Road information mainly refers to static information outside the vehicle, including: lane lines, road edges, traffic signs and traffic lights, etc.
Road condition information mainly refers to dynamic information outside the vehicle, including obstacles, pedestrians, and vehicles ahead of the vehicle.
Driver status is considered in-vehicle information and mainly includes driver fatigue and abnormal driving behavior. By alerting the driver to potential unsafe behaviors, it helps prevent vehicle accidents.
By using machine vision technology to perceive the driving environment, various static and dynamic information inside and outside the vehicle can be obtained, helping the driver assistance system to make decisions.
Based on the above classification, it can be seen that the key technologies of advanced driver assistance systems based on machine vision that are currently widely used include: lane detection technology, traffic sign recognition technology, vehicle recognition technology, pedestrian detection technology, and driver state detection technology.
1.1 Lane Line Detection Technology
Current research on lane detection technology mainly focuses on two aspects: equipment and algorithms. Data acquisition for lane detection technology relies on various sensor devices, such as LiDAR, stereo vision, and monocular vision. The acquired information needs to be matched with suitable algorithms, such as model-based methods and feature-based methods, for computation and decision-making.
LiDAR-based road identification relies on the fact that surfaces of different colors or materials have different reflectivities.
Stereo vision is more accurate than LiDAR, but image matching is difficult, equipment costs are high, and the complexity of the algorithms leads to poor real-time performance.
Monocular vision is mainly implemented through feature-based, model-based, fusion, and machine learning methods, and is currently the most mainstream method for lane line recognition.
Feature-based algorithms first extract image features, such as edge information. Using this feature information, lane markings are obtained according to predetermined rules. For example, Lee et al. proposed a feature-based lane detection method in 2002. They used an edge distribution function to statistically analyze the global gradient angle accumulation and find the maximum accumulation value. Combining this with the symmetry of the left and right lane lines, they determined the position of the lane line. The main advantage of this type of algorithm is that it is insensitive to the shape of the lane line and has good robustness even under strong noise interference (such as shadows, worn lane markings, etc.), and can reliably detect the straight line model of the lane line.
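The edge-distribution-function idea described above can be sketched in a few lines: accumulate gradient magnitude into orientation bins and take the bin with the maximum accumulated value as the dominant lane-line direction. This is an illustrative simplification (numpy central differences stand in for a proper edge detector), not the exact algorithm of Lee et al.

```python
import numpy as np

def edge_distribution_function(gray, n_bins=90):
    """Accumulate gradient magnitude by quantized gradient angle (0-180 deg).

    The angle bin with the largest accumulated magnitude suggests the
    dominant lane-line direction in the image.
    """
    # Central-difference gradients (a stand-in for Sobel filtering).
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0  # orientation in [0, 180)
    hist, edges = np.histogram(ang, bins=n_bins, range=(0.0, 180.0), weights=mag)
    peak = int(np.argmax(hist))
    return hist, 0.5 * (edges[peak] + edges[peak + 1])  # dominant angle (deg)

# Synthetic image with a bright diagonal stripe standing in for a lane line.
img = np.zeros((100, 100))
for i in range(100):
    img[i, max(0, i - 2):i + 3] = 255.0  # stripe along the main diagonal
hist, dominant = edge_distribution_function(img)
```

For the diagonal stripe the gradient is everywhere perpendicular to the stripe, so the accumulated maximum lands on a single orientation bin; a real system would combine this dominant angle with left/right symmetry to localize the lane line.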
In 2010, Lopez et al. proposed a method to extract lane line feature data using image ridges instead of image edge information. Ridges reflect the convergence of neighboring pixels in an image; in the lane marking region, they are represented as bright areas with local maxima in the middle of the lane line. Compared to image edges, ridges are more suitable for lane line detection applications.
Model-based lane recognition methods construct mathematical road models and estimate the model parameters from image information to complete lane detection. Shengyan Zhou et al. proposed a lane recognition method based on Gabor filters and geometric models. Assuming lane markings are present ahead of the intelligent vehicle, the lane can be described by four parameters: origin, width, curvature, and starting position. The camera is calibrated in advance; after the model parameters are computed, several candidate lane models are selected. The algorithm estimates the required parameters through a local Hough transform and region localization, determines the final model to use, and completes the match against the actual lane lines.
Generally speaking, model-based lane line recognition methods are mainly divided into simple straight line models and more complex models (such as quadratic curves and spline curves). In practical applications, different methods need to be selected according to the specific application and road characteristics. For example, most lane departure warning systems use simple straight line models to depict lane lines; while in situations where flexible fitting of lane lines is required, such as lane line prediction and tracking problems, more complex model algorithms are usually used.
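The simple straight-line model mentioned above is usually fitted with a Hough transform over edge points. The following minimal sketch (an illustrative implementation, not code from any cited system) votes each edge point into a (rho, theta) accumulator and reads off the strongest line.

```python
import numpy as np

def hough_lines(points, img_shape, n_theta=180, n_rho=200):
    """Minimal Hough transform: each edge point votes in (rho, theta) space.

    points    : iterable of (y, x) edge-pixel coordinates
    img_shape : (height, width), used to bound the rho axis
    Returns the (rho, theta in degrees) of the strongest straight line.
    """
    h, w = img_shape
    diag = np.hypot(h, w)
    thetas = np.deg2rad(np.linspace(0.0, 180.0, n_theta, endpoint=False))
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for y, x in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)  # signed distance
        idx = np.round((rho + diag) / (2 * diag) * (n_rho - 1)).astype(int)
        acc[idx, np.arange(n_theta)] += 1              # one vote per theta
    r, t = np.unravel_index(np.argmax(acc), acc.shape)
    rho_best = r / (n_rho - 1) * 2 * diag - diag
    return rho_best, float(np.rad2deg(thetas[t]))

# Edge points lying on the line y = x (a diagonal "lane line").
pts = [(i, i) for i in range(50)]
rho, theta_deg = hough_lines(pts, (50, 50))
```

For the line y = x every point satisfies x*cos(135°) + y*sin(135°) = 0, so all 50 votes concentrate in one accumulator cell; curved lanes need the quadratic or spline models discussed above instead.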
1.2 Traffic Sign Recognition Technology
Traffic sign recognition can alert drivers to traffic signs in the road environment, helping them make correct decisions and improving driving safety. Traffic signs typically have distinctive visual features, such as color and shape. These visual features can be used to detect different traffic signs. In the literature on traffic sign detection methods, detection methods combining color and shape features are widely used. However, in reality, the quality of traffic sign image data can be affected by factors such as lighting and weather changes; furthermore, occlusion, distortion, and wear of traffic signs can also affect the accuracy of the algorithm.
Most current traffic sign recognition technologies rely on image segmentation by setting threshold ranges for color components. This allows regions of interest (ROIs) to be extracted from complex backgrounds, followed by shape filtering within these ROIs to locate traffic signs. A common algorithm is direct color thresholding, which segments all pixels in the RGB color space and uses corner detection to determine the presence of traffic signs. However, this algorithm handles lighting effects and occlusion poorly. Many researchers have therefore improved it, often by converting the RGB image to the HSV or HSI color model, which better aligns with human color perception, before segmentation and extraction. This substantially mitigates the effects of lighting and partial occlusion on traffic sign detection.
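The HSV thresholding step described above can be sketched as follows. This is a minimal numpy-only illustration of segmenting strongly red pixels (e.g., prohibition sign borders); the hue, saturation, and value bounds are illustrative assumptions, not thresholds from any cited system.

```python
import numpy as np

def red_mask_hsv(rgb):
    """Return a boolean mask of strongly red pixels via HSV thresholds."""
    x = rgb.astype(float) / 255.0
    v = x.max(axis=-1)                      # value
    c = v - x.min(axis=-1)                  # chroma
    s = np.where(v > 1e-9, c / np.maximum(v, 1e-9), 0.0)  # saturation
    r, g, b = x[..., 0], x[..., 1], x[..., 2]
    safe_c = np.maximum(c, 1e-9)
    # Hue in degrees, computed branch by branch from the maximum channel.
    hue = np.where(v == r, (60.0 * (g - b) / safe_c) % 360.0, 0.0)
    hue = np.where((v == g) & (v != r), 60.0 * (b - r) / safe_c + 120.0, hue)
    hue = np.where((v == b) & (v != r) & (v != g),
                   60.0 * (r - g) / safe_c + 240.0, hue)
    # Red hues wrap around 0 degrees; require enough saturation and brightness.
    return ((hue <= 15.0) | (hue >= 345.0)) & (s >= 0.5) & (v >= 0.2)

img = np.zeros((4, 4, 3), dtype=np.uint8)
img[1:3, 1:3] = (200, 20, 20)               # a red patch on a black background
mask = red_mask_hsv(img)
```

Because hue is largely independent of brightness, this mask is far less sensitive to lighting changes than a direct RGB threshold; shape filtering would then run inside the surviving regions.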
The most representative application of traffic sign recognition technology is in Intelligent Transportation Systems (ITS). In 2010, the University of Massachusetts developed the TSR system, which uses a color threshold segmentation algorithm and principal component analysis for target detection and recognition. The system achieves a recognition accuracy of up to 99.2%. The algorithm performs well even with slight target occlusion and low visibility conditions, demonstrating robustness and applicability. The processing speed is 2.5 seconds per frame. The main drawback of the system is its inability to meet real-time requirements.
In 2011, the German Traffic Sign Recognition Benchmark (GTSRB) competition, held at IJCNN 2011, spurred rapid development in traffic sign detection and recognition research. In that competition, Ciresan et al. used a deep convolutional neural network on the GTSRB dataset and achieved a recognition rate exceeding the average human score.
In 2012, Greenhalgh et al. selected the maximum values of the R and B channels in the normalized RGB space and extracted MSER regions from the RGB image, then used an SVM for traffic sign detection; this method showed good real-time performance. In 2013, Kim J.B. argued that color and shape are easily affected by the surrounding environment and added a visual saliency model for traffic sign detection, achieving high real-time performance.
1.3 Vehicle Recognition Technology
In the field of vehicle recognition, many researchers are currently studying multi-sensor fusion. A single sensor finds it increasingly difficult to detect vehicles in complex traffic environments, where vehicles differ in shape, size, and color and objects occlude one another, clutter the scene, and change dynamically. Multi-sensor fusion provides complementary information under these conditions and is the development trend of vehicle recognition technology.
Radar has significant advantages in detecting the position, speed, and depth of obstacles ahead of the vehicle. The main types include LiDAR, millimeter-wave radar, and microwave radar, with LiDAR further categorized into single-line, four-line, and multi-line devices. Based on visual information from onboard cameras, either stereo or monocular vision can be used to sense the external environment. Stereo vision aims to acquire depth information about obstacles, but in practice its large computational load makes real-time performance difficult to guarantee at high speeds; moreover, vehicle vibration often introduces significant deviations into the calibration parameters of binocular or multi-view cameras, producing numerous false positives and false negatives. Monocular vision has a clear advantage in real-time performance and is currently the most common detection approach. It mainly comprises detection methods based on prior knowledge, on motion, and on statistical learning.
Prior knowledge-based detection methods extract certain vehicle features as prior knowledge. The principle is similar to feature-based detection algorithms in lane detection technology. Common vehicle features used as prior knowledge include: vehicle symmetry, color, shadows, edge features, and texture features. This method searches the image space to find regions that match the prior knowledge model, i.e., regions of interest (ROIs) where vehicles may exist. Machine learning methods are often used to further confirm the identified ROIs.
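One of the prior-knowledge cues listed above, the dark shadow beneath a vehicle, can be turned into an ROI generator with a simple scan for wide runs of dark pixels. The thresholds and the (row, col_start, col_end) output format here are illustrative assumptions for the sketch.

```python
import numpy as np

def shadow_rois(gray, dark_thresh=40, min_width=10):
    """Hypothesize vehicle regions of interest from under-body shadows.

    Scans each row for runs of dark pixels wide enough to be the shadow
    beneath a vehicle and returns (row, col_start, col_end) candidates.
    """
    rois = []
    for y, row in enumerate(gray):
        dark = (row < dark_thresh).astype(int)
        # Locate the starts and ends of consecutive dark runs.
        d = np.diff(np.concatenate(([0], dark, [0])))
        starts, ends = np.flatnonzero(d == 1), np.flatnonzero(d == -1)
        for s, e in zip(starts, ends):
            if e - s >= min_width:
                rois.append((y, int(s), int(e)))
    return rois

# A bright road image with one dark horizontal band (a "shadow").
road = np.full((40, 60), 200, dtype=np.uint8)
road[25, 10:35] = 15
candidates = shadow_rois(road)
```

As the text notes, such ROIs are hypotheses only; a machine-learning verification stage would then confirm or reject each candidate.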
Motion-based detection methods exploit the fact that moving objects produce image information that changes from frame to frame. Recognizing objects and detecting obstacles therefore typically requires processing several substantially different images to accumulate sufficient information; in practice, this approach suffers from heavy computation and poor real-time performance. Optical flow is a commonly used motion-based method in machine vision and pattern recognition: it uses the variation over time in the grayscale distribution of image pixels to estimate the motion field and thereby detect and locate obstacles.
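The optical-flow principle can be sketched with the classic Lucas-Kanade least-squares formulation (an illustrative choice; the text does not name a specific formulation). The brightness-constancy constraint Ix*u + Iy*v + It = 0 is solved for one translation over a whole window here; real systems solve it per feature point, often with image pyramids.

```python
import numpy as np

def lucas_kanade_window(prev, curr):
    """Estimate a single translation (dx, dy) between two frames.

    Solves the brightness-constancy equations Ix*dx + Iy*dy = -It in the
    least-squares sense over the full window.
    """
    prev = prev.astype(float)
    curr = curr.astype(float)
    iy, ix = np.gradient(prev)            # spatial gradients of the first frame
    it = curr - prev                      # temporal derivative
    a = np.stack([ix.ravel(), iy.ravel()], axis=1)
    b = -it.ravel()
    (dx, dy), *_ = np.linalg.lstsq(a, b, rcond=None)
    return dx, dy

# Shift a smooth pattern one pixel to the right between frames.
y, x = np.mgrid[0:32, 0:32]
frame0 = np.sin(x / 4.0) + np.cos(y / 5.0)
frame1 = np.sin((x - 1) / 4.0) + np.cos(y / 5.0)
dx, dy = lucas_kanade_window(frame0, frame1)
```

The recovered flow should be close to (1, 0) for this synthetic rightward shift; the per-pixel version of this computation is what makes dense optical flow expensive, which is the real-time limitation noted above.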
Statistical learning-based detection methods first require collecting a sufficient number of vehicle samples from the road ahead, covering various environmental conditions, weather, and distances. During the training of the sample data, methods such as neural networks and Haar wavelets are typically employed. Once trained, the data can be applied to the specific functions to be implemented.
1.4 Pedestrian Detection Technology
Pedestrian detection technology has certain unique characteristics compared to current intelligent driving assistance technologies. This is primarily because pedestrians possess both rigid and flexible properties, making detection susceptible to factors such as pedestrian behavior, clothing, and posture. Pedestrian detection technology involves extracting pedestrian positions from images acquired by sensors and judging pedestrian movement. This is achieved by extracting information from moving target areas in videos and using methods such as background subtraction, optical flow, and frame differencing, combined with features like body shape and skin color. For static images, the main methods used include template matching, shape detection, and machine learning-based detection. Due to significant drawbacks, the first two methods have seen limited practical application in recent years. This paper focuses on the current development status of machine learning-based detection methods.
Performance improvements in machine learning-based pedestrian detection methods primarily rely on pedestrian feature description and classifier training. The complexity of feature description affects the real-time performance of the detection method. HOG is a widely used pedestrian feature description method, while Haar, LBP, and their improvements are also commonly used. The machine learning classifier is related to the detection rate of pedestrians; neural networks, support vector machines, and Boosting methods are common machine learning classifiers.
Many pedestrian detection algorithms are based on the methods above and their improvements, optimizing pedestrian detection in various respects. Taking the combination of HOG and a linear support vector machine (SVM) as an example: HOG characterizes the local gradient magnitude and direction of an image. It normalizes feature vectors over blocks and allows blocks to overlap, making the descriptor insensitive to lighting changes and small offsets and effective at characterizing the edge features of the human body. In tests on the relatively simple MIT pedestrian database, the HOG-plus-SVM combination achieved a detection rate of nearly 100%.
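The HOG building block just described can be sketched for a single cell: each pixel votes into an orientation bin with a weight equal to its gradient magnitude. This is a simplified illustration; full HOG additionally groups cells into overlapping blocks and normalizes per block before feeding the concatenated vector to the SVM.

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Orientation histogram for one HOG cell (unsigned gradients, 0-180 deg).

    Each pixel votes into an orientation bin, weighted by gradient
    magnitude; the result is L2-normalized (per cell, for this sketch).
    """
    gy, gx = np.gradient(cell.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, 180.0), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A vertical edge: left half dark, right half bright.
cell = np.zeros((8, 8))
cell[:, 4:] = 255.0
h = hog_cell_histogram(cell)
```

For the vertical edge all gradient energy is horizontal, so the histogram concentrates in the 0-degree bin; it is this orientation statistic, rather than raw intensity, that gives HOG its insensitivity to lighting changes.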
1.5 Driver Status Detection Technology
Early driver state detection methods primarily relied on vehicle operational status detection, including lane departure warnings and steering wheel detection. These methods are not highly sensitive to driver characteristics and are prone to misjudgment due to environmental factors; therefore, they are rarely used alone in recent research. This paper will introduce driver state detection techniques based on driver facial features, as well as techniques that integrate these features with multi-sensor fusion.
Currently, the most commonly used detection technology based on driver facial features is the driver's head features. The visual features of the driver's head can reflect the driver's mental state, such as the blinking state and frequency of the eyes, mouth movement features, head posture, etc. These features can be collected by cameras and will not affect the driver's normal driving. This non-contact method has gradually become the mainstream method of this type of technology.
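One common way to quantify the eye opening/closing feature mentioned above is the eye aspect ratio (EAR) computed from six eye landmarks. This is a widely used technique offered here as an illustration; it is not attributed to FaceLAB or any other system cited in this article, and the landmark coordinates below are synthetic.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """Eye aspect ratio (EAR) from six eye landmarks p1..p6.

    EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); the ratio drops toward zero
    as the eyelids close, so thresholding EAR over time yields blink state
    and blink frequency.
    """
    eye = np.asarray(eye, dtype=float)
    v1 = np.linalg.norm(eye[1] - eye[5])    # first vertical distance
    v2 = np.linalg.norm(eye[2] - eye[4])    # second vertical distance
    h = np.linalg.norm(eye[0] - eye[3])     # horizontal eye width
    return (v1 + v2) / (2.0 * h)

# Synthetic landmark sets for an open and a nearly closed eye.
open_eye = [(0, 0), (2, 2), (4, 2), (6, 0), (4, -2), (2, -2)]
closed_eye = [(0, 0), (2, 0.2), (4, 0.2), (6, 0), (4, -0.2), (2, -0.2)]
ear_open, ear_closed = eye_aspect_ratio(open_eye), eye_aspect_ratio(closed_eye)
```

Tracking EAR over consecutive frames gives the blink frequency and eye-closure duration used as fatigue indicators, without any contact with the driver.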
FaceLAB is a representative of driver state detection technology based on eye features. This technology fuses multi-feature information by detecting driver head posture, eyelid movement, gaze direction, pupil diameter, and other feature parameters to achieve real-time detection of driver fatigue. The system employs eye opening/closing and gaze direction detection methods, solving the problem of eye tracking under low light, head movement, and conditions where the driver wears glasses. In 2008, the latest version, FaceLAB v4, adopted leading infrared active illumination technology, further enhancing the accuracy and precision of eye detection and enabling independent tracking of each eye.
The detection technology based on driver facial features and multi-sensor fusion is mainly represented by the EU project "AWAKE". This project uses multiple sensors such as images and pressure to analyze the driver's driving status, such as eyelid movement, gaze direction, steering wheel grip force, lane tracking, surrounding vehicle distance detection, throttle acceleration and brake usage. It divides the driver's fatigue level into three states: alert, possibly fatigued, and fatigued, and conducts a more comprehensive detection and evaluation of the driver's condition.
The driver alert system in this project consists of sound, visual, and tactile alarms. When fatigue is detected, it can increase the driver's alertness by using sound and light stimuli of varying intensity and seatbelt vibration, depending on the degree of fatigue. Based on this research, Nissan has developed an alarm system that, when it determines that the driver is in a state of fatigue, will sound an electronic alarm and spray an aroma containing stimulants such as mint and lemon into the driver's cabin to promptly eliminate drowsiness. If the driver's fatigue does not improve, the system will use sound and light alarms and automatically stop the vehicle.
2. Conclusion
The automotive industry has entered the era of intelligent technology, and machine vision is used in many automotive driver assistance technologies. Technological advancements in machine vision will undoubtedly drive the development of these technologies. Therefore, improving image acquisition quality, optimizing image processing algorithms, and more quickly achieving intelligent image generation, processing, and recognition to provide decision-making suggestions are all important issues that the field of machine vision needs to address.
In the future, with technological innovations in various sensors and reductions in the complexity of image processing algorithms, machine vision technology will better meet the real-time and accuracy requirements during driving.
References
[1] Zhu Mingzhu, Zhao Yun, Shen Ying. An overview of the application of vehicle-mounted machine vision [J]. Electromechanical Technology, 2014, 37(1): 50-52.
[2] Lee J W. A Machine Vision System for Lane-Departure Detection [J]. Computer Vision and Image Understanding, 2002, 86(1): 52-78.
[3] Lopez A, Serrat J. Robust Lane Markings Detection and Road Geometry Computation [J]. International Journal of Automotive Technology, 2010, 11(3): 395-407.
[4] Zhou S Y, Jiang Y H. A Novel Lane Detection Based on Geometrical Model and Gabor Filter [C]. 2010 IEEE Intelligent Vehicles Symposium, University of California, USA, 2010.
[5] Fleyeh H. Color Detection and Segmentation for Road and Traffic Signs [C]. IEEE Conference on Cybernetics and Intelligent Systems, 2004, 2: 809-814.
[6] Baehring D, Simon S, Niehsen W, et al. Detection of Close Cut-in and Overtaking Vehicles for Driver Assistance Based on Planar Parallax [C]. Proceedings of the IEEE Intelligent Vehicles Symposium, USA, 2005.
[7] Su Songzhi, Li Shaozi, Chen Shuyuan. A review of pedestrian detection technology [J]. Acta Electronica Sinica, 2012, 40(4): 814-820.