Vision is the primary channel through which humans perceive the objective world. With the advent of signal processing theory and computers, researchers began using cameras to capture images of the environment, convert them into digital signals, and process this visual information by computer. This research goal, enabling computers to recognize their surroundings from one or more images, is what we call machine vision.
As a representative application of machine vision, video surveillance systems centered around cameras are now widely used in various industries such as security, transportation, building management, and manufacturing. To understand exactly how machine vision has changed video surveillance, we can further analyze it from its major application areas.
Target recognition
Target recognition and stable tracking are among the key technologies driving the development of machine vision. Target recognition has been widely applied in many fields, such as fingerprint, facial, and iris recognition for identity verification, and license plate recognition for intelligent traffic management, vehicle inspection, and parking management.
A target recognition system should be able to detect, classify, and identify targets in complex backgrounds and various weather conditions, so as to conduct targeted and continuous tracking of targets.
In recent years, target recognition technology has gradually moved from theoretical exploration and laboratory simulation to practical application. Its technical methods have also evolved from classic statistical pattern recognition to recognition methods based on knowledge, models, multi-sensor information fusion, and artificial neural networks.
Target tracking
Moving target tracking is the process of determining the position of the same object in different frames of an image sequence. Its main working principle involves selecting good target features and employing appropriate search methods. Based on matching principles, existing tracking methods can be categorized into model-based, region-based, feature-based, and contour-based tracking.
Model-based tracking
Model-based tracking builds a model of the target object using prior knowledge, then matches it to a tracking template and updates the model in real time. Traditional methods for representing moving objects include the following three:
1. Line-drawing (stick figure) method: The motion of a target is essentially the motion of its skeleton, so this method approximates the parts of the object with straight line segments.
2. Two-dimensional contour: This method represents the object by the outline of its projection onto the image plane.
3. Three-dimensional (volumetric) models: This method uses generalized 3D primitives such as elliptical cylinders and spheres to describe the structural details of the object. It usually requires matching the 3D model across related image frames to obtain a quantitative description of the object's motion, so more parameters must be estimated and the matching process is computationally heavier.
Region-based tracking
This method uses the previously extracted motion region as the target template and defines a matching metric. It then searches the next frame for the target, and the position where the metric reaches its extremum is taken as the best match. This approach is called region-based tracking.
Because it extracts a more complete target template, this method can obtain richer image information compared to other tracking algorithms, and is therefore widely used for tracking smaller targets or targets with poor contrast.
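The search described above can be sketched in a few lines. This is a minimal illustration rather than any particular system's implementation: it assumes grayscale frames stored as 2D NumPy arrays, uses the sum of squared differences (SSD) as the matching metric (one common choice among several), and searches the whole frame exhaustively. The function name match_template_ssd is made up for this example.

```python
import numpy as np

def match_template_ssd(frame, template):
    """Find where `template` best matches `frame`.

    The matching metric is the sum of squared differences (SSD),
    so the best match is the position where the metric is smallest.
    Returns ((row, col), score) for the top-left corner of the match.
    """
    frame = np.asarray(frame, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    fh, fw = frame.shape
    th, tw = template.shape
    best_score = np.inf
    best_pos = (0, 0)
    # Slide the template over every valid position in the frame.
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            window = frame[r:r + th, c:c + tw]
            score = np.sum((window - template) ** 2)
            if score < best_score:
                best_score = score
                best_pos = (r, c)
    return best_pos, best_score
```

In practice the search is usually restricted to a neighborhood around the target's previous position, which reduces the cost of the exhaustive scan shown here.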
Active contour-based tracking
This method uses a closed parametric curve to represent the contour of a moving target. The curve is updated iteratively by minimizing its energy function within a feature field constructed from the image, so the contour is refreshed automatically and continuously. Compared to region-based tracking, this approach has a lower computational cost. If each moving target can be reasonably separated and its contour initialized at the start, continuous tracking is possible even under partial occlusion.
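The text does not spell out the energy being minimized. As a point of reference, the classic active contour ("snake") formulation, which may differ from the specific variant intended here, writes the contour as v(s) = (x(s), y(s)) with s in [0, 1] and minimizes the following energy. The symbols α, β, and E_ext are notation assumed for this illustration.

```latex
E_{\text{snake}} = \int_0^1 \Big[ \tfrac{1}{2}\big( \alpha\,|v'(s)|^2 + \beta\,|v''(s)|^2 \big) + E_{\text{ext}}\big(v(s)\big) \Big]\, ds
```

The first two terms penalize stretching and bending of the curve, while E_ext is derived from the image (for example, the negative squared gradient magnitude) so that the minimum lies on the target's edges.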
Visual analysis
Visual analytics technology involves identifying and tracking targets to obtain information such as their appearance time, movement trajectory, and color. By analyzing this information, it identifies dangerous, illegal, or suspicious targets in videos and provides real-time alerts, early warnings, storage, and post-event retrieval for these behaviors and targets.
In the application of visual analytics, the most important technologies are intelligent video surveillance and intelligent video retrieval. While their application technologies are similar, the main difference lies in their processing methods: intelligent video surveillance processes captured video in real time, triggering alarms when dangerous events or suspicious individuals are detected; whereas intelligent video retrieval technology processes stored video footage. Through rapid analysis, it identifies dangerous events, suspicious individuals, and information on targets of interest. Users can then select events of interest or define target attributes, allowing the system to quickly locate the desired events or targets.
Generally speaking, intelligent video surveillance includes functions such as perimeter detection, line crossing detection, loitering detection, loss detection, abandoned object detection, fast movement detection, fighting detection, tailgating detection, crowd gathering detection, fire and smoke detection, PTZ target tracking, video fault analysis, and video storage and playback.
Different users may have varying needs for the aforementioned functions. Among the functions described above, perimeter detection, line crossing detection, loitering detection, loss detection, abandoned object detection, fast movement detection, fighting detection, and tailgating detection all follow the same basic approach. First, moving targets are extracted using background modeling and foreground extraction. Then, target matching and tracking techniques are used to obtain each target's trajectory, direction of movement, location, and relationships to other targets. Finally, the abnormal behaviors are identified based on predefined rules.
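The three stages above can be sketched for the simplest case, line crossing detection. This is an illustrative sketch under strong assumptions, not a production method: frames are grayscale 2D NumPy arrays, the background model is a plain running average, there is a single target per frame so its centroid stands in for full matching and tracking, and the rule checks a vertical line at a given column. All function names and the alpha and threshold defaults are made up for this example.

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Running-average background model: blend the new frame into the estimate."""
    return (1.0 - alpha) * background + alpha * frame

def foreground_mask(background, frame, threshold=25.0):
    """Mark pixels that differ from the background by more than `threshold`."""
    return np.abs(frame - background) > threshold

def centroid(mask):
    """Centroid (row, col) of the foreground pixels, or None if the mask is empty."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return float(rows.mean()), float(cols.mean())

def crossed_line(trajectory, line_col):
    """Rule: the target crossed the vertical line at `line_col` if two
    consecutive centroids lie strictly on opposite sides of it."""
    for (_, c0), (_, c1) in zip(trajectory, trajectory[1:]):
        if (c0 - line_col) * (c1 - line_col) < 0:
            return True
    return False
```

A caller would loop over frames, compute the mask and centroid for each, append the centroid to a trajectory list, update the background only where no foreground was detected, and finally apply crossed_line to the trajectory. Other rules, such as loitering or fast movement, would be written as different predicates over the same trajectory.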
For detecting abandoned and lost items against complex backgrounds and in high-traffic areas, a special method based on time-series regional motion analysis can be used instead, without relying on the target detection and tracking techniques described above.
Intelligent video retrieval first uses the detection technology of intelligent video surveillance to detect abnormal events. In addition, building on moving target detection and tracking, it needs to extract information such as the faces, colors, speeds, and counts of targets like people and vehicles. In this way, intelligent video retrieval can search not only for abnormal events but also by a target's appearance and disappearance times, color, speed, count, and facial information.
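Once such attributes have been extracted, searching by them amounts to filtering stored metadata. The sketch below shows one way this could look. The record layout (TargetRecord and its fields) and the search function are hypothetical, invented for this illustration, and facial information is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class TargetRecord:
    """Hypothetical metadata stored for one tracked target."""
    target_id: int
    kind: str          # e.g. "person" or "vehicle"
    color: str
    start_time: float  # when the target appears, in seconds
    end_time: float    # when the target disappears, in seconds
    mean_speed: float  # e.g. pixels per second

def search(records: List[TargetRecord],
           kind: Optional[str] = None,
           color: Optional[str] = None,
           min_speed: Optional[float] = None,
           time_window: Optional[Tuple[float, float]] = None) -> List[TargetRecord]:
    """Return the records that satisfy every attribute the user specified."""
    results = []
    for rec in records:
        if kind is not None and rec.kind != kind:
            continue
        if color is not None and rec.color != color:
            continue
        if min_speed is not None and rec.mean_speed < min_speed:
            continue
        if time_window is not None:
            window_start, window_end = time_window
            # Keep targets whose visible interval overlaps the query window.
            if rec.end_time < window_start or rec.start_time > window_end:
                continue
        results.append(rec)
    return results
```

For example, search(records, kind="vehicle", color="red") would return every red vehicle in the stored metadata, without reprocessing the video itself.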
In addition, the system can provide a spatiotemporal distribution map of events and targets, making it easier for users to find time periods and events of interest. With tens of thousands of monitoring terminals nowadays, finding the events and targets of interest from this massive amount of data requires the assistance of intelligent video retrieval technology.
Conclusion
Video surveillance technology is an emerging application area and a cutting-edge topic of great interest in the field of machine vision. It also draws on multiple disciplines, including computer science, machine vision, image engineering, pattern recognition, and artificial intelligence.
It's conceivable that the integration of machine vision and image processing technologies would break down existing limitations, enabling the design of a real-time video surveillance system. This system, while providing video monitoring, utilizes machine vision technology to add video change detection and automatic recording functions. The system can automatically identify scene changes, detect and lock onto moving targets, and simultaneously issue warnings and activate the storage device. This not only saves significant storage space and improves monitoring storage efficiency, reducing unnecessary playback, but also makes the data more targeted.