1. The History of Machine Vision
Humans perceive the external world primarily through their senses, including sight, touch, hearing, and smell, with sight being the most important. Statistics suggest that over 80% of the information humans take in about the external world comes through sight. The saying "seeing is believing" vividly illustrates the importance of vision in acquiring information about the objective world. Through sight we can perceive the position and brightness of objects, as well as the relationships between them, and thus take in a wealth of information about the external world.
However, the human eye's perceptual capabilities are quite limited in the face of the vast and complex world around us. If we describe the eye with the metrics used for imaging devices, some interesting figures emerge: a resolution roughly equivalent to 16 million pixels, a frame rate of about 20 fps, a dynamic range of about 14 to 16 bits, and a spectral response range of 400 nm to 700 nm. Beyond these limits, the eye cannot accurately distinguish visual information.
For a long time, people have been striving to find products that can replace and compensate for the inherent limitations of the human eye, in order to expand human vision and enhance our ability to understand and change the world. As a result, a new discipline has gradually emerged—machine vision.
Machine vision, as the name suggests, uses machines in place of the human eye for measurement and judgment. The concept emerged in the 1950s. In 1947, Bell Labs invented the transistor, ushering in the era of the semiconductor industry. Between 1954 and 1956, all-transistor televisions and transistor computers were introduced in succession, at which point display and data-processing technology could rudimentarily meet human needs. The development of semiconductor technology laid an important foundation for machine vision, and in the late 1950s the idea of using machines to replace human vision was first proposed.
With the advent of integrated circuit televisions and integrated circuit computers in the 1960s, the world's first machine vision system was born in 1967. The system used closed-circuit television imaging to transmit video signals to electronic circuits for workpiece inspection. While interlaced scanning, analog signals, and fixed resolution seem quite outdated now, this marked the transition of machine vision from theory to reality.
In 1969, two scientists at Bell Labs in the United States, Willard Boyle and George E. Smith, invented the CCD image sensor. This epoch-making invention made it possible to capture image information as electrical signals suitable for digital processing, bringing a huge leap to imaging technology.
Entering the 1970s, computers built on large-scale and very-large-scale integrated circuits appeared one after another, and image technology also developed rapidly. With the relevant technologies relatively mature, machine vision entered a period of real development.
Starting in the 1980s, the personal computer industry took off, and machine vision technology matured rapidly alongside the updates and popularization of the PC. Today, machine vision has spread from semiconductor inspection into all walks of life and is closely tied to our daily lives.
Typical applications include smart security, traffic technology, traffic and trajectory analysis, intelligent iris recognition, dangerous-event detection, and smart logistics.
2. Characteristics of Machine Vision
Compared to human vision, machine vision has significant advantages:
In terms of accuracy, human vision is bounded by physiology: the eye has physical limits in spectral band, resolution, and speed, beyond which it cannot make accurate judgments. Machine vision spans a much wider range of resolutions and frame rates: camera resolution runs from common VGA up to tens of megapixels, and frame rates can be chosen anywhere from a few frames per second to hundreds. Machine vision systems also work in environments beyond visible light, such as infrared and X-ray imaging.
In terms of stability, human judgment varies widely: different observers judge the same scene differently, and even the same person's visual judgments shift with mood. A machine vision system, running fixed hardware and deterministic algorithms, delivers consistent and objective results.
In terms of repetitive tasks, humans experience fatigue and other physiological changes from long hours and repetitive work, which degrade visual performance. A machine vision system maintains the same state through long periods of repetitive work, so the results meet the same standard every time.
In terms of speed, machines can easily reach levels of visual speed that are unattainable for humans. When working with moving objects, especially high-speed moving objects, such as on a production line, machine vision can significantly improve production efficiency by replacing human vision.
In terms of cost, machines operate quickly, and one machine can easily do the work of several people. A machine can also work continuously without rest or leave, increasing productivity while saving labor costs.
In terms of contact, a machine vision system never touches the work object during operation, so neither the system nor the object is damaged. In dangerous, harsh, or high-precision environments in particular, it can take over vision-related tasks from humans, reducing the risks of manual operation and improving product precision.
3. Structure of a Machine Vision System
Generally, a machine vision system consists of five main parts: a light source, a lens, a camera, an image acquisition card, and processing software. The light source, lens, and camera make up the front-end imaging system, which determines image quality. The acquisition card and processing software make up the back-end image processing system, which analyzes and processes the images through algorithms and software and hardware optimization.
The camera is the core component of the front-end imaging system, and the sensor chip inside it is the key factor determining camera performance. The two main types of photoelectric sensor chips used in machine vision are CCD and CMOS. The two differ significantly in technology, but their function is the same: using the photoelectric effect, they convert light signals into electrical signals that are stored and read out to form an image. The imaging process for both follows the same steps:
1. Photoelectric conversion (converting incident light signals into electrical signals)
2. Charge collection (collecting and storing charge signals representing the energy of incident light in a specific form)
3. Signal conversion and output (CCD outputs analog signals, while CMOS can directly output digital signals)
Both CCD and CMOS chips consist of a pixel array with a metal-oxide-semiconductor (MOS) structure. Each pixel absorbs light intensity and converts it into a photoelectric charge signal. The difference between them is:
| CCD | CMOS |
| --- | --- |
| A single readout node converts the charge generated in every pixel into voltage, so the charge in all pixels must be transferred sequentially to the readout node for charge-to-voltage conversion and readout. | The charge generated in each pixel is converted into a voltage signal inside the pixel itself, with no charge transfer needed; every pixel outputs a voltage signal directly. |
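The readout difference can be sketched in a few lines of Python. This is a toy numerical model, not real device physics; the gain and amplifier-mismatch values are made up for illustration:

```python
import numpy as np

# Toy model: both sensor types start from the same per-pixel charge;
# they differ in WHERE the charge-to-voltage conversion happens.
rng = np.random.default_rng(0)
charge = rng.uniform(0.0, 1.0, size=(4, 4))  # collected charge per pixel

GAIN = 2.0  # hypothetical charge-to-voltage conversion gain

def ccd_readout(charge, gain):
    """CCD: shift every pixel's charge to ONE readout node, convert there."""
    voltages = []
    for q in charge.ravel():       # charges transferred sequentially
        voltages.append(q * gain)  # single shared converter
    return np.array(voltages).reshape(charge.shape)

def cmos_readout(charge, gain, mismatch):
    """CMOS: each pixel converts its own charge; per-pixel amplifiers
    differ slightly (mismatch), the historical source of extra noise."""
    return charge * (gain + mismatch)

mismatch = rng.normal(0.0, 0.02, size=charge.shape)  # amplifier spread
ccd = ccd_readout(charge, GAIN)
cmos = cmos_readout(charge, GAIN, mismatch)
```

The per-pixel gain spread in the CMOS model corresponds to the amplifier mismatch that historically gave CMOS sensors higher fixed-pattern noise than CCDs.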
Compared to CCD chips, CMOS chips have a simpler manufacturing process, lower cost, smaller size limitations, and lower power consumption. However, due to the independent charge-voltage conversion of each pixel in the CMOS chip structure, the differences between amplifiers in each pixel have historically resulted in CMOS chips having inferior image quality and higher noise compared to CCD chips, generally limiting their use in applications with lower image quality requirements. With continuous innovation in CMOS chip technology in recent years, CMOS chips have improved photoelectric sensitivity, reduced noise, expanded dynamic range, and enhanced image quality, with performance parameters gradually approaching those of CCD chips. Combined with their inherent advantages, CMOS chips are now finding increasingly widespread applications.
Vision software is another crucial component of the entire machine vision system. It primarily detects specific target features through the analysis, processing, and recognition of images. The widespread application of machine vision systems in modern industry has led to the rapid development of image processing software technology used in these systems.
Image processing and analysis tools primarily function to enhance images, facilitating subsequent recognition and understanding by specialized visual tools. Commonly used image processing and analysis tools include: histogram tools, filtering operations, morphological operations, contour extraction, geometric transformations, and color space transformations.
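As a minimal sketch of two of these tools, the following Python snippet (using NumPy on a tiny synthetic image) computes a histogram and applies a 3x3 mean filter; the image and values are made up for illustration:

```python
import numpy as np

# A tiny synthetic grayscale image: a bright 2x2 "feature" on a dark
# background.
img = np.zeros((6, 6), dtype=np.uint8)
img[2:4, 2:4] = 200

# Histogram tool: count pixels per intensity value.
hist, _ = np.histogram(img, bins=256, range=(0, 256))
print(hist[0], hist[200])  # → 32 4

# Filtering operation: 3x3 mean (smoothing) filter over the valid region.
def mean_filter3(a):
    out = np.zeros((a.shape[0] - 2, a.shape[1] - 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = a[i:i + 3, j:j + 3].mean()
    return out

smoothed = mean_filter3(img.astype(float))
```

Real systems typically use an optimized library for these operations rather than explicit loops, but the effect on the image is the same.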
Professional vision tools are comprehensive tools developed specifically for the characteristics of machine vision applications. Their four main categories are: calibration tools, positioning tools, measurement tools, and inspection tools.
Calibration tools
The main function of calibration tools is to establish a mapping relationship between image pixel space and real physical space, realizing the transformation from image coordinates to spatial coordinates. Commonly used target types are checkerboard targets and dot matrix targets.
| Checkerboard target | Dot matrix target |
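A simple pixel-to-physical mapping can be sketched as a least-squares affine fit from corner coordinates in the image to their known physical positions. The corner data below is made up for illustration, and a full camera calibration would also model lens distortion:

```python
import numpy as np

# Hypothetical checkerboard corners: pixel coordinates and their known
# physical positions in millimetres.
pixel = np.array([[100, 100], [300, 100], [100, 300], [300, 300]], float)
world = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], float)  # mm

# Fit a 3x2 affine matrix M so that [u, v, 1] @ M ≈ [x_mm, y_mm].
A = np.hstack([pixel, np.ones((4, 1))])
M, *_ = np.linalg.lstsq(A, world, rcond=None)

def pix_to_mm(u, v):
    """Map an image coordinate to physical coordinates in mm."""
    return np.array([u, v, 1.0]) @ M

print(pix_to_mm(200, 200))  # close to [5. 5.], the square's centre
```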
Positioning tools
The purpose of positioning tools is to determine the location of one or more pre-trained features in an image and to measure the quality of each match.
| Regional positioning | Geometric positioning |
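One way to sketch a positioning tool is template matching with normalized cross-correlation: slide a trained template over the image and score each position, so the best score gives both the location and a match-quality measure. The image, template, and scoring below are a minimal illustration, not a production matcher:

```python
import numpy as np

# Synthetic 8x8 image containing a diagonal 2x2 pattern at (3, 4).
img = np.zeros((8, 8))
img[3, 4] = 1.0
img[4, 5] = 1.0
template = np.array([[1.0, 0.0], [0.0, 1.0]])  # the trained feature

def ncc(a, b):
    """Normalized cross-correlation: 1.0 means a perfect match."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom else 0.0

best, best_pos = -1.0, None
for i in range(img.shape[0] - 1):
    for j in range(img.shape[1] - 1):
        score = ncc(img[i:i + 2, j:j + 2], template)
        if score > best:
            best, best_pos = score, (i, j)
print(best_pos, best)  # location of the feature and its match quality
```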
Measurement tools
The purpose of measuring tools is to accurately measure the 2D dimensions of objects using images, thereby enabling the inspection of product quality.
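A dimensional measurement can be sketched by thresholding the image, taking the part's bounding box in pixels, and converting to millimetres with a calibration factor. The 0.05 mm/pixel value here is assumed for illustration; a real system obtains it from the calibration step:

```python
import numpy as np

MM_PER_PIXEL = 0.05  # assumed scale factor from calibration

# Synthetic image of a part: a 40 x 60 pixel bright rectangle.
img = np.zeros((100, 100), dtype=np.uint8)
img[20:60, 30:90] = 255

# Threshold, then measure the bounding box of the foreground.
mask = img > 128
rows, cols = np.where(mask)
height_mm = (rows.max() - rows.min() + 1) * MM_PER_PIXEL
width_mm = (cols.max() - cols.min() + 1) * MM_PER_PIXEL
print(height_mm, width_mm)  # → 2.0 3.0
```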
Inspection tools
Inspection tools efficiently generate a difference image between the live image and a template image, which is a key step in defect detection.
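The difference-image approach can be sketched in a few lines; the template, "live" image, and threshold below are made up for illustration:

```python
import numpy as np

# Golden template and a live image with one simulated defect.
template = np.full((5, 5), 100, dtype=np.int16)
live = template.copy()
live[2, 3] = 10  # simulated defect (e.g. missing material)

diff = np.abs(live - template)    # the difference image
defects = np.argwhere(diff > 30)  # threshold flags defect candidates
print(defects)  # → [[2 3]]
```

Real inspection tools add steps such as aligning the live image to the template before subtraction and filtering out single-pixel noise, but the difference-and-threshold core is the same.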
In addition, professional vision tools also include 2D feature analysis tools, character recognition tools, barcode recognition tools, and color analysis tools.