
Machine vision and vision sensors

2026-04-06 03:02:21 · #1

Humans acquire information from nature through their senses, with vision providing the largest share, accounting for approximately 80% of the total. With the development of information technology, endowing computers, robots, and other intelligent machines with human-like visual capabilities has become a goal for scientists. Machine vision technology has now been commercialized and put into practical use, with related products such as lenses, high-speed cameras, light sources, image software, image acquisition cards, and vision processors becoming increasingly sophisticated. New technologies in the field keep emerging, and general-purpose three-dimensional real-time visual sensing will add another significant chapter to it.

What is machine vision?

The Robotic Industries Association (RIA) defines machine vision as: "Machine vision is a device that automatically receives and processes images of a real object through optical devices and non-contact sensors, in order to obtain the required information or to control the motion of a robot."

Machine vision uses machines in place of human eyes to make measurements and judgments. A machine vision system captures a target with an image acquisition device (a CMOS- or CCD-based camera) and converts it into an image signal, which is transmitted to a dedicated image processing system. There it is converted into a digital signal based on pixel distribution, brightness, color, and other information; the image system performs various operations on these signals to extract the target's features, then controls on-site equipment according to the judgment results.
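As a minimal sketch of that capture → digitize → feature-extraction → judgment flow, the following assumes a grayscale frame already in memory as a NumPy array; the threshold and minimum-area values are illustrative assumptions, not parameters from any real inspection system:

```python
import numpy as np

def inspect(frame: np.ndarray, thresh: int = 128, min_area: int = 4) -> dict:
    """Binarize a grayscale frame, extract simple features, and judge PASS/FAIL.

    `thresh` and `min_area` are illustrative values, not taken from any
    real inspection standard.
    """
    mask = frame >= thresh                  # segment bright target from background
    area = int(mask.sum())                  # feature: pixel area
    if area == 0:
        return {"result": "FAIL", "area": 0, "centroid": None}
    ys, xs = np.nonzero(mask)
    centroid = (float(ys.mean()), float(xs.mean()))  # feature: center of gravity
    result = "PASS" if area >= min_area else "FAIL"
    return {"result": result, "area": area, "centroid": centroid}

# Synthetic 8x8 frame with a bright 3x3 target near the corner.
frame = np.zeros((8, 8), dtype=np.uint8)
frame[1:4, 1:4] = 200
report = inspect(frame)
```

Real systems replace the fixed threshold with adaptive or calibrated segmentation, but the control decision at the end is the same kind of judgment described above.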

Composition of machine vision system

Cameras and lenses

This part pertains to imaging devices. Typical vision systems consist of one or more such imaging systems. If there are multiple cameras, image data may be acquired by switching between image cards, or data from multiple camera channels may be acquired simultaneously under synchronous control. Depending on the application, the camera may output standard monochrome video (RS-170/CCIR), composite signals (Y/C), RGB signals, or non-standard progressive-scan, line-scan, or high-resolution signals.

Light source

As auxiliary imaging devices, light sources often play a crucial role in imaging quality. LED lights of various shapes, high-frequency fluorescent lights, fiber-optic halogen lights, and the like are readily available.

Sensor

They typically take the form of fiber-optic switches, proximity switches, and the like, and are used to determine the position and state of the object being measured, signaling the image sensor when to acquire.

Image acquisition card

Typically installed in a PC as a plug-in card, the image acquisition card's main function is to transmit images output from the camera to the host computer. It converts the analog or digital signals from the camera into an image data stream of a specific format. Simultaneously, it can control various camera parameters, such as trigger signals, exposure/integration time, and shutter speed. Image acquisition cards often have different hardware architectures to suit different types of cameras, and also use different bus types, such as PCI, PCI64, CompactPCI, PC104, and ISA.
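The parameters a grabber exposes can be grouped as in the hypothetical configuration below; the field names and defaults are illustrative, not the API of any real acquisition card:

```python
from dataclasses import dataclass

@dataclass
class GrabberConfig:
    """Hypothetical frame-grabber settings. Field names are illustrative
    placeholders, not the interface of any real acquisition card."""
    bus: str = "PCI"            # e.g. PCI, PCI64, CompactPCI, PC104, ISA
    trigger: str = "external"   # external hardware trigger vs free-running
    exposure_us: int = 500      # exposure / integration time in microseconds
    pixel_depth_bits: int = 8   # bits per pixel of the digitized stream

# A card set up for hardware-triggered acquisition with a short exposure:
cfg = GrabberConfig(trigger="external", exposure_us=250)
```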

PC platform

The computer is the core of a PC-based vision system, where image data processing and most of the control logic are handled. Inspection applications typically require a high-clock-speed CPU to reduce processing time. Furthermore, to minimize interference from electromagnetic fields, vibration, dust, and temperature in industrial environments, an industrial-grade computer should be selected.

Visual processing software

Machine vision software processes the input image data and performs calculations to produce results, which may include PASS/FAIL signals, coordinate positions, or strings. Common machine vision software comes in the form of C/C++ libraries, ActiveX controls, and graphical programming environments. It may target specific functions (such as LCD inspection, BGA inspection, or template alignment) or general purposes (including positioning, measurement, barcode/character recognition, and blob detection).

Once the vision software completes image analysis (unless it is used only for monitoring), it needs to communicate with external units to control the production process. Simple control can directly use the image acquisition card's built-in I/O, while more complex logic or motion control must rely on additional programmable logic controllers or motion control cards to implement the necessary actions.

Features of machine vision

Accuracy

Due to the physical limitations of the human eye, machines hold a significant advantage in precision. Even when a human inspector uses a magnifying glass or microscope, machines remain more precise, with accuracy reaching one-thousandth of an inch.
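That one-thousandth-of-an-inch figure translates directly into a sensor-resolution requirement. The sketch below uses a common two-pixels-per-smallest-feature rule of thumb, which is an assumption rather than a universal standard:

```python
import math

def pixels_required(fov_in: float, feature_in: float,
                    pixels_per_feature: int = 2) -> int:
    """Minimum sensor pixels across one axis to resolve features of size
    `feature_in` over a field of view `fov_in` (both in inches). The
    two-pixels-per-feature factor is a rule-of-thumb assumption."""
    return math.ceil(fov_in / feature_in) * pixels_per_feature

# Resolving 0.001 in over a 2-inch field of view:
n = pixels_required(2.0, 0.001)   # 4000 pixels across one axis
```

The same arithmetic, run in reverse, tells you the smallest defect a given camera can reliably detect over a given field of view.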

Repeatability

Machines can perform inspections repeatedly in the same way without getting tired. In contrast, the human eye makes subtle differences each time it inspects a product, even if the products are exactly the same.

Speed

Machines can inspect products much faster. Especially when inspecting high-speed moving objects, such as on a production line, machines can improve production efficiency.

Objectivity

Human visual inspection has a fatal flaw: subjectivity. Results vary with the worker's mood, whereas machines have no emotions, so their results are objective and reliable.

Cost

Because machines are faster than humans, one automated inspection machine can do the work of several people. Machines also need no breaks and do not get sick, so they can work continuously, greatly improving production efficiency.

Working principle of vision sensors

A visual sensor is a sensor that calculates characteristic quantities of an object (area, center of gravity, length, position, etc.) by processing images captured by a camera, and outputs the data and judgment results.
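The characteristic quantities listed above can all be derived from a binary object mask; this sketch uses NumPy, with bounding-box definitions of length and position that are one illustrative choice among several:

```python
import numpy as np

def feature_quantities(mask: np.ndarray) -> dict:
    """Compute the quantities a vision sensor typically reports for a
    binary object mask: area, center of gravity, length (largest
    bounding-box extent), and position (bounding-box origin)."""
    ys, xs = np.nonzero(mask)
    height = int(ys.max() - ys.min() + 1)
    width = int(xs.max() - xs.min() + 1)
    return {"area": int(mask.sum()),
            "center_of_gravity": (float(ys.mean()), float(xs.mean())),
            "length": max(height, width),
            "position": (int(ys.min()), int(xs.min()))}

mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 3:9] = True          # a 3x6 rectangular object
feats = feature_quantities(mask)
```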

Visual sensors have thousands of pixels that capture light from an entire scene. The sharpness and detail of an image are typically measured by resolution, expressed as a number of pixels. A sensor can therefore "see" a very detailed image of a target, whether it is several meters or several centimeters away.

After capturing an image, the DINA vision sensor compares it with a baseline image stored in memory to perform analysis.
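That compare-against-baseline step can be sketched as a mean-absolute-difference test; the tolerance value below is an illustrative assumption, and real sensors use more robust scores such as normalized correlation or per-region analysis:

```python
import numpy as np

def matches_baseline(frame: np.ndarray, baseline: np.ndarray,
                     tol: float = 10.0) -> bool:
    """Judge a frame against a stored reference image by mean absolute
    pixel difference. `tol` is an illustrative threshold."""
    diff = np.abs(frame.astype(np.int16) - baseline.astype(np.int16))
    return float(diff.mean()) <= tol

baseline = np.full((4, 4), 100, dtype=np.uint8)   # stored reference image
good = np.full((4, 4), 103, dtype=np.uint8)       # small uniform drift: accept
bad = baseline.copy()
bad[0, :] = 250                                   # one corrupted row: reject
```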

The vision sensor is the core of a machine vision system and the source of the most information about the environment. It must accommodate various optical, mechanical, electronic, and sensor components for contour measurement, while also being small in size and light in weight.

Visual sensors include lasers, scanning motors and scanning mechanisms, angle sensors, linear CCD sensors and their drive boards, and various optical components.

The Development History of Visual Sensors

Visual sensors emerged in the late 1950s and have developed rapidly, becoming one of the most important sensors in robotics. Starting in the 1960s, robot vision initially processed the world of building blocks, later expanding to handle indoor objects such as tables, chairs, and lamps, and subsequently the outdoor real world. After the 1970s, some practical vision systems appeared, such as those used in integrated circuit manufacturing, precision electronic product assembly, and inspection and positioning in beverage canning and packaging. Furthermore, with the development of this discipline, some advanced ideas have emerged in fields such as artificial intelligence, psychology, computer graphics, and image processing.

Machine vision's role is to obtain necessary information from 3D environment images and construct a clear and meaningful description of the observed object. Vision includes three processes: image acquisition, image processing, and image understanding. Image acquisition converts 3D environment images into electrical signals using a vision sensor; image processing refers to image-to-image transformations, such as feature extraction; and image understanding provides an environmental description based on the processed data. The core component of a vision sensor is a camera tube or CCD, with camera tubes being an early product. OzD (Optical Characteristic Display) technology was developed later. Current CCDs can achieve automatic focusing.

Implementation methods of vision sensors

Vision sensors are non-contact. They integrate technologies such as television cameras and are among the most stable sensors used in robots.

Robot vision sensors use the following three measurement methods:

I. An image processing method that directly processes the light-and-dark images captured by a television camera. The brightness information is digitized, typically to about 4-10 bits per pixel, over images of 64 × 64 to 1024 × 1024 pixels; various known algorithms are then used to interpret the lines and identify the workpiece. The difficulty with this method is the enormous volume of data to be processed, which is very time-consuming; for robot vision, the image is therefore often reduced to binary values and processed quickly by a dedicated processing device.

II. A method that binarizes images of varying gray depth and then processes them further.

III. Methods that measure the state and position of objects from distance information. Various approaches are used, including triangulation and stereoscopic vision with two television cameras.
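The digitization and binarization steps in methods I and II can be sketched as follows; the bit depth and threshold are illustrative values within the 4-10 bit range mentioned above:

```python
import numpy as np

def quantize(gray: np.ndarray, bits: int) -> np.ndarray:
    """Reduce an 8-bit image to `bits` bits per pixel by uniform
    requantization (bit depth here is an illustrative choice)."""
    step = 256 // (1 << bits)
    return (gray // step).astype(np.uint8)

def binarize(gray: np.ndarray, thresh: int = 128) -> np.ndarray:
    """Collapse gray levels to a single bit for fast downstream processing."""
    return (gray >= thresh).astype(np.uint8)

img = np.array([[0, 64, 128, 255]], dtype=np.uint8)
q4 = quantize(img, 4)       # 16 gray levels
b = binarize(img)           # 1 bit per pixel
```

Binarization discards most of the gray-level data, which is exactly why it makes the dedicated high-speed processing mentioned in method I feasible.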

1. Triangulation Measurement Principles and Methods

A laser beam is projected onto the object, and a position-sensitive device detects its diffusely reflected light, as shown in Figure 1.

If the linear array sensor (such as a linear CCD) is positioned appropriately, the laser spot on the object can be clearly imaged on the sensor. In this case, the lateral resolution depends only on the width (i.e., thickness) of the laser beam, which can be made thinner using appropriate optical methods. The following methods can be used to obtain depth and lateral information.

One method expands the light beam into a plane and projects it onto the object, where it is received by an array sensor. For rapid distance measurement, a strip of light obtained through a vertical slit is projected onto the object being measured, and a television camera detects the image of the slit; Figure 2 illustrates this structure. If the projection direction of the slit light and the position of the slit image are known, the distance to the object's surface can be determined by the principle of triangulation. Using a television camera with 256 scanning lines, the distances of 256 points can be obtained every 1/60 of a second, enabling measurement of distances to most points within the camera's frame.
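The slit-light geometry reduces to plane triangulation: with a baseline b between projector and camera, and sight-line angles measured from the baseline, the depth is z = b / (cot α + cot β). A small sketch, with angles chosen purely for illustration:

```python
import math

def triangulate_depth(baseline_mm: float, alpha_deg: float,
                      beta_deg: float) -> float:
    """Depth of a lit point above the projector/camera baseline, given the
    projection angle alpha and viewing angle beta (both measured from the
    baseline). Follows z = b / (cot(alpha) + cot(beta))."""
    a = math.radians(alpha_deg)
    b = math.radians(beta_deg)
    return baseline_mm / (1.0 / math.tan(a) + 1.0 / math.tan(b))

# Symmetric 45-degree geometry over a 100 mm baseline: depth about 50 mm.
z = triangulate_depth(100.0, 45.0, 45.0)
```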


In a second method, the emitted and received light beams rotate synchronously around the axis shown by the dotted line. The linear array sensor measures the radial depth information, while the angle sensor, together with the scanning mechanism, measures the angle information. The advantage of this method is its high signal-to-noise ratio: during the sensor's photosensitive time, the light intensity is concentrated at a point rather than scattered along a line, so all photoelectric signals can be clearly imaged on the sensor. Signal processing is also relatively fast.

In a third method, a laser tracker projects the laser in any direction, and the laser spot on the object's surface appears as the brightest point in the image. A camera detects it, and distance is measured quickly by the principle of triangulation. The mechanical and optical structures based on the laser synchronous scanning measurement principle have been optimized, and micro-components have been used to construct a sensor designed for robots; its structural principle is shown in Figure 3.


2. Stereoscopic vision method

The eye is like a sophisticated and efficient information processor, handling over 90% of the information entering the brain from the outside world. Although humans have created remarkable "artificial eyes" such as cameras, telescopes, and photoelectric tubes, science still needs to learn from the eye, the biological world's original design, to further explore its mysteries and exploit its advantages.

1) Mechanism of stereoscopic vision

In the visual cortex of the brain, some cells respond to stimuli from both eyes; these are called binocular cells. Most cells in the visual cortex are binocular, and their receptive fields generally have almost identical characteristics for the two eyes. However, the positions of a binocular cell's receptive fields on the left and right retinas, when projected onto the visual field, are not exactly the same; each cell is shifted slightly in turn. This type of receptive field plays a crucial role in extracting stereoscopic information from the two eyes. As shown in Figure 4, three binocular cells A, B, and C have receptive fields with different degrees of disparity. When a stimulus is applied to the screen shown, the centers of these receptive fields overlap at a single point on the left retina, but each occupies a different position on the right retina. In other words, the left-eye receptive fields lie on the same visual axis, while the right-eye receptive fields lie on three different visual axes.

There are also binocular cells that hardly respond to monocular stimuli but show a strong facilitating effect under binocular stimulation, producing output only when the displacement between the two eyes' stimuli is appropriate. Such cells respond strongly only when a light stimulus is placed at a specific distance from the eyes in three-dimensional space; they can therefore be called binocular depth-detection cells.


The biological visual system possesses a large number of photosensitive elements, namely the retinal rod and cone cells. Since there are far fewer optic nerve fibers than photosensitive elements in the retina, a one-to-one correspondence between the two is impossible. Visual information therefore undergoes some form of parallel preprocessing before reaching the nerve fibers; this preprocessing is performed by a real-time response system that rapidly extracts the key features of a figure.

2) Hierarchical model

Many researchers have proposed hierarchical models of visual information processing; the best known is the hierarchical model proposed by Marko:

The first layer is the sensory layer;

The second layer consists of four directional filters processed in parallel, which extract features in the vertical, horizontal, and two diagonal directions, respectively;

The third layer combines three parallel line filters, which produce detectors for curvature, angles, endpoints, and intersections;

In the fourth layer, a so-called topological-transformation extraction compresses the image surface to a point representing an overall feature of the image.

The hierarchical model thus combines a serial structure across layers with a parallel structure within each pathway, giving binocular vision its powerful parallel processing capability.
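The second layer's four directional filters can be sketched as 3×3 convolution kernels; these particular kernels are a common illustrative choice, not the exact filters of Marko's model:

```python
import numpy as np

# Illustrative 3x3 edge kernels for the four directions named above.
KERNELS = {
    "vertical":   np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]),
    "horizontal": np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]]),
    "diag_main":  np.array([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]]),
    "diag_anti":  np.array([[1, 1, 0], [1, 0, -1], [0, -1, -1]]),
}

def filter_response(img: np.ndarray, kernel: np.ndarray) -> float:
    """Summed absolute response of a 3x3 kernel over all valid positions."""
    h, w = img.shape
    total = 0.0
    for y in range(h - 2):
        for x in range(w - 2):
            total += abs(np.sum(img[y:y+3, x:x+3] * kernel))
    return total

# A vertical step edge responds most strongly to the vertical kernel.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
responses = {name: filter_response(img, k) for name, k in KERNELS.items()}
```

Each filter's output feeds the next layer in parallel, which is the parallel-pathway structure the paragraph above describes.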

Since external scenes are three-dimensional, it is desirable for intelligent robots' external sensors to provide three-dimensional stereoscopic information about the external scene. To achieve this, researchers have studied the acquisition of three-dimensional stereoscopic information from different perspectives, employing stereoscopic vision methods to obtain three-dimensional depth images of the external scene.
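For a rectified binocular pair, the depth of a scene point follows the standard pinhole relation Z = f·B/d, where f is the focal length in pixels, B the baseline between the two cameras, and d the disparity between the point's left and right image positions; the numbers below are purely illustrative:

```python
def stereo_depth(focal_px: float, baseline_mm: float,
                 disparity_px: float) -> float:
    """Depth from binocular disparity for a rectified stereo pair,
    using the standard pinhole-camera relation Z = f * B / d."""
    return focal_px * baseline_mm / disparity_px

# A point with 8 px disparity, 800 px focal length, 60 mm baseline:
z_mm = stereo_depth(800.0, 60.0, 8.0)   # 6000.0 mm
```

Note the inverse relationship: as disparity shrinks, depth grows, which is why stereo depth resolution degrades for distant points.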

3) 3D vision system and mathematical model structure


The 3D vision system consists of a CCD area-array camera, a structured laser projector, an image interface, image processing and analysis software, and a PC. The vision sensor comprises the camera and the structured laser projector, rigidly fixed with respect to each other. The projector generates a pattern of five light planes with different spatial orientations.

Comparison of photoelectric sensors and vision sensors

Compared to photoelectric sensors, vision sensors offer machine designers greater flexibility. Applications that previously required multiple photoelectric sensors can now be inspected using a single vision sensor, examining multiple features. Vision sensors can inspect much larger areas and achieve greater flexibility in target position and orientation. This has made vision sensors widely popular in applications that were previously only possible with photoelectric sensors. Traditionally, these applications also required expensive accessories and precise motion control to ensure that the target object always appears in the same position and orientation.

Furthermore, since a basic vision sensor costs only about as much as several photoelectric sensors together with their more expensive accessories, price is no longer an obstacle.

Vision sensors offer unparalleled flexibility for application switching. For example, switching production processes (from single-serving yogurt to ice cream tubs) can take only seconds and can be done remotely. Additional inspection conditions can be easily added to this application.

Applications of machine vision and vision sensors

In recent decades, machine vision has been widely applied in agriculture, industry, medicine, and other fields thanks to its outstanding advantages: non-contact operation, high speed, high precision, and strong resistance to on-site interference. Wherever objects must be identified, characterized, or inspected, machine vision can demonstrate its capabilities and complete the task quickly and efficiently.

For example, in agricultural production, some tasks involve judging the appearance of crops or agricultural products, such as fruit quality inspection, fruit ripeness assessment, crop growth monitoring, and weed identification. These tasks, which previously relied primarily on human vision, can be partially or completely taken over by machine vision, helping to automate agriculture. For instance, Huang Xiuling's team from Nanjing Forestry University designed an intelligent grading production line capable of dynamically detecting apple quality in real time. On the production line, three evenly distributed cameras simultaneously collect information from the apple surface, and a computer control system comprehensively analyzes the collected information to grade the apples. However, some experts have pointed out that because farmland environments are complex, variable, and unstructured, the application of machine vision in agricultural production is still immature and requires further improvement.

In industrial environments, machine vision applications are becoming increasingly sophisticated, playing a significant role in improving the flexibility and automation of industrial production. Furthermore, in hazardous working environments or situations where human vision is insufficient, replacing human vision with machine vision enhances operational safety. Image recognition systems on assembly lines, using image recognition technology to inspect product appearance defects, label printing errors, and circuit board soldering quality defects, are successful examples of machine vision systems applied in the industrial field. Printing and packaging, the automotive industry, semiconductor materials, and food production are all areas where machine vision is being used in industry.

Machine vision technology also has significant applications in exploration, mining, and non-ferrous smelting processes. Mineral processing is a crucial step in mineral resource processing, and its quality directly impacts mineral resource recovery. In recent years, machine vision-based monitoring of mineral surface features has attracted considerable attention from research institutions in industrialized countries. The European Union, in collaboration with several universities and companies, launched the "Machine Vision-Based Bubble Structure and Color Characterization" project in 2000; South Africa, Chile, and other countries have applied machine vision to the flotation monitoring of graphite and platinum. In China, significant progress has also been made in flotation monitoring of coal and nickel.

Machine vision technology can also be applied to intelligent transportation, security, and medical equipment. In medicine, machine vision can assist doctors in analyzing medical images such as X-ray, MRI, and CT images. In scientific research, it can be used for materials, biological, chemical, and life-science analysis, such as automated blood cell classification and counting, chromosome analysis, and cancer cell identification.
