
Analyzing the Five Major Challenges in Machine Vision System Design

2026-04-06 03:21:13

Industrial vision applications are generally divided into four categories: positioning, measurement, detection, and recognition, of which measurement places the most stringent demands on lighting stability.

Composition of machine vision system

A machine vision system uses a computer to perform the functions of human vision, that is, to recognize the objective three-dimensional world. As currently understood, the sensory part of the human visual system is the retina, a three-dimensional sampling system. The visible portion of a three-dimensional object is projected onto the retina, and from this two-dimensional projection people form a three-dimensional understanding of the object: a perception of its shape, size, distance from the observation point, texture, and motion characteristics (direction and speed).

The input devices for machine vision systems can be cameras, rotating drums, etc., all of which take three-dimensional images as input sources. That is, what is input into the computer is a two-dimensional projection of the three-dimensional visual world. If we consider the transformation from the three-dimensional objective world to a two-dimensional projected image as a forward transformation, then what the machine vision system needs to do is to perform an inverse transformation from this two-dimensional projected image to the three-dimensional objective world, that is, to reconstruct the three-dimensional objective world based on this two-dimensional projected image.

A machine vision system mainly consists of three parts: image acquisition, image processing and analysis, and output or display.

Nearly 80% of industrial vision systems are primarily used for inspection, including improving production efficiency, controlling product quality during production, and collecting product data. Product classification and selection are also integrated into inspection functions. The following example of a single-camera vision system used on a production line illustrates the system's composition and functions.

The vision system inspects products on the production line, determines whether they meet quality requirements, and generates corresponding signals to input into the host computer based on the results. Image acquisition equipment includes light sources, cameras, etc.; image processing equipment includes corresponding software and hardware systems; output devices are related systems connected to the manufacturing process, including process controllers and alarm devices. Data is transmitted to the computer for analysis and product control. If a non-conforming product is detected, an alarm is triggered, and the product is removed from the production line. The results of machine vision are the source of quality information for the CAQ system and can also be integrated with other CIMS systems.

Image acquisition

Image acquisition is essentially the process of converting the visual image and intrinsic features of the object under test into a series of data that a computer can process. It consists mainly of three parts: illumination; image focusing and formation; and image determination together with generation of the camera output signal.

1. Lighting

Lighting is a crucial factor affecting the input of a machine vision system: it directly determines the quality of the input data and accounts for at least 30% of an application's success or failure. Since no universal machine vision lighting device exists, an appropriate lighting scheme must be chosen for each specific application to achieve the best results.

In the past, many industrial machine vision systems used visible light as the light source, mainly because it is readily available, inexpensive, and easy to work with. Commonly used visible sources include incandescent, fluorescent, mercury, and sodium lamps. A major drawback of these sources, however, is instability: a fluorescent lamp, for example, loses 15% of its light output within the first 100 hours of use, and the decline continues over time. Keeping the light output reasonably stable is therefore a pressing issue in practical applications.

On the other hand, ambient light will change the total light energy that these light sources illuminate onto objects, causing noise in the output image data. Generally, a protective screen is used to reduce the impact of ambient light.

Because of these problems, modern industrial applications often use invisible radiation such as X-rays or ultrasound as the source for certain demanding inspection tasks. However, invisible sources make the inspection system harder to operate and are expensive, so in practice visible light remains the more common choice.

Illumination systems can be categorized by illumination method into backlighting, front lighting, structured light, and stroboscopic lighting. Backlighting places the object under test between the light source and the camera and gives high-contrast images. Front lighting places the light source and the camera on the same side of the object, which makes installation easy. Structured-light illumination projects a grating or line pattern onto the object and recovers the object's three-dimensional information from the distortion of the projected pattern. Stroboscopic lighting illuminates the object with high-frequency light pulses and requires the camera's acquisition to be synchronized with the pulses.

2. Image focusing formation

The image of the object being measured is focused onto a sensitive element through a lens, much like a camera takes a picture. The difference is that a camera uses film, while a machine vision system uses a sensor to capture the image. The sensor converts the visual image into an electrical signal for computer processing.

The selection of cameras for a machine vision system should be based on the requirements of the actual application, among which the camera's lens parameters are an important indicator. Lens parameters fall into four parts: magnification, focal length, depth of field, and lens mount.

3. Image determination and generation of camera output signal

The camera is essentially a photoelectric conversion device: it converts the image formed by the lens on the sensing element into an electrical signal that the computer can process. The sensing unit may be a vacuum tube or a solid-state device.

Vacuum tube cameras were developed relatively early, and were already used in commercial television in the 1930s. They used vacuum tubes containing photosensitive elements for image sensing, converting the received image into an analog voltage signal for output. Cameras with RS-170 output can be directly connected to commercial television displays.

Solid-state cameras were developed in the late 1960s after Bell Telephone Laboratories invented the charge-coupled device (CCD). They consist of a linear or rectangular array of photodiodes distributed across individual pixels. By outputting voltage pulses to each diode in a specific sequence, they convert the optical signal of the image into an electrical signal. The output voltage pulse sequence can be directly input into a standard television display in RS-170 format or into a computer's memory for numerical processing. CCDs are currently the most commonly used machine vision sensors.

Image processing technology

In machine vision systems, visual information processing relies primarily on image processing methods, including image enhancement, data encoding and transmission, smoothing, edge sharpening, segmentation, feature extraction, and image recognition and understanding. After such processing, the quality of the output image is significantly improved, both enhancing its appearance and facilitating computer analysis, processing, and recognition.

1. Image enhancement

Image enhancement is used to adjust the contrast of an image, highlight important details, and improve visual quality. Gray-level histogram modification techniques are commonly used for image enhancement.

An image's grayscale histogram is a statistical chart of the distribution of gray values in the image and is closely related to the image's contrast.

Typically, a two-dimensional digital image is represented in the computer as a matrix whose elements are the image's gray values at the corresponding coordinates. These are discrete integers, generally in the range 0 to 255, chiefly because a single byte in a computer represents values from 0 to 255. Moreover, the human eye can only distinguish approximately 32 gray levels, so one byte per gray value is sufficient.

However, histograms only count the probability of a certain gray level pixel appearing, and do not reflect the two-dimensional coordinates of that pixel in the image. Therefore, different images may have the same histogram. The shape of the gray-level histogram can be used to determine the sharpness and black-and-white contrast of the image.

If the histogram of an image is not ideal, it can be modified appropriately using histogram equalization processing technology. This involves mapping and transforming the pixel gray levels in an image with a known gray level probability distribution to create a new image with a uniform gray level probability distribution, thereby achieving the goal of making the image clearer.
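The mapping just described can be sketched in a few lines of Python. This is a minimal illustration using NumPy; the function name `equalize` is ours, not from any particular library:

```python
import numpy as np

def equalize(img):
    """Histogram equalization for an 8-bit grayscale image.

    Maps each gray level through the normalized cumulative histogram so the
    output levels are spread more uniformly over 0..255.
    """
    hist = np.bincount(img.ravel(), minlength=256)  # gray-level counts
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]                                  # normalize to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)      # gray-level mapping table
    return lut[img]                                 # apply the mapping per pixel
```

A low-contrast image whose gray values cluster in a narrow band comes out spanning the full 0..255 range, which is exactly the "uniform gray level probability distribution" mentioned above.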

2. Image smoothing

Image smoothing, also known as image denoising, primarily aims to remove image distortions caused by imaging equipment and the environment during the actual imaging process, thereby extracting useful information. As is well known, actual images inevitably encounter external and internal interference during their formation, transmission, reception, and processing. This interference includes factors such as the non-uniformity of sensitivity in photoelectric conversion elements, quantization noise during digitization, transmission errors, and human factors, all of which can degrade the image. Therefore, noise removal and restoration of the original image are crucial aspects of image processing.
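As a concrete illustration, here is a 3x3 median filter, a common smoothing choice for impulse-like noise. This is a sketch assuming NumPy; the name `median3x3` is illustrative:

```python
import numpy as np

def median3x3(img):
    """3x3 median filter; borders are handled by edge replication."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    # Stack the nine shifted views of the image and take the per-pixel median.
    stack = np.stack([padded[i:i + h, j:j + w]
                      for i in range(3) for j in range(3)])
    return np.median(stack, axis=0).astype(img.dtype)
```

A single noisy pixel is replaced by the median of its neighborhood and disappears, while flat regions pass through unchanged; this is why median filtering preserves edges better than simple averaging.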

3. Image data encoding and transmission

Digital images generate a massive amount of data; a single 512x512 pixel digital image contains 256KB of data. Assuming 25 frames are transmitted per second, the transmission rate would be 52.4 Mbps. High transmission rates mean high investment and increased difficulty in widespread adoption. Therefore, image data compression during transmission is crucial. Data compression is primarily achieved through image data encoding and transformation compression.

Image data encoding generally employs predictive coding, which uses a prediction formula to capture the spatial and sequential variation of the image data. If the values of the preceding neighboring pixels are known, the formula predicts the value of the current pixel. Predictive coding then only needs to transmit the initial values of the image data and the prediction errors, compressing roughly 8 bits/pixel down to 2 bits/pixel.
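A minimal sketch of previous-pixel predictive coding (DPCM) along one image row: only the first value and the prediction errors need to be transmitted. NumPy is assumed and the function names are illustrative:

```python
import numpy as np

def dpcm_encode(row):
    """Previous-pixel predictor: transmit the first sample plus the errors."""
    row = np.asarray(row, dtype=np.int16)
    errors = np.diff(row)  # error = actual - predicted (the previous pixel)
    return int(row[0]), errors

def dpcm_decode(first, errors):
    """Rebuild the row by accumulating the prediction errors."""
    return np.concatenate([[first], first + np.cumsum(errors)]).astype(np.int16)
```

Because neighboring pixels are usually similar, the errors are small and can be coded in far fewer bits than the raw 8-bit samples, which is where the 8-to-2 bits/pixel compression comes from.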

Transform compression methods divide the whole image into small data blocks (typically 8x8 or 16x16 pixels), then classify, transform, and quantize each block, forming an adaptive transform compression system. This can compress the image data to a few dozen bytes for transmission, with the inverse transform applied at the receiving end.
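The block-transform idea can be sketched with an orthonormal 8x8 DCT: transform a block, keep only the largest coefficients, and inverse-transform at the receiver. NumPy is assumed, and `keep=10` is an arbitrary illustrative coefficient budget:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (rows index frequency, columns samples)."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def compress_block(block, keep=10):
    """2-D DCT of one 8x8 block, keeping only the `keep` largest coefficients,
    then inverse-transforming to show the reconstruction."""
    C = dct_matrix(8)
    coeffs = C @ block @ C.T
    thresh = np.sort(np.abs(coeffs).ravel())[-keep]  # keep-th largest magnitude
    coeffs[np.abs(coeffs) < thresh] = 0              # discard small coefficients
    return C.T @ coeffs @ C                          # inverse transform
```

Smooth blocks concentrate their energy in a few low-frequency coefficients, so discarding the rest changes the reconstruction very little; that concentration is what makes the "few dozen bytes" figure plausible.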

4. Edge sharpening

Image edge sharpening primarily enhances the contour edges and details in an image, forming complete object boundaries to separate objects from the image or detect regions representing the surface of the same object. It was a fundamental problem in early vision theory and algorithms, and remains a crucial factor in the success or failure of vision technology in its later stages.

5. Image segmentation

Image segmentation divides an image into several parts, each corresponding to the surface of a certain object. During segmentation, the grayscale or texture of each part conforms to a certain uniformity measure. Essentially, it classifies pixels. The classification is based on pixel grayscale values, color, spectral characteristics, spatial characteristics, or texture characteristics, etc. Image segmentation is one of the fundamental methods of image processing technology, and is applied in fields such as chromosome classification, scene understanding systems, and machine vision.

Image segmentation mainly employs two methods. The first is gray-level thresholding based on the metric space: pixel clusters are determined in the spatial domain from the image's gray-level histogram. Because it uses only gray-level information and ignores other useful cues, its results are highly sensitive to noise. The second is region growing in the spatial domain: segmentation regions are built from connected sets of pixels that share similar properties (e.g., gray level, texture, gradient). This yields good segmentation results but is computationally expensive and slow. Other methods exist as well. Edge tracking focuses on preserving edge properties, following edges to form closed contours that segment the target. Pyramidal (cone) image data structures and label-relaxation iteration methods exploit the spatial distribution of pixels to merge neighboring pixels rationally. Knowledge-based methods leverage prior information and statistical characteristics of the scene: they first perform an initial segmentation and extract region features, then use domain knowledge to derive interpretations of the regions, and finally merge regions based on those interpretations.
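Gray-level thresholding, the first method above, can be illustrated with Otsu's criterion, which picks the threshold maximizing the between-class variance of the histogram. This is a common choice we use for illustration, not necessarily the specific algorithm the text has in mind; NumPy is assumed:

```python
import numpy as np

def otsu_threshold(img):
    """Threshold maximizing between-class variance of the gray-level histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    omega = np.cumsum(p)                # class-0 probability at each threshold
    mu = np.cumsum(p * np.arange(256))  # class-0 mean mass at each threshold
    mu_t = mu[-1]                       # global mean gray level
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0  # empty classes contribute nothing
    return int(np.argmax(sigma_b))
```

Pixels above the returned threshold form one class. Note that this uses only the histogram, so, as stated above for histogram-based thresholding in general, it ignores spatial information and is sensitive to noise.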

6. Image recognition

Image recognition can actually be viewed as a labeling process, that is, using recognition algorithms to identify pre-segmented objects in a scene and assigning specific labels to these objects. This is a task that machine vision systems must complete.

Image recognition problems can be categorized into three types based on their difficulty, from easiest to hardest. In the first type, pixels in an image represent specific information about an object. For example, a pixel in a remote sensing image might represent the reflectance characteristics of a ground feature at a certain location within a specific spectral band, allowing the identification of the feature's type. In the second type, the object to be recognized is a tangible whole, and two-dimensional image information is sufficient for identification, such as text recognition or the recognition of certain three-dimensional objects with stable, visible surfaces. However, unlike the first type, these problems are not easily represented as feature vectors. During recognition, the object must first be correctly segmented from the image background, and then the attribute map of the object in the established image must be matched with the attribute map in a hypothetical model library. The third type involves deriving a three-dimensional representation of the object from input two-dimensional images, feature maps, 2.5D images, etc. The key challenge here is extracting the implicit three-dimensional information, which remains a hot research topic.

Currently, image recognition methods fall mainly into decision-theoretic and structural approaches. Decision-theoretic methods are based on decision functions used to classify and recognize pattern vectors, and rest on quantitative descriptions (such as statistical textures). Structural methods decompose an object into patterns or pattern primitives; different object structures yield different strings of primitives. By matching given pattern primitives along the encoded boundary of an unknown object, a string is obtained, and the object's class is then determined from that string. This is a method that relies on symbolic descriptions of the object being measured.

So, what are the challenges in designing machine vision systems? This article mainly summarizes the following five points:

First: The stability of lighting

Industrial vision applications are generally divided into four categories: positioning, measurement, detection, and recognition. Among these, measurement places the highest demands on lighting stability: even a 10-20% change in lighting can shift the measurement result by 1-2 pixels. This is not a software problem; the lighting change physically moves the edge positions in the image, and no software, however sophisticated, can undo it. The system design must therefore shield the setup from ambient light while keeping the active illumination source stable. Of course, increasing the camera's resolution is also a way to improve accuracy and resist environmental interference. For example, if each pixel previously corresponded to 10 µm in object space, doubling the resolution reduces this to 5 µm per pixel, effectively doubling the accuracy, although at finer pixel sizes the system also becomes correspondingly more sensitive to environmental disturbances.
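The resolution arithmetic in that example is simply field of view divided by pixel count across it. The numbers below are hypothetical, chosen to reproduce the 10 µm and 5 µm figures:

```python
def object_pixel_size(fov_mm, pixels):
    """Object-space size of one pixel (mm/pixel) for a given field of view."""
    return fov_mm / pixels

# Hypothetical 20.48 mm field of view: doubling the pixel count across it
# halves the object-space pixel size, from 10 um to 5 um per pixel.
low_res = object_pixel_size(20.48, 2048)   # 0.010 mm = 10 um per pixel
high_res = object_pixel_size(20.48, 4096)  # 0.005 mm = 5 um per pixel
```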

Second: Inconsistency in workpiece position

In general measurement projects, whether offline or online, the first step with fully automated equipment is to locate the target object. Each time the target object appears in the field of view, its exact location must be known. Even with mechanical clamps, it's impossible to guarantee the target object will always be in the same position. This is where positioning functionality comes in. If the positioning is inaccurate, the measuring tool's position may also be inaccurate, sometimes leading to significant deviations in the measurement results.

Third: Calibration

Generally, high-precision measurement requires the following calibrations:
1. Optical distortion calibration (usually necessary unless the lens's distortion is already corrected in software);
2. Projection distortion calibration, which corrects the image distortion caused by errors in the installation position;
3. Object-image space calibration, which determines the object-space size corresponding to each pixel.

However, current calibration algorithms are all based on planar calibration. If the physical object to be measured is not planar, calibration will require some special algorithms to handle it, which the usual calibration algorithms cannot solve.

In addition, some calibrations require special calibration methods because it is inconvenient to use a calibration board. Therefore, calibration cannot necessarily be solved by the existing calibration algorithms in the software.

Fourth: The speed of the object's motion

If the object being measured is moving rather than stationary, the effect of motion blur on image accuracy must be considered: the blur length in pixels equals the object's speed multiplied by the camera's exposure time, divided by the object-space size of one pixel. This is not something software can fix.
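A worked example of the blur formula; the numbers are illustrative, and note the division by the object-space pixel size, which converts the distance moved during the exposure into pixel units:

```python
def motion_blur_pixels(speed_mm_per_s, exposure_s, pixel_size_mm):
    """Blur length in pixels: distance the object moves during the exposure,
    expressed in object-space pixel units."""
    return speed_mm_per_s * exposure_s / pixel_size_mm

# An object moving at 100 mm/s with a 1 ms exposure and 10 um (0.01 mm)
# pixels smears each edge across about 10 pixels.
blur = motion_blur_pixels(100.0, 0.001, 0.01)
```

Shortening the exposure (e.g., with stroboscopic lighting, as described earlier) is the usual way to bring this number down.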

Fifth: The measurement accuracy of the software

In measurement applications, the software's accuracy should be taken as 1/2 to 1/4 of a pixel (preferably 1/2), rather than the 1/10 to 1/30 of a pixel achievable in positioning applications, because in measurement the software can extract only very few feature points from the image.
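Sub-pixel accuracy of the kind discussed here typically comes from interpolating between samples. A minimal sketch, locating an edge crossing by linear interpolation along a 1-D intensity profile; the function name and threshold are illustrative:

```python
import numpy as np

def subpixel_edge(profile, thresh):
    """Sub-pixel position where a rising 1-D intensity profile crosses
    `thresh`, found by linear interpolation between the straddling samples."""
    p = np.asarray(profile, dtype=float)
    idx = int(np.argmax(p >= thresh))  # first sample at or above the threshold
    if idx == 0:
        return 0.0                     # crossing at or before the first sample
    frac = (thresh - p[idx - 1]) / (p[idx] - p[idx - 1])
    return (idx - 1) + frac
```

With clean synthetic data this resolves a fraction of a pixel easily; with the few, noisy feature points typical of real measurement images, the 1/2 to 1/4 pixel budget above is the realistic figure.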
