A visual tracking system for humanoid robots based on dual computers
Abstract: Real-time tracking of moving targets is one of the key technologies in robot vision. This paper presents a visual tracking system for a humanoid robot. Two computers are embedded in the robot to ensure real-time tracking: one is responsible for visual information processing, the other for controlling the motion units, and the two are linked through a Memolink communication module. The Windows-based visual information processing subsystem rapidly segments the moving target from the video and estimates and predicts the target's state. The motion control subsystem runs on the RTLinux real-time operating system and uses a conventional PD controller to control joint motion. Experiments verify the system's stability, robustness, and real-time performance.

Keywords: humanoid robot; visual tracking; RTLinux

0 Introduction

The head visual tracking system of a humanoid robot uses visual information as feedback to plan the robot's head movements, enabling it to track moving targets in real time. Visual tracking is one of the important functions of a humanoid robot, and its study is of great significance for autonomous navigation, human-robot interaction, and visual servoing.
The real-time performance of visual tracking is one of the key performance requirements of a humanoid robot. To meet this requirement, many researchers have designed a variety of system architectures in recent years. The authors of [1] designed a distributed humanoid robot control system based on a CAN bus, in which the vision system communicates with the control system over a wireless local area network. The motion control system of the Japanese humanoid robot ASIMO adopts centralized control, and its vision system communicates with the motion control system over a network [2]. A single computer can hardly meet the real-time requirements of visual tracking. To achieve real-time tracking, this paper proposes and implements a dual-computer visual tracking system based on Memolink communication. The communication is reliable, the hardware is compact, and the two computers fit easily into the chest cavity of the humanoid robot.

The stability of target segmentation is another important requirement for a robot visual tracking system, and it has been widely studied in recent years. Most robot target tracking systems rely on a single kind of image information, such as the color [3] or the contour [4] of the object. In complex, unstructured indoor backgrounds, however, a single kind of image information cannot guarantee stable target segmentation. Fusing multiple kinds of image information is one way to achieve stable target recognition [5]. This paper proposes a fast target segmentation method that fuses depth, color, and shape information to progressively approximate the target region.

1 System Structure

The system structure of the humanoid robot BHR1 is shown in Figure 1. The robot has 32 degrees of freedom, of which the head has 2, allowing it to rotate freely in two directions: left-right and up-down.
Two CCD cameras are mounted on the face as visual sensors to simulate human eyes. The SVS stereo vision system is used to process the visual information; SVS provides a depth image for every frame [6].

[align=center] Figure 1 System structure of the tracking system of the humanoid robot BHR1 [/align]

Two computers are housed in the robot's chest cavity: one processes visual information and the other handles motion control. The former is called the information processing subsystem, the latter the motion control subsystem; the two communicate through Memolink. The information processing subsystem exploits the strong multimedia capabilities of Windows to process the stereo vision data, achieving fast target segmentation together with motion estimation and prediction. The motion control subsystem uses the Linux/RT-Linux real-time operating system as its platform, ensuring the real-time performance of the control system. In addition to the head joints, the motion control subsystem is responsible for all the other joints of the humanoid robot. Memolink is an effective solution for fast inter-system communication and serves as the bridge between the information processing subsystem and the motion control subsystem; it offers high communication speed and requires no handshake before transfer.

The entire tracking process executes the following loop: target search – target detection – matching – state estimation and prediction – motion control. Different matching methods yield different tracking methods. This paper proposes a fast segmentation method that fuses depth, color, and shape information to progressively approximate the target region. In a real-time tracking system, motion estimation and prediction effectively shrink the detection area and thus increase the tracking speed.
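Such motion estimation and prediction is commonly realized with a Kalman filter under a constant-velocity model. A minimal sketch follows; the noise parameters and the 15 fps time step are illustrative assumptions, not the system's tuned values.

```python
import numpy as np

class KalmanTracker2D:
    """Constant-velocity Kalman filter for a 2-D target centroid
    (a sketch; q, r, and dt below are assumed, not the paper's values)."""

    def __init__(self, dt=1 / 15.0, q=1e-2, r=1.0):
        # State: [x, y, vx, vy]; dt matches the 15 fps capture rate.
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)        # process noise (assumed)
        self.R = r * np.eye(2)        # measurement noise (assumed)
        self.x = np.zeros(4)
        self.P = np.eye(4) * 1e3      # large initial uncertainty

    def predict(self):
        """Propagate the state one frame ahead; the returned centroid
        would center the next search window."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the state with the measured centroid z = (x, y)."""
        y = np.asarray(z, float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

In the tracking loop, the predicted centroid centers the next search window, shrinking the region the segmentation step must examine.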
The classic Kalman filter is used in this study for state estimation and prediction of the moving target.

2 Target Segmentation Method Based on Multi-Image Information

The visual information processing subsystem rapidly segments the target object, estimates and predicts its motion, and transmits its position to the motion control subsystem in real time. The stability of target recognition is crucial to the stability of the whole tracking system. In unstructured indoor environments with complex backgrounds, the image information available for visual tracking includes depth, color, shape, edges, and motion. In a segmentation method based on multiple kinds of information, the selected cues should be complementary. Color is an object's most prominent feature and is well suited to target tracking, but color-based tracking fails when the background contains objects of the same color. Depth information lets the system obtain a rough foreground region, i.e., a candidate window containing the moving object; moreover, extracting this rough foreground contour by depth segmentation is computationally cheap and fast. A shape detector based on the randomized Hough transform (RHT) can detect various geometric shapes, such as ellipses, triangles, and polygons, and can thus distinguish same-colored objects within the candidate region.

[align=center] Figure 2 Moving target segmentation process in a video sequence; Figure 3 Segmentation results of target objects in complex scenes [/align]

Using the humanoid robot's stereo vision system, this paper designs a fast tracking method that fuses depth, color, and shape information to progressively approximate the target region. Figure 2 shows the segmentation process for a moving target in a video sequence.
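The depth–color–shape fusion just outlined can be sketched as three successive masks. This is only an illustration: the depth band, the color distance metric, and especially the circularity measure standing in for the RHT shape detector are all assumptions, not the paper's algorithm.

```python
import numpy as np

def depth_foreground(depth, near=0.5, far=2.0):
    """ROF: pixels inside an assumed working-depth band (meters)."""
    return (depth > near) & (depth < far)

def color_filter(rgb, rof, target=(255, 0, 0), tol=60):
    """ROIC: pixels inside the ROF close to the target color
    (plain RGB distance; a real system might use HSV)."""
    dist = np.linalg.norm(rgb.astype(float) - np.array(target), axis=-1)
    return rof & (dist < tol)

def shape_score(mask):
    """Crude circularity measure standing in for the RHT detector:
    mask area over bounding-box area (a disc scores near pi/4 = 0.785,
    an axis-aligned square scores near 1.0)."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return 0.0
    box = (xs.max() - xs.min() + 1) * (ys.max() - ys.min() + 1)
    return xs.size / box
```

Because the three filters run in order, the expensive shape test only sees the small ROIC rather than the full 320×240 image, which is the source of the method's speed.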
First, depth information is used to segment the foreground region of interest to the robot, yielding the region of foreground (ROF), a coarse candidate target region. A color filter is then applied within the ROF to extract the region of interest color (ROIC). Finally, a shape detector distinguishes objects of the same color. During segmentation the candidate region shrinks step by step toward the target region, which reduces the computational load and increases the processing speed; at the same time, the method effectively avoids interference from same-colored objects in the scene and improves the stability of target segmentation. Figure 3 shows the segmentation results for the target object.

3 Motion Control Subsystem

3.1 Structure of the Motion Control Subsystem

The robot's motion control subsystem is a typical computer control system. The purpose of head control is to track the moving target in real time, so the control input is motion data planned from the target object's position. As for the feedback input, since the controlled variable is the motor rotation angle, the output of the shaft-angle encoder on each motor is used as the feedback signal. The system uses a multifunction interface board that integrates A/D conversion, D/A conversion, encoder (ENC), PWM, and digital I/O, which improves integration and reduces system size and weight. The ENC interface on the board serves as the feedback input channel, counting the pulses output by the shaft-angle encoders. Each joint uses classic PD servo control.

3.2 Software Structure of the Motion Control Subsystem

The motion control subsystem uses the RT-Linux (Real-Time Linux) real-time operating system.
Its software structure is shown in Figure 4 and consists of two modules: the main program module and the real-time task module. The main program module is an ordinary Linux application, while the real-time task module is a real-time process under RT-Linux. The two modules are separate processes and communicate through real-time FIFOs (pipes).

[align=center] Figure 4 Software structure of the BHR1 motion control system [/align]

The real-time task module comprises two parts: a periodically executed real-time control loop (a real-time thread) and a real-time task trigger. The periodic execution of the real-time thread is implemented as a loop that performs two functions: robot motion control and acquisition of the readings from the shaft-angle encoders on each motor. The real-time task cycle is 3 ms, determined by the D/A channel processing time, the encoder-counter reading time, and the sensor information acquisition time.

The main program module is no different from an ordinary Linux application. Its main functions are: communicating with the information processing subsystem; passing control parameters to the real-time task; and providing human-machine interaction, i.e., displaying the motor rotation data and sensor data sent from the real-time task while accepting control commands from the keyboard. In effect, the main program module acts as a console and can be called a console program.

3.3 Motion Control Process

The control objective of the tracking system is as follows: based on the position of the target's centroid in the image plane obtained from image processing, adjust the rotation angles of the two head motors in real time so that the target stays at the center of the image plane. One control loop of the motion control system takes approximately 3 ms.
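The per-joint PD law executed every 3 ms can be sketched as below. The gains and the toy double-integrator joint model are hypothetical; in the real system the loop runs as an RT-Linux real-time thread, reading the angle from the ENC interface and writing the command to the D/A channel.

```python
KP, KD, DT = 8.0, 5.0, 0.003   # assumed PD gains; 3 ms control cycle

def pd_step(target, angle, prev_err):
    """One control-cycle PD computation for a single head joint."""
    err = target - angle
    u = KP * err + KD * (err - prev_err) / DT   # P term + discrete D term
    return u, err

def simulate(target, steps=2000):
    """Drive a toy double-integrator joint (command u taken as angular
    acceleration) with the PD law for `steps` cycles; returns the final
    angle, which should settle at `target`."""
    angle = vel = 0.0
    prev_err = target            # so the first derivative term is zero
    for _ in range(steps):
        u, prev_err = pd_step(target, angle, prev_err)
        vel += u * DT
        angle += vel * DT
    return angle
```

In the real loop, `u` would be written to the D/A channel and the new angle read back through the ENC interface on each 3 ms cycle.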
In the information processing subsystem, processing one frame takes about 100 ms on average, so the vision processing cycle is much longer than the motion control cycle. Therefore, after each vision processing cycle the system plans the motion for the next vision processing cycle, i.e., for the following sequence of control cycles. This ensures that the robot head tracks the target with uniform, smooth, and accurate motion.

4 Experiments

In the humanoid robot BHR1, the information processing computer has a Pentium IV 2.4 GHz CPU with 512 MB of memory, and the motion control computer has a Pentium III 700 MHz CPU with 256 MB of memory. The SVS system captures 15 frames per second at 320×240 pixels. Memolink uses a PCI interface with a maximum transfer rate of 1 Mbyte/s or 1 Mword/s.

4.1 Tracking Moving Targets against a Complex Background

In the moving-target tracking experiment, a red ball performing a pendulum motion in the robot's field of view serves as the target. To verify the target recognition algorithm based on multiple kinds of image information, a red square and a green ball are placed in the background. The experimental results are shown in Figure 5: the first row of images shows the experimental scene, and the second row shows the video sequence from the left camera. The results show that as long as the colored target's speed is below 0.3 m/s, the robot head tracks its movement well and keeps it at the center of the image captured by the left camera. In a complex, unstructured indoor environment, tracking with a single cue is likely to fail; against this background, color alone cannot distinguish the red ball from the red square.
[align=center] Figure 5 Color target tracking experiment against a complex background; Figure 6 Tracking error (pixels) of the target in the X and Y directions [/align]

Figure 6 shows the tracking process while the red ball is in motion; the figure plots every tenth data sample. In the X direction the deviation of the target's centroid from the image center stays within ±30 pixels, and in the Y direction within ±20 pixels. The experiment shows that throughout the object's movement the tracking system tracks it in real time and keeps its centroid at the center of the left camera's image.

5 Conclusion

This paper proposes a dual-computer visual tracking system for humanoid robots based on Memolink communication. The system meets the performance requirements for real-time visual tracking by a humanoid robot. In unknown, complex environments, a fusion scheme based on depth, color, and shape information ensures that the robot stably segments moving targets from the video sequence.

The authors' innovations: (1) a dual-computer visual tracking system based on Memolink communication, in which one computer processes the video information and the other controls the robot's head, achieving real-time tracking of moving targets by the humanoid robot's head; (2) a fast target segmentation method that fuses depth, color, and shape information to progressively approximate the target region, achieving stable segmentation of target objects against complex backgrounds.

References:
[1] Zhong Hua, Wu Zhenwei, Bu Chunhua. Research and implementation of a humanoid robot control system [J]. Robot, 2005, 27(5): 455-459.
[2] S. Yoshiaki, W. Ryujin, A. Chiaki. The intelligent ASIMO: system overview and integration [A]. IEEE/RSJ International Conference on Intelligent Robots and Systems [C]. Switzerland: IEEE, 2002: 2479-2483.
[3] Chen Kaifeng, Xiao Nanfeng. Research on face detection, tracking and recognition of home service robots [J]. Microcomputer Information, 2006, 22(5-2): 228-230.
[4] M. Pardas, E. Sayrol. A new approach to tracking with active contours [A]. International Conference on Image Processing [C]. Canada: 2000, vol. 2: 259-262.
[5] Y. Aloimonos, D. Shulman. Integration of Visual Modules [M]. Boston: Academic Press, 1989.
[6] K. Konolige. Small vision systems: hardware and implementation [A]. Eighth International Symposium on Robotics Research [C]. London: 1997: 111-116.