Visual servo control system for humanoid robots

2026-04-06 03:15:24 · #1
Vision is a crucial means by which humans perceive external information, and the visual servoing system is a key component through which a robot acquires environmental information. This paper discusses the visual servoing system of the humanoid robot BHR-1. First, the overall visual structure of the robot's head is introduced. Then, information processing and head motion control based on stereo vision are discussed. Finally, the feasibility of the system is demonstrated through target-tracking and object-grasping experiments.

Overall Scheme and Control System

The visual servoing system of a humanoid robot must actively search according to the specific environment and situation, turning the cameras toward the target in real time to track a spatial target and acquire its three-dimensional position, so that the arm can be controlled to grasp the object accurately. The head of BHR-1 has two degrees of freedom, with two CCD cameras mounted on the face as visual sensors to simulate human eyes. The robot's arms are likewise designed to mimic human upper limbs, with seven degrees of freedom each: three at the shoulder, two at the elbow, and two at the wrist, enabling the various movements of the human upper limb. The robot grasps the object based on its three-dimensional position information. The overall design scheme of BHR-1 is shown in Figure 2.

Rapid object localization requires both image processing and motion control. A single computer cannot meet the real-time requirements, so this paper adopts a dual-computer architecture with Memolink communication: one computer handles binocular stereo vision processing and the other handles robot motion control. Memolink provides an effective solution for fast inter-system communication.
Both visual tracking and target grasping rely on the motion control computer to drive the robot's motion. The motion control system makes decisions based on the results of the vision processing system: the head may rotate about its two degrees of freedom to track the target's movement, or the upper limb may grasp the target. The motion control subsystem uses the RT-Linux real-time operating system as its software platform, guaranteeing the real-time performance of the robot control system.

The controlled object of the motion control subsystem is the angle of each robot joint. Since the joints are motor-driven, the controlled quantity is actually the rotation angle of the motors that drive the joints, which constitutes a position servo system. The system uses a multi-function interface board that integrates A/D conversion, D/A conversion, ENC, PWM, and 32-bit digital I/O, improving system integration and reducing size and weight.

Regarding the control input: since the goal is for the robot's head to track a moving target, the input is taken from the image processing subsystem's results. The target position obtained by image processing becomes the input to the motion control subsystem. Because this result is already a digital quantity, and the position information used by the motion control subsystem is also digital, no analog-to-digital conversion is needed. For the feedback input, since the controlled quantity is the motor's rotation angle (position control), the output of the motor's shaft-angle encoder serves as the feedback signal. The encoder reports the angle through which the motor has rotated as a pulse train: the larger the angle, the more pulses it outputs.
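As a minimal sketch of this feedback path, the ENC channel's pulse count can be converted back to a joint angle as below. The counts-per-revolution and gear-ratio values here are illustrative assumptions, not parameters given in the paper:

```python
def pulses_to_angle_deg(pulse_count, counts_per_rev=2000, gear_ratio=100.0):
    """Convert a shaft-angle encoder pulse count into a joint angle (degrees).

    Assumptions (not from the paper): the encoder emits counts_per_rev
    pulses per motor revolution, and the joint rotates gear_ratio times
    slower than the motor shaft.
    """
    motor_revolutions = pulse_count / counts_per_rev
    return motor_revolutions * 360.0 / gear_ratio

# With these assumed values, one full motor revolution (2000 pulses)
# corresponds to 3.6 degrees of joint rotation.
```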
Therefore, we adopted the ENC (encoder) interface on the interface board as the input channel for the feedback signal; it counts the pulses output by the shaft-angle encoder. The structural block diagram of the robot head motion control subsystem is shown in Figure 3.

Visual Information Processing Based on Stereo Vision

The system adopts a stereo-vision-based solution, which incorporates depth information to make target search more accurate. It uses the high-speed binocular Stereo Vision System (SVS) developed by the SRI Artificial Intelligence Center in the United States.

Image segmentation is the preprocessing stage for object recognition and one of the key technologies of a robot visual servoing system. This system adopts a threshold segmentation method based on color information. Theoretical analysis and experimental results show that, for objects of the same color, the measured RGB values are widely dispersed under different light-source types, illuminance levels, and surface reflection characteristics, making it difficult to determine threshold ranges for RGB-based recognition. The HSV model is closer to human color perception: it quantifies the collected color information into three attributes, hue, saturation, and value (brightness). The hue attribute H accurately reflects the color type and has low sensitivity to changes in external lighting. Therefore HSV is more suitable than RGB as the basis for recognition processing. This paper uses the HSV model for color recognition, selecting H and S as the recognition criteria.

A point (r, g, b) in RGB space is transformed to HSV space as follows:

V = max(r, g, b), V′ = min(r, g, b)
If V = 0 or V = V′, then H = 0 and S = 0; otherwise:
  if r = V, then H = (g − b)/(V − V′);
  if g = V, then H = 2 + (b − r)/(V − V′);
  if b = V, then H = 4 + (r − g)/(V − V′);
  H = H × 60; if H < 0, then H = H + 360;
  S = (V − V′)/V.
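The transformation above can be written directly in code. This is a minimal sketch assuming r, g, b are normalized to [0, 1]:

```python
def rgb_to_hsv(r, g, b):
    """Convert RGB (each component in [0, 1]) to (H in degrees, S, V),
    following the piecewise formula in the text."""
    v = max(r, g, b)
    v_min = min(r, g, b)
    if v == 0 or v == v_min:          # black or pure gray: hue undefined
        return 0.0, 0.0, v
    if r == v:
        h = (g - b) / (v - v_min)
    elif g == v:
        h = 2 + (b - r) / (v - v_min)
    else:  # b == v
        h = 4 + (r - g) / (v - v_min)
    h *= 60
    if h < 0:
        h += 360
    s = (v - v_min) / v
    return h, s, v
```

Segmentation then reduces to checking whether a pixel's (h, s) falls inside the learned threshold ranges.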
The system first samples the target image region offline, converts the local color image from the RGB model to the HSV model, and plots histograms of the H and S components to obtain the H and S thresholds for the selected region. This is an offline learning process. During subsequent real-time recognition, the H and S thresholds are updated each visual cycle from the previous cycle's color image, adapting to new lighting conditions. The flowchart of the vision processing system is shown in Figure 4.

The system acquires images with the cameras. After a series of preprocessing steps, each image is segmented into multiple regions, which are then searched for the target region using known target features. If a target is found, the robot's head is turned toward it and the target's features are updated for use in the next search. If no matching target is found, the target may be temporarily occluded or lost, and the next processing cycle begins, waiting for it to reappear.

Because the vision processing system processes the image from the previous cycle, the target coordinates it produces are also one cycle old. If these coordinates were used directly as the motion control input, the head movement would always lag by one cycle. To speed up the system, proportional-derivative (PD) control is employed. The control laws are:

Iα(k+1) = kp·eα(k) + kd·(eα(k) − eα(k−1))
Iβ(k+1) = kp·eβ(k) + kd·(eβ(k) − eβ(k−1))
eα(k) = αk − αk′, eβ(k) = βk − βk′

where Iα(k+1) and Iβ(k+1) are the outputs of the control system at time t(k+1); (αk, βk) are the direction coordinates of the target at time t(k); (αk′, βk′) are the direction coordinates of the two-degree-of-freedom head mechanism at that time; eα(k) and eβ(k) are the deviations between the head orientation and the target direction at that time; and kp and kd are the proportional and derivative coefficients of the control system, respectively.
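The two-axis PD law above can be sketched as follows; the gain values here are placeholders, not the paper's experimentally tuned coefficients:

```python
class HeadPD:
    """Two-axis PD controller for the pan/tilt head.

    Keeps the previous error on each axis so the derivative term
    kd * (e(k) - e(k-1)) can be formed, as in the text.
    """
    def __init__(self, kp=0.8, kd=0.05):  # placeholder gains, kd << kp
        self.kp, self.kd = kp, kd
        self.prev_err = {"alpha": 0.0, "beta": 0.0}

    def step(self, target, head):
        """target, head: dicts with 'alpha' and 'beta' direction angles.
        Returns the control outputs (I_alpha, I_beta) for the next cycle."""
        out = {}
        for axis in ("alpha", "beta"):
            e = target[axis] - head[axis]
            out[axis] = self.kp * e + self.kd * (e - self.prev_err[axis])
            self.prev_err[axis] = e
        return out["alpha"], out["beta"]
```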
By experimentally tuning kp and kd, with kd << kp, the system achieves both fast response and stability.

Motion Control Process

As described above, the computer control system operates as a cyclical process of real-time data acquisition, real-time decision-making, and real-time control. With the specific components used in this system, one such cycle over all control loops takes approximately m milliseconds, while processing one image frame in the vision system takes on average approximately n milliseconds. Because of the different characteristics of the two tasks, n >> m: the vision cycle is much longer than the motion control cycle, so the system completes many control cycles within one vision cycle. Therefore, after each vision cycle the system should prepare a motion plan covering the next vision cycle, i.e. the subsequent multiple control cycles. This lets the robot's head track the target at a uniform, smooth, and accurate speed. The control system software flowchart is shown in Figure 5.

Within each motion control cycle, the program first checks Memolink for a new result from the vision system. If there is none, the program drives the robot according to the current motion plan; if there is, it revises the motion plan based on the vision result. To keep the head motion smooth, the planned duration of each motion plan is set slightly longer than the average vision cycle, so the current plan is never exhausted before a new vision result arrives.
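The per-cycle logic can be sketched as below. The function names, the linear-interpolation planner, and the cycle count are illustrative assumptions, not the paper's actual interfaces:

```python
import collections

def plan_segment(start, goal, n_cycles):
    """Linearly interpolate head angles over n_cycles control cycles.
    A stand-in for the paper's motion planning step."""
    step = (goal - start) / n_cycles
    return collections.deque(start + step * (i + 1) for i in range(n_cycles))

def control_cycle(plan, memolink_result, current_angle, cycles_per_plan=40):
    """One motion control cycle: if Memolink delivered a new vision
    result, replan toward it; otherwise continue the existing plan.
    cycles_per_plan is chosen slightly longer than one vision cycle
    so the plan is never exhausted before the next result arrives."""
    if memolink_result is not None:
        plan = plan_segment(current_angle, memolink_result, cycles_per_plan)
    setpoint = plan.popleft() if plan else current_angle
    return plan, setpoint
```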
Therefore, as long as the target keeps moving, the robot head remains in continuous motion, avoiding intermittent rotate-and-stop behavior. The program then reads the planned motion and the feedback, computes the control quantity from the difference between the two, and sends a control signal to rotate the robot head. The control quantity is computed with a conventional PID algorithm. Let t(k) be the time of the k-th motion control cycle, with system output Y(k) and planned motion quantity X(k). By the PID algorithm, the system output at time t(k+1) is

Y(k+1) = Kp(X(k) − Y(k)) + Ki·Σ(X(k) − Y(k)) + Kd((X(k) − Y(k)) − (X(k−1) − Y(k−1)))

where Kp, Ki, and Kd are the proportional, integral, and derivative coefficients, respectively. In a control system, the integral term can eliminate residual error but slows the response, while the derivative term can speed up the response by anticipating changes in the input but may cause instability. Therefore, when proportional control alone gives acceptable results, only the proportional coefficient should be used; the integral and derivative coefficients should be introduced only when the results are unsatisfactory.

Experiment

In this system, the visual information processing system and the motion control system use Windows and RT-Linux as their software development platforms, respectively. RT-Linux is a real-time operating system, which meets the real-time requirements of motion control, while Windows' strong multimedia capabilities make it a suitable platform for image processing. The vision computer has a Pentium IV 2.4 GHz CPU and 512 MB of RAM; the motion control computer has a Pentium III 700 MHz CPU and 256 MB of RAM. Memolink bridges the vision processing system and the motion control system; the selected product uses a PCI interface with a maximum transfer rate of 1 Mbyte/s. The camera subsystem is the SVS vision system, sampling 15 frames per second.
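A minimal sketch of this PID update for one joint, with placeholder gains (the paper tunes them experimentally):

```python
class JointPID:
    """PID position controller for one head joint, following the text:
    output = Kp*e(k) + Ki*sum(e) + Kd*(e(k) - e(k-1))."""
    def __init__(self, kp=1.0, ki=0.0, kd=0.0):  # placeholder gains
        self.kp, self.ki, self.kd = kp, ki, kd
        self.err_sum = 0.0   # running integral term
        self.prev_err = 0.0  # e(k-1) for the derivative term

    def update(self, planned, measured):
        """planned: X(k) from the motion plan; measured: Y(k) from the
        encoder feedback. Returns the commanded output for t(k+1)."""
        e = planned - measured
        self.err_sum += e
        out = (self.kp * e
               + self.ki * self.err_sum
               + self.kd * (e - self.prev_err))
        self.prev_err = e
        return out
```

Per the text, one would start with ki = kd = 0 and add the integral and derivative terms only if pure proportional control proves unsatisfactory.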
The SVS vision system is mounted on a two-degree-of-freedom motion mechanism whose range of motion is sufficient to point in any direction, enabling object tracking. The BHR-1 head measures 19 cm wide, 27 cm high, and 19 cm deep, and weighs 2.8 kg, including the mechanical structure, bearings, motors, and cameras. When tracking and locating objects with this system, the image processing rate is 10 frames per second, the visual servo cycle is approximately 100 ms, and the motion control servo cycle is 3 ms. Positioning accuracy is high at close range, reaching 3‰ at 1 m.

To further verify the effectiveness of the proposed visual positioning and motion planning method, an object-grasping experiment was carried out on BHR-1, using the robot's 7-DOF right arm. During the experiment, the vision system transmits the 3D information of the target object to the motion control computer via Memolink; the motion control computer plans the motion according to the proposed method and grasps the object.

Conclusion

This paper presents a binocular-vision-based object tracking and positioning scheme. Binocular vision is used to acquire the three-dimensional spatial information of a target object, enabling object localization. The system employs dual-computer processing with Memolink communication, with one computer handling visual information processing and the other motion control, ensuring a high system response speed.