Research/Design of Automatic Control Platform Based on Remote Robot System
Abstract: This paper discusses the research and design of an automatic control platform for remote robots. The original control system is briefly introduced, and the new automatic control platform is described in terms of its design concept, dedicated data structure, and operating process.

Keywords: Remote robot, Automatic control, Execution script

1 Overview of the Automatic Control Platform

With the rapid development of network and multimedia technologies, controlling robots through robot images transmitted over the network has become feasible. Using this technology, the activities of a remote robot can be observed to obtain real-time information, which can then be used to issue commands to the robot. If a person controls the robot directly, however, they must watch the transmitted images for long periods and repeatedly send the same commands to the remote robot. Intelligent automatic control by computer is clearly more suitable, and the automatic control platform for remote robots was developed with this in mind.

The original real-time image transmission and real-time control system for remote robots developed by our research group consists of two parts: a remote robot station and a control station. The remote robot station acquires, compresses, and transmits image information, and also receives and executes commands from the control station. The control station decompresses and displays the images, and sends commands back to the remote robot station. The automatic control platform builds on this system, adding a component to the control station's application that analyzes the images transmitted from the remote end and automatically sends commands based on that analysis.

2 Related Technologies

2.1 Image Acquisition and Display

The remote robot station needs to acquire images on-site, while the control station needs to display the robot's images.
In this system's application, acquisition and display are implemented through calls to VFW. VFW (Video for Windows), introduced by Microsoft in 1992, is an SDK whose AVICap window class provides a message-based interface for video capture and playback: video clips can be acquired or played by sending messages or setting properties. For example, the program calls capCreateCaptureWindow to create a video capture window and capSetCallbackOnFrame to register a callback invoked whenever a frame is captured; further display or other image processing can then be performed in the callback. Microsoft provides VFW interfaces only for Visual C++ and Visual Basic, while this system is developed in Delphi to take advantage of Delphi's strong interface features and multi-threading mechanism. The vfw.h header therefore had to be rewritten in Pascal as vfw.pas, which can be called directly from Delphi. Although VFW is used during programming, the program runs on Windows 2000, so the actual driver model is WDM (Windows Driver Model). WDM, also developed by Microsoft, has significant advantages over VFW in video conferencing and PC/TV applications.

2.2 Image Compression

This system employs several image compression algorithms to adapt to different network transmission environments and real-time requirements. These include MPEG-4, which offers high display quality at high overhead, and H.26x, which has low overhead and suits long-distance transmission over low-rate networks. In addition, an H.263 compression algorithm developed by our research group at Tsinghua is also used. The appropriate algorithm is selected according to network conditions: when the network is good, MPEG-4 is used for its display quality; when the network is poor, H.263 is used for its low overhead.
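The codec-selection rule just described can be sketched as follows. This is an illustrative Python fragment, not the system's Delphi code, and the bandwidth threshold is an assumption for demonstration only.

```python
def choose_codec(bandwidth_kbps: float) -> str:
    """Pick a codec for the current link: MPEG-4 when the network is
    good, H.263 when it is poor."""
    GOOD_LINK_KBPS = 512  # hypothetical cutoff, not given in the paper
    if bandwidth_kbps >= GOOD_LINK_KBPS:
        return "MPEG-4"   # higher display quality, higher overhead
    return "H.263"        # lower overhead for constrained links
```

In practice the decision would be driven by a live bandwidth estimate rather than a fixed constant.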
2.3 Adaptive Network Transmission

Besides the flexible choice of compression algorithm, our research group designed two adaptive mechanisms to better match transmission to network conditions. One is frame-rate adaptation, which adjusts the video frame rate at the robot end, trading frame rate against image quality; the other is communication-bandwidth adaptation, which adjusts the amount of data transmitted per unit time by varying the quality of keyframes and non-keyframes.

2.4 Multi-process and Multi-threaded Structure

This system can control multiple robot stations simultaneously. Whenever a new remote robot station connects, the control station automatically spawns a new process to control it. Within the process controlling a robot station, several threads implement different functions concurrently. The ChatThread handles text communication between the robot station and the control station; the SyncThread keeps transmission and reception between the two stations synchronized; and the largest and most important thread, DrawThread, is responsible for image reception, decompression, and display, and also implements the automatic control functions.

2.5 Automatic Control

As mentioned earlier, automatic control has two aspects: first, analyzing the transmitted image information to determine the robot's real-time state; second, issuing commands based on that state. The image information is analyzed by comparing it with standard images to determine whether the robot has entered a given state. If the robot is found to be in a state corresponding to a standard image, the control station issues the command associated with that state. Since current robots are not capable of very flexible, delicate movements, it suffices to compare images of a few key parts (such as the head and arms) to determine the robot's current state.
This provides favorable conditions for storing and recognizing standard images, and forms the premise for the design and implementation of the automatic control platform.

3 Design and Implementation of the Automatic Control Platform

3.1 Design Ideas

A robot's motion process can generally be decomposed into several key states; when the robot is in one state, a particular instruction must be issued for it to transition to the next. The image information for each key state of a motion process can therefore be stored in a file together with the corresponding instructions. When the motion process is to be executed, the application reads the file to obtain each state's information, then begins comparing the real-time images against the key-state images; whenever they match, the corresponding command is issued. In effect, this file is an execution script. Such scripts can be created during manual control; the next time the same operation is needed, the script is simply read back in to achieve automatic control.

3.2 Data Structure

In the program, a data structure called `scformat` is defined to describe and store execution scripts; its basic structure is shown in Figure 1. One `scformat` corresponds to one complete script file. Because the data are organized in a linked list, a file can contain any number of standard images and their corresponding commands. `scformat` contains fields such as the color depth (1 represents 8-bit color, and so on), the number of images (`framenum`), the image width (`framewidth`), and the image height (`frameheight`). `datahead` and `datatail` are the head and tail pointers of the data linked list; one `data` corresponds to one image.
In addition to basic information such as `fwidth`, `fheight`, and `depth`, each `data` contains `order`, a character array storing the corresponding command; `segnum`, describing the most important key parts of the image that reflect the robot's basic motion; and `pnext`, which points to the next `data` in the linked list. The segments extracted from an image are likewise organized as a linked list, with `seghead` and `segtail` pointing to its head and tail. Each `seg` corresponds to one key segment extracted from the image: `width` and `height` give the segment's dimensions, and `topx` and `topy` are the coordinates of its lower-left corner within the image, following the convention that image data are stored starting from the lower-left corner. `segdata` points to the segment's actual image data. Figure 2 illustrates how these structures map onto an image.

The chief advantage of this design is its flexibility. Script files for different motion processes may require different numbers of standard images, and images representing different robot states may require different numbers of key segments to reflect the robot's motion. Because both the standard images (`data`) and the key segments (`seg`) are organized as linked lists, the structure adapts to all of these situations. Furthermore, various functions and procedures are provided for the `scformat`, `data`, and `seg` structures, making segment extraction, script-file storage, script-file reading, and display quite convenient.

3.3 Key Points of the Process

The process divides into two parts. First, obtaining the control script, which is done manually. With the data structure described above, this can be completed accurately: an `scformat` structure is first created.
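The `scformat`/`data`/`seg` hierarchy can be mirrored in Python as below. The field names follow the paper; the original is a set of Delphi record types, so everything else here (dataclasses, method names such as `add_data`) is an assumed illustration, not the system's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Seg:                      # one key segment cut from an image
    width: int
    height: int
    topx: int                   # lower-left corner of the segment,
    topy: int                   # per the paper's storage convention
    segdata: List[int] = field(default_factory=list)
    pnext: Optional["Seg"] = None

@dataclass
class Data:                     # one standard image plus its command
    fwidth: int
    fheight: int
    depth: int
    order: str                  # command sent when this state matches
    segnum: int = 0
    seghead: Optional[Seg] = None
    segtail: Optional[Seg] = None
    pnext: Optional["Data"] = None

    def add_seg(self, seg: Seg) -> None:
        """Append a key segment to this image's segment list."""
        if self.seghead is None:
            self.seghead = self.segtail = seg
        else:
            self.segtail.pnext = seg
            self.segtail = seg
        self.segnum += 1

@dataclass
class ScFormat:                 # one complete script file
    depth: int                  # 1 = 8-bit color, and so on
    framewidth: int
    frameheight: int
    framenum: int = 0
    datahead: Optional[Data] = None
    datatail: Optional[Data] = None

    def add_data(self, d: Data) -> None:
        """Append a standard image (with its command) to the script."""
        if self.datahead is None:
            self.datahead = self.datatail = d
        else:
            self.datatail.pnext = d
            self.datatail = d
        self.framenum += 1
```

Because both levels are linked lists, a script built this way can hold any number of standard images, each with any number of key segments, which is exactly the flexibility the paper attributes to the structure.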
Since the data within it are organized as linked lists, standard images and their corresponding commands can be added at any time, and for each image several key parts can be selected. Once this is done, `scformat.writescfile` can be called to store the resulting script file on the hard disk.

Second, the automatic control process. During automatic control, the script file is first read from the hard disk: `readscfile` is called to load the file's data into an `scformat` structure, and automatic control then proceeds according to the data in this script. Because the system uses compression, the control station decompresses each received packet to obtain the real-time image, then compares it one by one with the images in the running script. If a match is found, the comparison stops and the corresponding command is sent to the remote robot; if no match is found, the robot is not currently in any recorded state, and the detection-and-comparison loop continues.

4 Platform Operating Environment

The system was tested and refined thoroughly in the laboratory. The hardware and software environment is as follows:

Hardware: Intel Pentium 4 processor, nVidia TNT2 M64 graphics card, Intel ProShare Personal Conferencing camera, Leadtek WinFast TV2000 video capture card, 10/100 Mbps adaptive network card, 100 Mbps Ethernet.

Software: Microsoft Windows 2000 Advanced Server, Borland Delphi 6.0 (with Update Pack 2), Microsoft Visual C++ 6.0 (Service Pack 5).

5 Actual Testing

Under normal circumstances, for an image reflecting a given motion state, the system extracts zero to five key parts by default to capture the characteristics of that state.
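The detection-and-comparison loop can be sketched as follows. This is an assumed Python rendering, not the original Delphi: the real system compares the key segments of decompressed frames against the script's standard images, while here whole toy frames are compared directly.

```python
def match_state(frame, script):
    """Return the command of the first script entry whose standard
    image matches the frame, or None if no recorded state matches."""
    for entry in script:
        if frame == entry["image"]:   # real code compares key segments
            return entry["command"]
    return None

def control_step(frame, script, send_command):
    """One iteration of the loop: send a command on a match,
    otherwise return None so the caller keeps detecting."""
    cmd = match_state(frame, script)
    if cmd is not None:
        send_command(cmd)             # robot is in a known key state
    return cmd
```

The caller simply invokes `control_step` on each decompressed frame; a `None` result means the robot is between key states and the loop continues.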
For a running script that is ordered and simple (its images are arranged in operation order, so the system only needs to compare the acquired real-time image against the image expected for the current state), the script can contain any number of images reflecting key states, provided storage space is sufficient. For an unordered script, the system must compare each acquired image against all the images in the script to find the instruction to execute. In testing, even with ten images in a script, comparison results were still obtained quickly. The tests show that the key design ideas of this system are all workable.
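The two script types tested above differ only in how a frame is looked up. The following sketch contrasts them; it is a hedged illustration under the same simplifying assumptions as before (whole frames compared directly, dictionary entries standing in for `data` nodes).

```python
def match_ordered(frame, script, cursor):
    """Ordered script: compare only against the state expected next;
    advance the cursor on a hit. Cost is constant per frame."""
    if cursor < len(script) and frame == script[cursor]["image"]:
        return script[cursor]["command"], cursor + 1
    return None, cursor

def match_unordered(frame, script):
    """Unordered script: scan every stored image; cost grows with
    the number of images in the script."""
    for entry in script:
        if frame == entry["image"]:
            return entry["command"]
    return None
```

This makes the trade-off in the test results concrete: the ordered form scales to scripts of any length, while the unordered form's per-frame cost is proportional to the script size, though with around ten images it remains fast.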