This paper combines camera image acquisition with VC++ 6.0 as the development tool to achieve automatic recognition of human-computer chess games. First, it introduces the characteristics of machine vision and the differences between computer-based and embedded systems. Then, it describes the system structure and its execution process. Finally, a test program for Gomoku (Five in a Row) is built and tested. Through program testing, the system can accurately identify the human's moves on the chessboard and display the coordinates of the computer's moves on the screen. After integrating the execution module, the system can play against a human on-site.
1 Introduction
Humans perceive the external world primarily through sensory organs such as vision, touch, hearing, and smell, with approximately 83% of information acquired visually. Machine vision uses photoelectric imaging systems to acquire images of controlled targets, which are then digitally processed by computers or dedicated image-processing modules to identify features such as the target's size, shape, and color, and to gather information about the surrounding environment to guide robot movement. It is a comprehensive discipline integrating computer hardware and software, digital image processing, mechanics, optics, and analog and digital circuit technologies. In robotics, machine vision gives a robot the function of eyes; in short, it uses machines to replace human eyes for measurement and judgment. Machine vision is a crucial direction in robotics research, holding a vital position in robot research and applications and playing a decisive role in robot intelligence.
2 Analysis of the Approach
2.1 Feasibility
Vision systems can be broadly categorized into two types based on their operating environment: computer-based and embedded systems. Computer-based systems offer openness, high programming flexibility, and user-friendly interfaces, along with lower overall system cost, and typically run in the Windows environment. Users can employ them to develop complex, advanced applications quickly.
In embedded systems, the system software is embedded in the image processor. Such systems are configured either through on-screen menus operated with a simple gamepad-like device, or by developing software on a computer and downloading it to the device. While they offer high reliability, integration, miniaturization, speed, and low cost, they suffer from limited intelligence and adaptability. To enable robots to adapt to complex, dynamic environments and to possess learning, inductive, and analytical capabilities, machine-learning methods such as neural networks, decision trees, and genetic algorithms are often applied to machine vision to achieve image segmentation and object recognition under complex conditions. Such work demands both substantial programming and flexible debugging. Therefore, the test system employs a computer-based machine vision system for object recognition.
Since the 1980s, machine vision research has progressed from laboratory testing to practical applications. Scholars both domestically and internationally have conducted substantial research and practice, solving many technical and application-level problems and paving the way for further research. Significant progress has been made in the study of machine vision mechanisms, models, and algorithms, from simple binary image processing to high-resolution multi-grayscale image processing, and from general two-dimensional information processing to three-dimensional vision. Furthermore, the rapid advancements in computer technology and the corresponding developments in artificial intelligence, parallel processing, and neural networks have further facilitated the practical application of machine vision systems and the study of many complex visual processes.
Figure 1: Deep Blue defeats Kasparov. Figure 2: System diagram. Figure 3: Recognition of piece coordinates.

2.2 Conceptual Approach
Figure 1 shows IBM's Deep Blue defeating Grandmaster Kasparov in a chess match in May 1997 (the person on the right is the operator of Deep Blue). Unfortunately, Deep Blue required human assistance to assess the situation, determine piece positions, and move pieces. It lacked visual perception and behavioral agency. Despite its super-fast computational capability, the machine (weighing over one ton) was limited to reasoning; it could not perform the perceptual and physical tasks Kasparov carried out, or even those of a child.
For a computer to have the ability to assess a situation, it needs peripherals similar to the human eye. Fortunately, computer cameras combined with software can already accomplish this task. By performing a series of image subtraction operations and computing histograms of the result, the coordinates of the placed piece can be obtained, as shown in Figure 3. Here, 3a is the image acquired at time N, 3b is the image acquired at time N+1, and 3c and 3d are the horizontal and vertical histograms of the difference image, respectively.
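The subtraction-and-histogram step can be sketched in portable C++ (the original work used VC++ 6.0; the `Frame` layout, the threshold value, and the argmax choice below are illustrative assumptions, not the paper's code):

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Grayscale frame as a flat row-major array (hypothetical layout).
struct Frame {
    int w, h;
    std::vector<unsigned char> px;
    int at(int x, int y) const { return px[y * w + x]; }
};

// Subtract the frame at time N from the frame at time N+1, binarize the
// absolute difference with a fixed threshold, and locate the changed
// region via the peaks of the per-column and per-row histograms.
std::pair<int, int> locateChange(const Frame& a, const Frame& b, int thresh) {
    std::vector<int> hx(a.w, 0), hy(a.h, 0);
    for (int y = 0; y < a.h; ++y)
        for (int x = 0; x < a.w; ++x)
            if (std::abs(a.at(x, y) - b.at(x, y)) > thresh) {
                ++hx[x];          // vertical histogram (count per column)
                ++hy[y];          // horizontal histogram (count per row)
            }
    int bx = 0, by = 0;
    for (int x = 0; x < a.w; ++x) if (hx[x] > hx[bx]) bx = x;
    for (int y = 0; y < a.h; ++y) if (hy[y] > hy[by]) by = y;
    return {bx, by};              // pixel coordinates of the new piece
}
```

In a real capture loop the histogram peaks would then be mapped onto the grid coordinates recovered from the chessboard image.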
Therefore, the idea behind this system is based on VC programming applications on a PC. It uses a camera to capture images of the human-computer chess game, and then uses image processing technology to identify the coordinates of the player's moves in the captured images. The computer analyzes these moves, finds a strategy in the strategy library, and thus completes the robot's vision system, truly realizing real-time human-computer chess gameplay. A schematic diagram is shown in Figure 2.
3 System Composition
Based on the above approach, the system is divided into three main parts: 1) image acquisition, 2) recognition, and 3) strategy. The camera is responsible for image acquisition, while the recognition and chess-playing strategies are implemented through software programming.
3.1 Data Acquisition Section
A webcam serves as the system's eye: it captures images and transmits them to the computer. Considering image quality, price, and portability, a common consumer webcam was chosen.
Most webcams on the market today can achieve a resolution of 800×600 or higher, and are reasonably priced. They also widely use USB interfaces, making them highly portable and widely used. Testing has shown that the system can run correctly at resolutions of 176×144 and above. Considering the trade-off between operating speed and image quality, and the configurations of many users' machines (older machines such as P3 and below, older webcams), adjustable input image settings have been added to the program.
The camera lens faces downwards and is placed above the chessboard, similar to the camera setup in a robot soccer game. However, the horizontal and vertical alignment need not be very precise; even with a deviation of a few degrees, the desired function is achieved. No professional tools are required, and ordinary personnel can set it up easily, making the system practical and portable.
3.2 Identification Section
The image input into the computer is converted into a pixel matrix. First, the chessboard area and grid coordinates are identified. Then, after several point operations, the coordinates of the chess piece that the player moves are calculated.
To identify the chessboard, edge calculations are performed on the initial chessboard image. In this paper, the Laplacian operator is used. Then, the calculated pixel matrix is binarized, and its histograms are calculated in both the horizontal and vertical directions to obtain the chessboard area and grid coordinates. As shown in Figure 4, 4a is the input chessboard image, 4b is the horizontal histogram of this image, and 4c is the vertical histogram of the same image.
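The edge-then-histogram stage can be sketched as follows (portable C++ rather than the original VC++ 6.0 code; the 4-neighbour Laplacian kernel and the half-of-peak run criterion are illustrative choices):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Binarized |Laplacian| edge map of a grayscale image (row-major, w*h).
// 4-neighbour Laplacian: L(x,y) = 4*P(x,y) - P(x-1,y) - P(x+1,y)
//                                 - P(x,y-1) - P(x,y+1)
std::vector<int> laplacianEdges(const std::vector<int>& p, int w, int h, int t) {
    std::vector<int> e(w * h, 0);
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            int l = 4 * p[y*w + x] - p[y*w + x - 1] - p[y*w + x + 1]
                    - p[(y - 1)*w + x] - p[(y + 1)*w + x];
            e[y*w + x] = std::abs(l) > t ? 1 : 0;
        }
    return e;
}

// Project the edge map onto one axis and return the centres of the runs
// whose counts reach half the maximum: the grid-line positions.
std::vector<int> gridLines(const std::vector<int>& e, int w, int h, bool columns) {
    int n = columns ? w : h;
    std::vector<int> hist(n, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            hist[columns ? x : y] += e[y*w + x];
    int peak = 0;
    for (int v : hist) peak = std::max(peak, v);
    std::vector<int> lines;
    for (int i = 0; i < n; ) {
        if (peak > 0 && 2 * hist[i] >= peak) {
            int j = i;
            while (j < n && 2 * hist[j] >= peak) ++j;
            lines.push_back((i + j - 1) / 2);   // centre of the run
            i = j;
        } else ++i;
    }
    return lines;
}
```

Running `gridLines` once per axis on the initial board image yields the horizontal and vertical grid coordinates described above.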
To identify the placement of a piece, the difference between two frames within a detection period is calculated, as shown in Figure 5 (the image at time N+1 minus the image at time N), and the placement coordinates are then obtained from its histograms. Interference from human body parts and other sources must be excluded. Let B(x) denote relative (mean-based) binarization of image x, and P(x, y) the gray value at point (x, y). The changed-pixel count over the board region is then

Δ = Σ (x = w1 to w2) Σ (y = h1 to h2) B(P_{N+1}(x, y) − P_N(x, y))

where w1 and w2 are the left and right boundaries of the chessboard area, h1 and h2 are its upper and lower boundaries, Smin to Smax is the size range of a chess piece, and Nmin to Nmax is the range of interference such as noise and body shadows. When Δ falls within Smin to Smax, a piece has been placed; when it falls within Nmin to Nmax, the change is treated as interference and ignored.
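A minimal sketch of this classification, assuming B(·) has already produced a 0/1 image and that the N and S ranges are disjoint (the region bounds and thresholds are placeholders, not the paper's measured values):

```cpp
#include <cassert>
#include <vector>

enum class Event { None, Noise, Piece };

// Count the pixels of the binarized difference image that fall inside the
// board region [w1,w2] x [h1,h2], then classify the count:
// Nmin..Nmax -> noise / body shadow, Smin..Smax -> a piece was placed.
Event classify(const std::vector<std::vector<int>>& bin,
               int w1, int w2, int h1, int h2,
               int nMin, int nMax, int sMin, int sMax) {
    int delta = 0;
    for (int y = h1; y <= h2; ++y)
        for (int x = w1; x <= w2; ++x)
            delta += bin[y][x];                // B(...) is already 0 or 1
    if (delta >= sMin && delta <= sMax) return Event::Piece;
    if (delta >= nMin && delta <= nMax) return Event::Noise;
    return Event::None;
}
```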
Figure 4: Chessboard recognition. Figure 5: Program flowchart. Figure 6: Screenshot of program execution. Table 1: Test parameters, S. Table 2: Test parameters, N.

3.3 Strategy Section
Knowing the opponent's move position is not enough to meet the system requirements. As shown in Figure 5, the computer also needs to determine how to respond, so a strategy library needs to be added.
In designing the strategy library, the computer must know which winning combinations exist. It therefore computes the total number of winning lines, storing them in an array used to determine the winner. For each move, the strategy library evaluates the remaining winning possibilities, i.e., how many ways each board position can still lead to a win, and then responds offensively or defensively accordingly.
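The winning combinations can be enumerated directly. The sketch below (portable C++, not the paper's code) assumes the standard 15×15 Gomoku board, on which there are 572 possible five-in-a-row lines; the per-cell counts give the simple "most winning chances" evaluation described above:

```cpp
#include <cassert>
#include <vector>

const int N = 15;   // board size; five in a row wins

// Enumerate every possible five-in-a-row line on an empty board and
// count, for each cell, how many of those lines pass through it.
std::vector<std::vector<int>> winLineCounts(int& total) {
    std::vector<std::vector<int>> c(N, std::vector<int>(N, 0));
    const int dir[4][2] = {{1, 0}, {0, 1}, {1, 1}, {1, -1}};
    total = 0;
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < N; ++x)
            for (auto& d : dir) {
                int ex = x + 4 * d[0], ey = y + 4 * d[1];
                if (ex < 0 || ex >= N || ey < 0 || ey >= N) continue;
                ++total;                       // one more winning line
                for (int k = 0; k < 5; ++k)
                    ++c[y + k * d[1]][x + k * d[0]];
            }
    return c;
}
```

The centre point lies on the most lines (20), which is why opening play gravitates toward the middle of the board.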
Because this system program was designed only to test the feasibility of image acquisition and processing, it requires only basic computer responses and does not incorporate complex artificial-intelligence decision processes such as interactive or multi-level learning. Even so, final program testing showed that the computer opponent still possessed considerable playing strength.
4 Program Testing
Before overall testing of the program, the unknown parameters must be determined experimentally. First, the range Nmin to Nmax is measured: under conditions of minimal external disturbance, frames at two random times are compared, e.g., with a one-second interval, comparing the Nth and the (N+1)th second. The results are shown in Table 2. Next, the range Smin to Smax is measured: after placing a certain number of pieces on the chessboard, an image (including random disturbances) is captured and compared with the initial chessboard image; the mean of Nmin to Nmax is subtracted (removing the disturbance), and the remainder is divided by the number of pieces placed. The results are shown in Table 1. Each piece could of course be measured individually, but that would require 113 trials.
The S and N parameters obtained are absolute values for the 320×240 image resolution used during testing; to use multiple resolution modes they must be made relative. Let X′ be the desired parameter value at a resolution containing R pixels in total, and X the value at 320×240; then X′/X = R/(320×240).
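Under the interpretation that R is the total pixel count (width × height) of the target resolution, the scaling is a one-liner (a sketch, not the paper's code; 64-bit arithmetic avoids overflow for large frames):

```cpp
#include <cassert>

// Scale an area-type parameter (piece size S or noise N, in pixels)
// measured at the 320x240 reference resolution to another resolution:
// X'/X = R / (320*240), with R = w*h of the target resolution.
int scaleParam(int x, int w, int h) {
    return static_cast<int>(static_cast<long long>(x) * w * h / (320 * 240));
}
```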
After testing, the program can correctly identify chessboards and chess pieces of various materials and produce corresponding strategies, meeting the needs of practical applications, as shown in the program screenshot in Figure 6 (the upper frame is the display window and the lower frame the acquisition window; in this test, a chessboard and pieces were drawn by hand on paper, the human placed the paper pieces, and the computer determined and displayed its responding moves).
5 Conclusion
The program tests demonstrate that it can achieve the intended functions, not only correctly identifying the positions of the pieces but also possessing a certain level of skill. Adding more complex strategy algorithms and a robotic arm execution module will give the system even greater scientific and practical value.