1. Introduction
Fish have evolved remarkable swimming abilities over hundreds of millions of years, prompting extensive research into their body structure and kinematics. In nature, approximately 85% of fish species propel themselves using the body and/or caudal fin (BCF). This propulsion mode offers advantages such as high speed and efficient energy transfer, making it the most common design pattern for robotic fish. BCF propulsion can be further subdivided into several swimming modes; for example, eels (order Anguilliformes) use anguilliform propulsion, an undulatory mode that involves almost the entire body.
In recent years, a growing body of research has shown that fish exploit passive mechanisms during swimming. Most of the force required for steady swimming is generated by muscles in the anterior and middle sections of the body, while the posterior muscles mainly transmit force to the tail. This offers insight for robotic fish design: swimming performance can be improved by appropriately tuning the stiffness of the propulsion structure and the flexibility of the body. For example, Fiazza et al. designed a robotic fish with a compliant tail, achieving compliant tail oscillation by vibrating the body at its natural frequency. White et al. used a biomimetic tuna to study the role of body flexibility, demonstrating that increased flexibility can raise swimming speed. Zhong et al. designed a robotic fish combining an actively driven body with a compliant tail, achieving efficient swimming; they also derived a general kinematic model for this type of robotic fish and extended the method to different swimming modes. These studies demonstrate that, under suitable conditions, passive mechanisms (i.e., compliant motion) can improve the swimming performance of robotic fish. Building on this work, this paper proposes a robotic eel design that combines two active bodies with two passive compliant bodies, exploiting the principles of underactuation and compliant motion.
Deep Reinforcement Learning (DRL) combines the powerful perceptual capabilities of deep learning with the decision-making abilities of reinforcement learning, enabling end-to-end learning. Applied to robot control, a neural network controls the robot while continuously interacting with the environment to gather data and train itself, solving control problems directly without complex models and controller designs. DRL has been applied to many robot-control tasks, such as commanding robotic arms to push objects and controlling bipedal locomotion; Li et al. have also achieved motion control of soft robots. Although problems such as learning latency and sample efficiency remain to be solved, DRL offers clear advantages in consistency and generalization for robot control. Motion control of robotic fish is currently based mostly on approximations of fish kinematics, such as the traveling-wave equation proposed by Lighthill. However, such approximate kinematic equations cannot accurately describe the motion of all fish, and whether they represent the optimal motion pattern remains to be verified. This paper proposes a method that requires no preset kinematic equation: DRL enables a robotic eel to autonomously learn to swim forward in a simulation environment, and the control function of the robotic eel is then obtained by fitting the learned outputs, realizing control of the robotic eel. The applicability of the method is verified through experiments.
2. Design of the robotic eel
2.1 Design of a propulsion mechanism based on active and passive compliant bodies
Zhong et al. proposed a novel propulsion mechanism—the Active and Compliant Propulsion Mechanism (ACPM)—for constructing robotic fish. The active body utilizes a drawstring mechanism for active bending, while the passive body is a flexible tail made of elastic material. The ACPM has been shown to exhibit the wave-like motion characteristics of fish, enabling high-speed and efficient swimming.
A single motor can control multiple joints of the robotic fish using a pull-wire mechanism, as shown in Figure 1.
Figure 1 Design of the wire pulling mechanism
In a cable-operated mechanism, there is a fixed relationship between the rotation angle α of the servo turntable and the bending angle θ of each joint of the fish body. Assuming that the friction and rotation angle of every joint are identical, the mapping between α and θ can be obtained (small-angle form):

α = Ndθ / (2r)    (1)

where r is the radius of the turntable; N is the number of joints; and d is the distance between the steel wire ropes on the two sides of the body.
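As a check on the mapping, the small-angle relation can be sketched in a few lines of Python. This is an illustrative helper, not the paper's code; the default r, N, and d are the values quoted later for the prototype (r = 16 mm, N = 4, d = 22 mm).

```python
def servo_angle(theta_deg, r=16.0, N=4, d=22.0):
    """Map a desired joint bending angle (deg) to the servo rotation angle (deg).

    Small-angle form of Eq. (1): the turntable pays out a wire length of
    roughly r * alpha, while each of the N joints bending by theta takes up
    roughly (d / 2) * theta, so alpha = N * d * theta / (2 * r).
    The relation is linear, so degrees can be used on both sides.
    """
    return N * d * theta_deg / (2.0 * r)
```

With the prototype values, a 12° joint bend corresponds to a servo rotation of 4 × 22 × 12 / 32 = 33°.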
The design method for the compliant tail is based on a mathematical model whose details were introduced in our team's previous work. In subsequent research on the ACPM, Zhong et al. extended the modeling method to other swimming modes by analyzing fluid torque and body bending moment, as shown in Figure 2. Figure 2 shows the torque phase distribution of four typical swimming modes, with the red line La marking the optimal body length for balancing fluid torque and body torque. As Figure 2 shows, the anguilliform mode, to which eels belong, has two passive compliant structures, whereas the other modes have only one. Based on this finding, this paper combines two active pull-wire mechanisms with two passive compliant body segments to design a robotic eel that can swim more efficiently. As shown in Figure 3, when the first pull-wire mechanism bends to one side and the second bends to the other, the robotic eel exhibits an "S"-shaped motion, which matches the swimming mode of real eels.
Figure 2 Optimization of different movement patterns in ACPM
Figure 3. Mechanical eel model
2.2 Prototype of the robotic eel
The robotic eel developed in this study consists of two ACPM segments. Since diving and surfacing are not considered, pectoral fins are omitted. Based on the morphology of real eels and the parameter ratios from the ACPM optimization for anguilliform swimmers, the robotic eel is designed around practical constraints such as the motors' mounting positions. The robotic eel is approximately 550 mm long, with an elliptical cross-section, a maximum width of 35 mm, and a width that tapers toward the tail. The body is divided into 6 segments, with a compliant body length of 100 mm and a compliant tail length of 135 mm. Each active body uses a pull-wire mechanism driven by a 0.8 mm diameter steel cable and has 4 equally spaced movable joints with a combined length of 80 mm. The compliant parts are cast from silicone in a pre-designed mold, which effectively meets the requirements of passive compliance. The head and middle body segments are rigid and house the servo motors and turntables; support arms are also designed to prevent the steel cables from detaching under tension. The turntables, joints, and rigid body parts were all made using 3D printing. The experimental prototype of the robotic eel is shown in Figure 4.
Figure 4. Experimental prototype of the robotic eel
Both servos are Hitec HS-5086WP models, providing 3.6 kg·cm of torque. An Arduino UNO control board drives the servos via PWM signals; the board and power supply are external, connected to the servos through DuPont wires. Because the compliant sections are relatively heavy, silicone membranes are added to the two pull-wire sections to increase buoyancy, and silicone rubber is used for waterproofing. Furthermore, to confine the robotic eel's movement to a single horizontal plane, lead weights were added to balance the gravity and buoyancy acting on the prototype.
3. Simulation Optimization Methods
3.1 Simulation Environment
In deep reinforcement learning, neural networks let the agent extract and learn features directly from raw input data, so the quality of the collected raw data is extremely important. In robot control, data collection generally follows two approaches. The first is to collect data directly in the real world, i.e., repeatedly testing the robotic eel in water. This is labor-intensive and difficult, and because the flexible body designed in this study moves passively, the eel's position, velocity, and other state variables are complex, making manual feature extraction from a database difficult. The second is to acquire data in a simulation environment, using the perceptual capabilities of deep learning to extract feature data effectively. For this study, the second approach offers better adaptability and training speed. This paper uses MuJoCo as the simulation platform to build a discretely segmented model of the robotic eel and achieve effective control. Learning in simulation makes the eel model's parameters easy to adjust and the learning process easy to observe, so the impact of parameter changes on the eel's swimming state is more intuitive.
MuJoCo is a general-purpose physics engine capable of quickly and accurately simulating the interaction between articulated structures and their environment; it is widely used in model-based computation, data analysis, deep reinforcement learning, and related fields. In MuJoCo, a corresponding aquatic environment was added, and the robotic eel's movement was restricted to a single horizontal plane to represent the balance between buoyancy and gravity. Taking a room temperature of 20 °C as the standard, the medium density was set to 1000 kg/m³ and the viscosity to 0.001 Pa·s to reproduce the drag and viscous forces experienced by the robotic eel during movement. To reduce irrelevant detail, the simulation model was simplified to a certain extent, as shown in Figure 5, while the overall length was kept at 550 mm. The two separate rigid bodies housing the servos were merged into a single rigid head so that the rest of the body could be actuated continuously: apart from the rigid head, the actuation components form a single unit of active joints and flexible compliant bodies, eliminating unnecessary rigid-body stiffness and bringing the actuation closer to that of a real eel. Furthermore, the joint model was simplified while retaining the coupling between the active joints of the two ACPM segments, which is the focus of the simulation control in this paper. To improve the simulation accuracy of the flexible compliant body, the two compliant segments were divided into discrete blocks modeled as passive joints, driven only by the last active joint of the pull-wire mechanism; this better reproduces the compliant behavior of the soft material.
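The segmented structure described above can be sketched as MJCF generated from Python. This is a minimal illustration of the modeling idea (rigid head, two ACPM sections, each with one actuated hinge followed by a chain of passive, spring-damped hinges); the geometry, stiffness, and damping values are placeholders, not the paper's tuned parameters.

```python
import xml.etree.ElementTree as ET

def build_eel_mjcf(n_passive=5, seg_len=0.02, stiffness=0.5, damping=0.05):
    """Build a simplified MJCF string for the segmented eel (illustrative values).

    One rigid head, then two ACPM sections: each has one actuated hinge
    followed by a chain of passive hinges whose stiffness/damping stand in
    for the silicone compliant body.
    """
    root = ET.Element("mujoco", model="robotic_eel")
    # Water-like medium: density 1000 kg/m^3, viscosity 0.001 Pa*s.
    ET.SubElement(root, "option", density="1000", viscosity="0.001")
    world = ET.SubElement(root, "worldbody")
    parent = ET.SubElement(world, "body", name="head")
    ET.SubElement(parent, "geom", type="box", size="0.05 0.0175 0.0175")
    actuators = ET.SubElement(root, "actuator")
    for s in range(2):  # two ACPM sections
        parent = ET.SubElement(parent, "body", name=f"active_{s}",
                               pos=f"{seg_len} 0 0")
        ET.SubElement(parent, "joint", name=f"act_{s}", type="hinge",
                      axis="0 0 1")
        ET.SubElement(parent, "geom", type="capsule",
                      fromto=f"0 0 0 {seg_len} 0 0", size="0.015")
        ET.SubElement(actuators, "motor", joint=f"act_{s}", ctrlrange="-10 10")
        for k in range(n_passive):  # discretized compliant body
            parent = ET.SubElement(parent, "body", name=f"passive_{s}_{k}",
                                   pos=f"{seg_len} 0 0")
            ET.SubElement(parent, "joint", name=f"pas_{s}_{k}", type="hinge",
                          axis="0 0 1", stiffness=str(stiffness),
                          damping=str(damping))
            ET.SubElement(parent, "geom", type="capsule",
                          fromto=f"0 0 0 {seg_len} 0 0", size="0.012")
    return ET.tostring(root, encoding="unicode")
```

With the defaults this yields 2 actuated and 10 passive hinges, all rotating about the vertical axis so motion stays in the horizontal plane.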
To improve the accuracy of the model, the bending characteristics of the compliant body were tested in the experiment, and the characteristic parameters of the passive joints were adjusted to make the simulation model as close as possible to a real eel.
Figure 5. Robotic eel model in MuJoCo
3.2 Deep Reinforcement Learning
The Soft Actor-Critic (SAC) algorithm is an off-policy actor-critic algorithm. Unlike many reinforcement learning algorithms, SAC adds an entropy term to the optimization objective: while maximizing cumulative reward, it also maximizes the policy's entropy, and the higher the entropy, the more random the policy. Introducing policy entropy during training enriches and expands exploration, accelerates subsequent learning, and prevents the policy from prematurely converging to meaningless local optima, resulting in high robustness. SAC operates on a continuous action space rather than discrete control levels, so it performs well on continuous control problems and is well suited to robot control.
Before training the robotic eel using deep reinforcement learning, it is necessary to set the reward function, action space, and state space for its motion. Different reward functions will lead to different learning results, and an inappropriate reward function may cause the final result to fail to converge. This study aims to enable the robotic eel to learn the motion ability to swim forward in a straight line in still water with low input torque, i.e., to achieve efficient straight swimming. The reward function R for training the robotic eel is defined as follows:
R = Vx − c_T·T − c_P·|Py| − c_θ·Posture_r − Safe_r    (2)

where Vx is the velocity of the robotic eel in the forward direction; T is the torque input over the action space; Py is the position offset of the robotic eel perpendicular to the forward direction; Posture_r is the swimming-posture term, such as the rotation angle of the head; Safe_r is a swimming-safety penalty used to eliminate abnormal postures during swimming, such as circling forward; c_T is the torque coefficient, representing the weight of torque in the reward function R; c_P is the offset coefficient, representing the weight of the position offset in R; and c_θ is the posture stability coefficient, representing the weight of posture in R.
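The reward can be sketched as a plain Python function. This is one plausible reading of Eq. (2): the sign convention (penalizing torque, offset, posture error, and unsafe gaits), the constant safety penalty, and the default coefficients are assumptions for illustration.

```python
def reward(vx, torque, py, posture_err, unsafe,
           c_t=1e-5, c_p=0.01, c_theta=0.3):
    """Sketch of the straight-swimming reward R (signs are an assumption).

    vx          forward velocity (rewarded)
    torque      magnitude of the torque input (penalized, weight c_t)
    py          lateral offset from the straight course (penalized, weight c_p)
    posture_err posture term, e.g. head rotation angle (penalized, weight c_theta)
    unsafe      True when an abnormal gait (e.g. circling forward) is detected
    """
    safe_penalty = 1.0 if unsafe else 0.0  # illustrative constant penalty
    return (vx - c_t * abs(torque) - c_p * abs(py)
            - c_theta * abs(posture_err) - safe_penalty)
```

The defaults c_t = 10⁻⁵ and c_p = 0.01 mirror the fixed coefficients reported later in this section.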
The action space must match the actual number of control inputs. To reproduce the pull-wire mechanism's control of the movable joints, the four joints of each mechanism receive the same input value and move as a unit. The input range in this experiment is [-10, 10]: when the input is less than 0, the movable joints rotate to the left; when it is greater than 0, they rotate to the right.
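The shared-input scheme above amounts to broadcasting two scalar actions onto eight joint commands; a minimal sketch (the function name and clipping behavior are illustrative):

```python
def apply_action(a1, a2, n_joints=4, lo=-10.0, hi=10.0):
    """Expand the two scalar actions into per-joint commands.

    Each pull-wire mechanism drives its four joints with one shared value,
    clipped to the [-10, 10] control range; negative bends left, positive
    bends right.
    """
    clip = lambda a: max(lo, min(hi, a))
    return [clip(a1)] * n_joints + [clip(a2)] * n_joints
```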
The state space includes all information about the robotic eel at any given moment in the simulation environment, including various mechanical characteristics. However, in actual control, the available control information is limited. This study focuses primarily on the robotic eel's motion posture; therefore, the position and velocity information of the robotic eel are chosen to constitute the state space.
The SAC algorithm requires setting up the neural network. The actor neural network and the critic neural network in this paper have the same network architecture, both of which use a multilayer perceptron neural network with two hidden layers and 256 neurons in each layer. The parameter settings are shown in Table 1.
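The shared actor/critic architecture can be sketched as a plain NumPy forward pass. This illustrates only the 2-hidden-layer, 256-unit multilayer perceptron shape; the initialization scale and ReLU activation are assumptions, and the real training would use an autodiff framework.

```python
import numpy as np

def mlp_forward(x, params):
    """Forward pass of a 2-hidden-layer, 256-unit MLP (actor and critic shape).

    params is a list of (W, b) pairs; ReLU on the hidden layers, linear output.
    """
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)  # ReLU hidden layer
    W, b = params[-1]
    return h @ W + b                    # linear output head

def init_mlp(n_in, n_out, hidden=256, seed=0):
    """Random initialization for the 256-256 architecture (illustrative scale)."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, hidden, hidden, n_out]
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]
```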
Table 1. Parameters for Deep Reinforcement Learning
The robotic eel was trained for 200 cycles in the simulation environment, with each cycle consisting of 500 steps and a control period of 10 ms per step. To improve the adaptability of the training results to the real aquatic environment, a random influence value was applied at the first step of each cycle to simulate real-world deviations in position and velocity. The larger this value, the better the robotic eel adapts to environmental changes but the harder it is for training to converge; if the value is too small, the desired perturbation effect is not achieved. After testing, the random influence value was set to 0.01 m: the initial state of the first step is perturbed by a random value drawn from [-0.01, 0.01] m.
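The initial-state perturbation amounts to adding independent uniform noise at reset; a minimal sketch (function name and per-component application are illustrative):

```python
import random

def perturb_initial_state(state, magnitude=0.01, rng=None):
    """Add the +/-0.01 m first-step disturbance used to mimic real deviations.

    Each component of the initial state receives independent uniform noise,
    forcing the learned policy to cope with small position/velocity errors.
    """
    rng = rng or random.Random()
    return [s + rng.uniform(-magnitude, magnitude) for s in state]
```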
Different training results can be obtained by adjusting the torque coefficient, offset coefficient, and attitude stability coefficient in the reward function, and suitable values were found through repeated experiments. The experiments showed that a torque coefficient that is too small has little effect on the control of the eel, while one that is too large reduces the amplitude of the eel's swaying; an offset coefficient that is too small barely restricts the eel's lateral offset, while one that is too large causes the model to glide on inertia after a few sways. To keep the training of the neural network relatively stable, this experiment ultimately fixed the torque coefficient and offset coefficient at 10⁻⁵ and 0.01, respectively, and then trained neural networks of different performance by adjusting the attitude stability coefficient.
In this experiment, the attitude stability coefficient was set to 0.1, 0.3, 0.5, and 0.7, resulting in four different training scenarios, as shown in Figure 6. The convergence of the training process was approximated by judging the degree of change in the total reward value. First, the moving average of the total reward value was calculated to filter the total reward curve; then, the absolute value of the slope of the smoothed total reward curve was calculated; finally, the curve with a slope absolute value consistently less than 0.5 was used as a representation of approximate convergence of the training process. The approximate convergence of the training process with different training parameters is shown in Table 2. By comparing the training results of the four different scenarios, it can be seen that the larger the value of the attitude stability coefficient, the more difficult it is for the neural network to converge during training.
Table 2 Approximate convergence steps under different attitude stability coefficients
Figure 6. Training process of the robotic eel under different posture stability coefficients.
In the four scenarios described above, the neural network with stable control performance was tested in a simulation environment. The control cycle for each step was set to 10ms. Under the control of the neural network, the robotic eel ran for 2000 steps, and the results were observed and data collected. Due to the presence of random values, the initial posture of the robotic eel during swimming was different, and the swimming gait required a certain amount of time to stabilize. Therefore, this experiment selected 400 steps between steps 800 and 1200 as the evaluation interval.
Figure 7 compares the outputs of the two pull-wire mechanisms for the four attitude stability coefficients within the evaluation interval. The output of each pull-wire mechanism corresponds to its input in the action space, with a value range of [-10, 10]. Every four steps, the current outputs of the two mechanisms are recorded and averaged, giving 100 sets of data in total. The results show that as the attitude stability coefficient increases, the pull-wire output gradually decreases and its amplitude stabilizes. Furthermore, simulation experiments reveal that when the output is too low, the swimming speed of the robotic eel drops significantly.
Since the goal of this study is to achieve efficient straight swimming for the robotic eel, its lateral offset must be monitored during its swimming process. Therefore, within the evaluation interval, this experiment compared the offset values of the robotic eel under four attitude stability coefficients. The robotic eel was simulated 10 times under each attitude stability coefficient, resulting in 40 evaluation intervals. The average offset value under the same attitude stability coefficient was calculated, as shown in Figure 8. When the attitude stability coefficient is too low, the amplitude of a single correction is too large, making it difficult to reduce the offset; conversely, when the attitude stability coefficient is too high, the amplitude of a single correction is too small, also making it difficult to reduce the offset. In other words, both excessively low and excessively high attitude stability coefficients lead to excessive swimming offset, making it difficult to ensure straight swimming. Furthermore, when the attitude stability coefficient is too high, the offset values become more dispersed, resulting in lower robustness.
Figure 8. Swing offset under four different attitude stability coefficients
As shown in Figure 8, when the attitude stability coefficient is 0.3, the trained neural network performs well in terms of output and swimming stability. Therefore, the output of this neural network is selected to control the real robotic eel.
3.3 Control Function
From the neural network trained with an attitude stability coefficient of 0.3, this experiment selects a set of outputs with good performance for control fitting. For the two pull-wire mechanisms within the evaluation interval, the rotation angle of each joint is plotted against time, and the joint-angle variation is found to be approximately sinusoidal. Curve fitting then yields an approximate joint-angle curve, as shown in Figure 9. As shown in Figure 9, both pull-wire mechanisms oscillate at the same frequency of 2 Hz; the maximum joint rotation amplitude is 6° for the first mechanism and 12° for the second, with a phase difference of 67° between them. Formula (1) gives the correspondence between the servo rotation angle and the joint rotation angle, from which the control function of the servo motor can be obtained, with r = 16 mm, N = 4, and d = 22 mm.
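At a fixed frequency, fitting a sinusoid is a linear least-squares problem, which is one simple way to perform the curve fitting described above (the paper does not name its fitting algorithm, so this is an illustrative sketch):

```python
import numpy as np

def fit_sinusoid(t, angles, freq=2.0):
    """Least-squares fit of theta(t) ~ A * sin(2*pi*freq*t + phi) at fixed freq.

    Writing the model as a*sin(wt) + b*cos(wt) makes it linear in (a, b),
    so np.linalg.lstsq recovers them directly; amplitude and phase follow
    from the (a, b) pair.
    """
    w = 2.0 * np.pi * freq
    X = np.column_stack([np.sin(w * t), np.cos(w * t)])
    (a, b), *_ = np.linalg.lstsq(X, angles, rcond=None)
    amplitude = np.hypot(a, b)
    phase = np.arctan2(b, a)  # theta(t) = amplitude * sin(w*t + phase)
    return amplitude, phase
```

Fitting each joint's angle trace this way yields the amplitude and phase of each pull-wire mechanism, and the phase difference between the two fits gives the inter-mechanism phase lag.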
Figure 9. Joint rotation angle variation curve
4. Experimental Results
Multiple experiments were conducted on the experimental platform to verify the effectiveness of the obtained control function on the robotic eel prototype. Figure 10 shows the experimental platform designed in this paper, in which the pool is a cuboid of 180cm×140cm×60cm, and the camera is installed above the pool to film the robotic eel swimming in the water.
Figure 10 Experimental Platform
4.1 Two-section pull-wire driving experiment
This experiment tested the feasibility of driving a robotic eel using a two-section pull-wire mechanism. Based on the control function, the frequencies of the two pull-wire mechanisms were set to 2Hz, with joint rotation amplitudes of 6° and 12° respectively, and a phase difference of 67°. The robotic eel was configured in the following ways: (1) only the first pull-wire mechanism was involved in the drive; (2) only the second pull-wire mechanism was involved in the drive; and (3) both pull-wire mechanisms were involved in the drive. As shown in Figure 11, the green line represents the starting line, allowing for comparison of the robotic eel's swimming performance. The robotic eel started at time 0s. When only one pull-wire mechanism was driven, the robotic eel swam forward slowly and exhibited lateral deviation; when both pull-wire mechanisms were driven in coordination, the robotic eel swam forward well. The test results show that driving with a two-section pull-wire mechanism can effectively drive the robotic eel forward.
4.2 Swimming speed experiment
This paper designs a speed-test experiment using straight-line swimming speed as the metric. The frequency of both pull-wire mechanisms is set to 2 Hz, the joint rotation amplitudes are kept at 6° and 12° respectively, and the swimming performance of the robotic eel is measured under different phase differences to verify the effectiveness of the obtained control function. Since the fitted control function has a phase difference of 67°, this experiment tests phase differences of 40°, 50°, 60°, 70°, 80°, and 90° for comparison, keeping the other control-function parameters unchanged. To eliminate the influence of unsteady starts on speed, a speed-test zone is set up, and speed is computed only after the robotic eel swims steadily into the zone. Ten swimming tests are performed at each phase difference, and the average speed is taken as the comparison datum. Table 3 shows the average swimming speed of the robotic eel under different phase differences. From Table 3 and Figure 11, the average speed of the robotic eel is highest when the phase difference is 67°, indicating that the control function is effective for straight-line swimming.
Table 3. Swimming speed of the robotic eel under different phase differences.
5. Discussion and Analysis
To avoid artificially limiting the swing amplitude of the robotic eel, this experiment set the upper limit of each movable joint's rotation to one side to 35° in the simulation environment, giving each wire-pulling mechanism a maximum total swing of 140°. However, during subsequent training it was found that the movable joints of the first wire-pulling mechanism never reached the maximum rotation angle, and the overall swing amplitude did not even exceed 60°. The swing frequency of the robotic eel ranged from 1.2 to 3.3 Hz, consistent with the low-frequency, low-amplitude head swings observed in real eels, indicating the applicability of this deep reinforcement learning method.
For eel-inspired underwater robots, common design methods at home and abroad fall into the following categories: (1) multi-joint rigid-body designs, with a drive module in each joint and motors producing the repeated swinging; the overall structure and control are generally complex. There are also robots driven by novel materials, using shape memory alloys or piezoelectric actuators to deform the fish body for propulsion; such robotic eels have drawbacks in energy conversion. (2) Soft robotic eels, which alter the local properties of soft materials through magnetic induction, air pumps, etc., to produce periodic deformation and thus swimming. Unlike the above studies, the robotic eel in this paper exploits the passive mechanism of fish swimming and connects rigid joints with a flexible body, combining the characteristics of both and offering greater advantages in control.
This paper studies the straight-swimming control of eels using a method not based on given motion curves, obtaining control results approximating a sine function. This verifies the rationality of the sine function in the straight-swimming control of biomimetic fish to a certain extent and also proves the feasibility of the proposed method. This method can be used to learn optimized control functions for various underwater biomimetic robots. Furthermore, the proposed method is more suitable for controlling complex swimming motions, such as escape and turning, which are difficult to describe directly using kinematic equations. Future research will further explore the learning and optimization of complex motions.
6. Conclusions
This paper designs a novel underactuated robotic eel by utilizing the passive mechanism of fish swimming and establishes a simulation model. The DRL method is also used to enable the robotic eel to autonomously learn swimming motion. Under different posture stability coefficients, the control performance of the neural network is analyzed, and control tests are conducted on a prototype using a fitted function, successfully achieving straight-line swimming. Simulation and experimental results demonstrate the effectiveness of the proposed method, providing insights for the control of underwater robots with similar structures. Future research will focus on optimizing the robotic eel model, improving the control fitting process, and further exploring the influence of different body parameters and other motion parameters on swimming performance.