Authors: Wang Na1, Wang Yi1, Yu Junxiong1, Wang Huifang2, Lei Baiying1, Wang Tianfu1, Ni Dong*
1 (Shenzhen University School of Medicine)
2 (Department of Ultrasound, Shenzhen Second People's Hospital)
Abstract: Accurate segmentation of the levator ani hiatus is clinically important for diagnosing female pelvic floor dysfunction. Current practice relies on manual tracing guided by physician experience, which is labor-intensive and often unreliable. To meet the needs of clinical pelvic floor ultrasound diagnosis, this paper proposes an intelligent method for levator ani hiatus identification. First, an auto-context fully convolutional network (AC-FCN) fuses the levator ani hiatus image with the probability map produced by the fully convolutional network, which improves the local spatial consistency of the predicted segmentation and refines its details. The method is then combined with an active shape model (ASM) to further improve the segmentation. Experimental results show that the proposed method segments the hiatus more accurately than existing methods.
Introduction
Medical image segmentation is a crucial issue determining whether medical images can provide reliable evidence in clinical diagnosis and treatment. In recent years, the use of deep learning algorithms to handle medical image segmentation has become an important application of artificial intelligence, and significant progress has been made in medical image segmentation technology. Currently, network structures such as Convolutional Neural Networks (CNN), Fully Convolutional Networks (FCN), and Recurrent Neural Networks (RNN) are commonly used to solve medical image segmentation problems.
Female pelvic floor dysfunction (FPFD) covers a group of syndromes including pelvic organ prolapse, stress urinary incontinence, and fecal incontinence. FPFD has many pathogenic factors; the main one is damage to the levator ani muscle caused by pregnancy and childbirth. Pelvic floor ultrasound is the primary imaging examination for pelvic floor disease because it is real-time, low-cost, and radiation-free [1]. In clinical diagnosis, doctors usually trace the outline of the levator ani muscle manually with a trackball. The result depends on the doctor's subjective experience, and the measurement procedure is cumbersome, time-consuming, and error-prone. Automatic segmentation of the levator hiatus (LH), however, faces the following challenges: ① the image contains substantial noise such as acoustic shadowing and speckle; ② different imaging conditions produce different intensity distributions; ③ unilateral or bilateral damage to the levator ani muscle makes the hiatus harder to recognize, as shown in Figure 1. To address these problems, this paper proposes, for the first time, an automatic LH segmentation method based on deep learning.
Segmenting the LH boundary is a prerequisite for measuring its biometric parameters. Sindhwani [2] proposed a semi-automatic, level-set-based tool for outlining the levator hiatus, which requires manual marking of the posterior inferior border of the pubic symphysis and the anterior border of the bottom of the puborectalis muscle. With its rapid development, deep learning can achieve better performance than traditional methods in medical ultrasound image processing. Convolutional neural networks [3] work well for foreground classification, but they segment an image by classifying image patches, which yields imprecise boundaries. Fully convolutional networks [4] achieve pixel-level segmentation by fusing information from multiple visual scales. However, popular networks designed for natural images are often more complex than medical imaging tasks require. Therefore, to meet the requirements of LH ultrasound image segmentation, an Auto-Context Fully Convolutional Network (AC-FCN) is proposed. Fusing LH ultrasound images with the probability map produced by the FCN improves the local spatial consistency of the prediction map and enhances segmentation details [6]. To handle locally missing parts in the segmentation results, an Active Shape Model (ASM) [7] is adopted to improve the LH segmentation through shape constraints.
In summary, the deep learning-based intelligent identification method for the levator ani hiatus proposed in this paper makes the following contributions: ① compared with the traditional FCN, AC-FCN performs better on the LH segmentation task and identifies the LH more accurately and quickly; ② AC-FCN integrates features of different scales and levels, which addresses the coarse details in the segmentation results of the traditional FCN; ③ based on the segmentation results of the first two steps and the shape characteristics of the LH, ASM imposes a shape constraint that further improves the segmentation; ④ popular deep learning methods are combined with traditional methods: the deep network extracts rich, deep feature information from the image to obtain a preliminary segmentation, and traditional methods are then applied on top of it to optimize the results, yielding a better research framework.
Figure 1. Pelvic floor ultrasound image (left) and its segmentation result (right). The red outline represents the manually drawn LH boundary, and the yellow and green arrows represent boundary defects caused by ultrasound features and levator ani muscle damage, respectively.
Method
The research framework of the proposed deep learning-based LH recognition method is shown in Figure 2 and consists of three steps. First, the preprocessed LH ultrasound image and its label are input into the Level 0 classifier (Level 0 AC-FCN), which extracts multi-scale visual features through transfer learning and produces a prediction map of the levator ani hiatus. Then, AC-FCN is embedded into the automatic context model: the probability map from Level 0 is fused with the LH ultrasound image as a multi-channel input to the Level 1 classifier (Level 1 AC-FCN), whose new prediction map is in turn fused with the LH image and input into the Level 2 classifier, and so on, iterating until a satisfactory segmentation is obtained. Finally, ASM introduces constraints on curve shape, position in the image, and boundary continuity to optimize the shape of the probability map from the last automatic context level and output the final segmentation.
Figure 2. Framework of the proposed method
2.1 Adjusted Fully Convolutional Network
Long et al. [4] proposed the fully convolutional network (FCN), which segments input images of arbitrary size at the pixel level in an end-to-end, pixel-to-pixel manner. The best-performing FCN variant is FCN-8s. Based on FCN-8s, this paper proposes AC-FCN, a classifier that separates target and background regions in LH ultrasound images. The main adjustments are as follows: ① the last two convolutional layers of FCN-8s are removed to reduce model complexity, avoid overfitting, and shorten training time; ② a fusion layer is added to fuse the feature maps from the fifth and fourth pooling layers and enhance feature learning. In addition, the padding parameter of the first convolutional layer is set to 1. The input and output feature map sizes of a convolutional layer are related as follows:
$$F_o = \frac{F_i - K + 2P}{S} + 1 \qquad (1)$$
where F<sub>i</sub> is the spatial size of the input feature map, F<sub>o</sub> is the size of the output feature map, K is the kernel size, S is the stride, and P is the padding, which usually fills the image borders with zeros. In AC-FCN, after the last two convolutional layers of FCN-8s are removed, the convolutional layers use K=3 and S=1, so setting P=1 makes F<sub>o</sub> always equal to F<sub>i</sub> and no border pixels are lost. A cropping layer is therefore not needed when fusing feature maps from earlier and later layers.
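The size relation in Equation (1) can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `conv_output_size` is hypothetical, and integer (floor) division is assumed for the case where the stride does not divide evenly.

```python
def conv_output_size(f_in: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution: (F_i - K + 2P) / S + 1.

    Floor division is an assumption for strides that do not divide evenly.
    """
    return (f_in - kernel + 2 * padding) // stride + 1


# With K=3, S=1, P=1 the output size equals the input size for any input.
same_sizes = [conv_output_size(f, kernel=3, stride=1, padding=1) for f in (32, 100, 227)]
print(same_sizes)

# Without padding, the same 3x3 convolution shrinks the map by 2 pixels per axis.
print(conv_output_size(100, kernel=3, stride=1, padding=0))
```

Because the size is preserved at every 3x3 convolution, feature maps from different depths differ only by the pooling factors, which is why they can be fused without cropping.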
2.2 Automatic Context Model Refines Segmentation Results
Although AC-FCN outputs the predicted probability map of the target region efficiently, it still has two major problems: ① it is insensitive to image details, so its results are not refined enough; ② when classifying each pixel, the FCN does not fully consider the relationships between pixels and omits the spatial regularization step commonly used in pixel-based classification, so its output lacks spatial consistency. To solve these problems, the automatic context model is used to optimize the AC-FCN results [8].
The core idea of the automatic context model is that the k-th level classifier uses both the appearance features of the grayscale image and the context features of the probability map predicted by the (k-1)-th level classifier. That probability map carries valuable information such as the basic shape of the target of interest and the contour separating foreground from background. Combining the context features with the grayscale features gives a more effective feature description than was available to the (k-1)-th level classifier, which refines the predicted probability map.
$$y_k = h_k\big(J(x, y_{k-1})\big) \qquad (2)$$
where h<sub>k</sub> is the mapping function of the k-th level classifier, x is the levator ani muscle image, y<sub>k-1</sub> is the probability map output by the (k-1)-th level classifier, and J(·) is the parallel concatenation operation that combines x and y<sub>k-1</sub>.
This paper combines the levator ani muscle image and the probability map from the (k-1)-th level classifier into a three-channel image, which serves as input to the k-th level classifier to refine contours and improve the spatial consistency of the AC-FCN prediction map.
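The cascade described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the paper's implementation: `toy_classifier` is a hypothetical stand-in for a trained AC-FCN level, the three-channel layout (image, foreground probability, background probability) is assumed because the paper does not specify it, and the first level is fed a uniform 0.5 map here whereas the paper's Level 0 sees the image only.

```python
import numpy as np


def fuse(image, prob):
    """J(x, y_{k-1}): stack the image and the previous probability map.

    The channel layout (image, foreground prob, background prob) is an assumption.
    """
    return np.stack([image, prob, 1.0 - prob], axis=-1)


def toy_classifier(inputs):
    """Hypothetical stand-in for one trained AC-FCN level h_k.

    Mixes the image channel with the previous foreground probability
    and squashes the score into (0, 1) with a sigmoid.
    """
    score = 0.5 * inputs[..., 0] + 0.5 * inputs[..., 1]
    return 1.0 / (1.0 + np.exp(-8.0 * (score - 0.5)))


def auto_context(image, classifiers):
    """Apply y_k = h_k(J(x, y_{k-1})) level by level; return every level's map."""
    prob = np.full(image.shape, 0.5)  # uninformative starting map (assumption)
    history = []
    for h_k in classifiers:
        prob = h_k(fuse(image, prob))
        history.append(prob)
    return history


image = np.random.default_rng(0).random((64, 64))  # synthetic image in [0, 1)
maps = auto_context(image, [toy_classifier] * 3)   # Level 0, Level 1, Level 2
print(len(maps), maps[-1].shape)
```

In a real setting each entry of `classifiers` would be a separately trained network, since every level learns from the probability maps produced by the level before it.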
2.3 Active Shape Model Optimization
Although the cascaded multi-scale AC-FCN in this paper has a strong ability to recover missing boundaries, there is currently no theoretical guarantee that every missing boundary is recovered in its correct form. Therefore, after the last context level, we apply an auxiliary ASM [9] to the prediction probability map to generate the final segmentation result. Using cross-validation, the 372 LH images are divided into 12 subsets, of which 11 are used as the training set and 1 as the validation set. Each image has 12 main feature points and 60 secondary feature points located between the main feature points. The sample data and feature points are input into ASM, which statistically analyzes the distribution of LH shapes and constructs the shape model. Because the AC-FCN cascade has already recovered the blurred and widely occluded boundaries, only a small number of gaps remain for ASM to fill and refine. Experimental results show that ASM effectively constrains the shape of the LH, improves the segmentation results, and provides strong support for the accurate measurement of LH parameters.
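The shape-constraint step of an ASM can be illustrated with a point distribution model built by PCA. The sketch below is an assumption-laden illustration, not the authors' pipeline: it uses synthetic ellipses with 72 landmarks (matching 12 main plus 60 secondary points) instead of LH contours, assumes the training shapes are already aligned, omits the image-driven landmark search of a full ASM, and uses the common convention of limiting each mode to three standard deviations. The names `build_shape_model`, `constrain_shape`, and `ellipse` are hypothetical.

```python
import numpy as np


def build_shape_model(shapes, variance_kept=0.98):
    """Fit a PCA point distribution model to aligned shapes (n_samples, n_points, 2)."""
    n = shapes.shape[0]
    flat = shapes.reshape(n, -1)
    mean = flat.mean(axis=0)
    # PCA through the SVD of the centred data matrix.
    _, sing, vt = np.linalg.svd(flat - mean, full_matrices=False)
    eigvals = sing ** 2 / (n - 1)
    ratio = np.cumsum(eigvals) / eigvals.sum()
    n_modes = int(np.searchsorted(ratio, variance_kept) + 1)
    return mean, vt[:n_modes], eigvals[:n_modes]


def constrain_shape(shape, mean, modes, eigvals, limit=3.0):
    """Project a shape onto the model and clip each mode to +/- limit * sqrt(eigenvalue)."""
    b = modes @ (shape.reshape(-1) - mean)
    bound = limit * np.sqrt(eigvals)
    b = np.clip(b, -bound, bound)
    return (mean + modes.T @ b).reshape(-1, 2)


rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False)  # 12 main + 60 secondary points


def ellipse(a, b):
    """Synthetic closed contour standing in for an LH outline."""
    return np.stack([a * np.cos(theta), b * np.sin(theta)], axis=1)


train = np.stack([ellipse(rng.normal(30, 3), rng.normal(20, 2)) for _ in range(50)])
mean, modes, eigvals = build_shape_model(train)

clean = ellipse(30.0, 20.0)
noisy = clean + rng.normal(0.0, 2.0, size=clean.shape)  # simulated boundary errors
fixed = constrain_shape(noisy, mean, modes, eigvals)
print(np.linalg.norm(noisy - clean), np.linalg.norm(fixed - clean))
```

Projecting onto the learned modes removes deviations that the training shapes never exhibit, which is the sense in which a shape model can fill small gaps left in a predicted contour.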
Experimental results
To evaluate the segmentation results more comprehensively and accurately, and following [10-12], this paper uses two types of evaluation metrics: region similarity and shape similarity. Specifically, four metrics are used to evaluate the LH segmentation results: Dice, Jaccard, Conformity Coefficient (Cc), and Average Distance of Boundaries (Adb). The first three are region-based metrics, while Adb is a distance-based metric. Let G be the target region marked by the doctor, and S be the segmentation result of the algorithm. The calculation formulas for the evaluation metrics are as follows:
$$\mathrm{Dice} = \frac{2\,s(G \cap S)}{s(G) + s(S)}, \qquad \mathrm{Jaccard} = \frac{s(G \cap S)}{s(G \cup S)}, \qquad \mathrm{Cc} = \frac{3\,\mathrm{Dice} - 2}{\mathrm{Dice}}$$
$$\mathrm{Adb} = \frac{1}{\sigma_G + \sigma_S}\left(\sum_{p_G \in G} d_{\min}(p_G, S) + \sum_{p_S \in S} d_{\min}(p_S, G)\right)$$
where s(·) denotes the area operator, d<sub>min</sub>(p<sub>G</sub>, S) is the distance from a point p<sub>G</sub> on the contour of G to the nearest point on the contour of S, and similarly d<sub>min</sub>(p<sub>S</sub>, G) is the distance from a point p<sub>S</sub> on the contour of S to the nearest point on the contour of G. σ<sub>G</sub> and σ<sub>S</sub> are the numbers of points on the contours of G and S, respectively.
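The four metrics can be computed on binary masks as sketched below. This is a minimal illustration that assumes the standard definitions: Dice = 2·s(G∩S)/(s(G)+s(S)), Jaccard = s(G∩S)/s(G∪S), Cc = (3·Dice − 2)/Dice, and Adb as the symmetric mean of nearest-boundary distances. The function names and the 4-connected boundary extraction are hypothetical choices, and distances are in pixels.

```python
import numpy as np


def region_metrics(g, s):
    """Dice, Jaccard and conformity coefficient for two binary masks."""
    g, s = g.astype(bool), s.astype(bool)
    inter = np.logical_and(g, s).sum()
    union = np.logical_or(g, s).sum()
    dice = 2.0 * inter / (g.sum() + s.sum())
    jaccard = inter / union
    cc = (3.0 * dice - 2.0) / dice
    return dice, jaccard, cc


def boundary_points(mask):
    """Foreground pixels with at least one 4-connected background neighbour."""
    m = np.pad(mask.astype(bool), 1)  # zero padding, so image borders count as background
    interior = m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    edge = m[1:-1, 1:-1] & ~interior
    return np.argwhere(edge).astype(float)


def average_boundary_distance(g, s):
    """Adb: mean nearest-point distance between the two contours, both directions."""
    pg, ps = boundary_points(g), boundary_points(s)
    d = np.linalg.norm(pg[:, None, :] - ps[None, :, :], axis=-1)
    return (d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(pg) + len(ps))


# Toy example: a 10x10 square and the same square shifted right by 2 pixels.
g = np.zeros((20, 20), dtype=np.uint8)
s = np.zeros((20, 20), dtype=np.uint8)
g[5:15, 5:15] = 1
s[5:15, 7:17] = 1

dice, jaccard, cc = region_metrics(g, s)
adb = average_boundary_distance(g, s)
print(dice, jaccard, cc, adb)
```

Higher Dice, Jaccard, and Cc indicate better overlap, while a lower Adb indicates contours that lie closer together; Cc becomes negative once Dice falls below 2/3, which makes it more sensitive to poor segmentations than Dice alone.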
Using these metrics, the segmentation performance of AC-FCN and of other popular deep learning segmentation networks was evaluated and compared on the test data, as shown in Table 1. The results show that AC-FCN, the core algorithm of our framework, outperforms the other networks on all metrics. Level 0 AC-FCN already surpasses the other models, and the segmentation improves progressively once the context model is embedded. To avoid potential overfitting, the iteration stops at Level 2, where AC-FCN already yields satisfactory results; constraining the LH shape with ASM then optimizes the segmentation further.
Figure 3 shows the results of the different segmentation methods together with the refined prediction map obtained by Level2-AC-FCN. Because CNN and U-Net segment poorly, only the predicted segmentation results and actual LH boundaries of SegNet, FCN-8s, Level2-AC-FCN, and Level2-AC-FCN-ASM are shown. As Figure 3 shows, the segmentation result of Level2-AC-FCN-ASM is closest to the actual LH boundary.
Table 1. Comparison of different segmentation methods
Figure 3. Qualitative comparison of the segmentation results of different methods. First row: refined prediction map obtained by Level2-AC-FCN. Second row: predicted segmentation boundaries against the actual LH boundary (red) for SegNet (yellow), FCN-8s (cyan), Level2-AC-FCN (blue), and Level2-AC-FCN-ASM (green).
Experimental conclusions
This paper proposes a research framework for intelligent LH recognition based on AC-FCN and achieves good results. First, adjusting FCN-8s yields a good base model that improves segmentation accuracy while reducing model complexity, training time, and memory usage. The adjusted FCN is then embedded into an automatic context model, which enhances boundary details by cascading LH ultrasound images with predicted probability maps and significantly improves classifier performance. Finally, the probability map from the automatic context model is input into ASM to apply shape constraints, which effectively addresses the problem of missing LH edges. The segmentation framework is also applicable to other ultrasound image tasks.