With social progress and economic development, people are paying increasing attention to the safety of their communities. Manual security checks not only increase the workload of security personnel but also make entering and exiting the community inconvenient. Our project aims to build an intelligent security system using biometric identification methods, specifically facial recognition and finger vein recognition, to assist the community in personnel registration and entry/exit screening. This system is convenient, fast, and highly secure. This paper will introduce the intelligent security system and provide a technical overview of facial recognition and finger vein recognition technologies. It will then elaborate on the relevant work done by our research group in facial recognition and finger vein recognition technologies. Finally, it will discuss the development trends and application prospects of intelligent security and biometric identification.
1. Project Background
Communities are large collectives in which urban residents live together. With social progress and economic development, communities must not only meet people's basic housing needs but also satisfy their expectations of safety and convenience. Traditional communities typically control access through security booths, and collecting resident information takes considerable time and effort. With urbanization and the growth of the floating population, however, manual methods have become increasingly inefficient: security guards cannot reliably determine whether people entering or leaving belong to the community, and because many residents rent only short-term, information collection must be repeated again and again. There is therefore an urgent need for automated methods to handle resident information entry and the screening of people entering and leaving the community.
We therefore chose automated biometric identification technology as the means of personnel identification. Among the many biometric traits, those commonly used for identification include facial features, hand features, eye features, gait, etc.; hand features in turn include hand shape[20], fingerprint[21], finger vein[22], palm vein[23], palm print[24], and finger back print[25]. Although the global biometric identification market is growing rapidly, each biometric trait has its own advantages and disadvantages. Facial features can be captured quickly, but recognition is constrained by interference from the acquisition environment; iris recognition requires imaging the eye, which makes it less acceptable to users. Common biometric traits and their advantages and disadvantages are shown in Table 1.
Table 1 Comparison of common biological characteristics and their advantages and disadvantages
Facial features, being the most distinctive human characteristic, are widely used, easy to collect and manually compare, making them ideal for community access control security. In addition to access control, numerous cameras within the community can also perform facial recognition through video recording.
Fingerprints are currently the most widely studied and applied trait: according to market data, fingerprint-based identification accounts for 84% of the entire biometric market[26]. Fingerprint recognition offers simple enrollment and high user acceptance, but it also faces many difficulties. The reliability of fingerprint features is hard to guarantee when the finger surface is worn, and many people, particularly women with slender fingers, have relatively indistinct surface texture; these factors complicate acquisition as well as subsequent feature extraction and matching. Moreover, fingerprint films and other spoofing products have appeared on the market in recent years, posing a serious challenge to the anti-counterfeiting performance of fingerprint-based identification. Palm print recognition, which likewise relies on surface texture, also suffers from low anti-counterfeiting performance because its main texture lines are highly similar across individuals. Consequently, fingerprint and palm print recognition are rarely used in scenarios with high security requirements.
Finger vein patterns effectively overcome the shortcomings of surface textures, such as poor robustness and ease of forgery. Furthermore, due to their unique imaging principle, they can achieve automatic liveness detection, preventing forgery methods such as fingerprint films, facial photographs, and even 3D-printed facial models, thus further enhancing their anti-counterfeiting performance. Therefore, for door lock systems, finger vein patterns are more suitable than facial or fingerprint patterns. Forging a fake finger vein model would require replicating the various meridians and blood vessels of a human finger, which is far more complex than forging a 3D facial model.
Leveraging the advantages of facial and finger vein recognition, we designed and implemented a community intelligent security system based on biometric identification. This system comprises two parts: a community access control system based on facial recognition and a door lock system based on finger vein recognition.
Firstly, at the community entrance, a facial recognition-based access control system performs face registration, face detection, and face recognition, greatly reducing manual labor while improving the community's security and convenience. Multiple cameras installed at the entrance monitor personnel, and a large screen displays the faces and identity attributes of people entering and exiting in real time. If a non-resident is detected, an alarm is triggered and security guards are notified to investigate the suspicious person. The system uses front-end devices to collect image data, which a local front-end server processes; the extracted facial feature codes are sent to a remote cloud server for identification. The system queries a database to retrieve the personnel file matching the identified features and returns it to the front-end server for display on the screen. Unrecognized faces are displayed in a dedicated area of the screen with a prompt for investigation and are simultaneously sent to the nearest security guard's handheld terminal for follow-up. For visitors, the system contacts the resident, who confirms and registers their identity via an app, as shown in Figure 1.
Figure 1. Community access control system based on facial recognition
The main purpose of the community access control system is to prevent unauthorized personnel from tailgating into the community; detected visitors are registered and routinely checked. The system first collects resident identity information and facial features to form a recognition database, then uses cameras deployed at key locations such as the community gate, major road intersections, floors, and lobbies to screen passersby. Unidentified individuals are reported to the nearest security guard's mobile terminal with their location and facial image, prompting further investigation. This reduces the workload of security personnel, raises the community's security screening capability, and avoids the vulnerabilities of manual checks that rely on viewing camera footage. The system can be applied to screening scenarios such as schools, communities, factories, government agencies, office buildings, hospitals, and scenic spots, and can also handle attendance registration.
Then, a finger vein recognition-based door lock system was installed on the security doors of residents' homes in the community. Finger vein recognition is more secure than digital passwords or fingerprint passwords, and can effectively prevent problems such as lost keys, password leaks, and counterfeit fingerprint films by criminals.
Community access control systems based on facial recognition and door lock systems based on finger vein recognition can effectively prevent criminals from entering communities and homes, and are more efficient and convenient than manual screening. This paper will first provide a technical overview of facial recognition and finger vein recognition technologies, and then elaborate on the relevant work done by our research group in these two technologies.
2. Facial recognition
2.1 Overview of Face Recognition
Face recognition has long been an indispensable part of biometric recognition and has been widely studied and applied. Research on face recognition became popular as early as the 1990s[1]. To this day, face recognition results are still published in large numbers in well-known journals and conferences and are steadily entering daily life; even the popular iPhone ships with face recognition. Over the decades of development, a large number of algorithms have emerged, which can be roughly divided into four categories by their nature[2]. The first category comprises the holistic learning algorithms that became popular first: they derive a low-dimensional representation under certain distribution assumptions, such as computing a linear subspace of the image[3] or using sparse representations[4]. Because these methods rely on strong prior assumptions that do not hold in all situations, their recognition accuracy is the lowest. The second category introduces local features, including the classic LBP (Local Binary Pattern)[5] algorithm, which encodes local information in binary and classifies faces using statistical histograms as feature patterns; however, these features lack uniqueness and compactness, so such methods reach recognition rates of only about 70%. The third category consists of machine learning methods based on local descriptors, which learn local filters and produce recognition results from a large bank of them; this line of work was a hot research topic until deep learning became popular. The fourth category is the deep learning-based algorithms, where deep learning generally refers to deep neural networks.
In fact, neural networks were proposed as early as the 20th century. However, due to the limited computing power at that time, they did not develop well and were not favored by researchers. Until the 2012 ImageNet competition, when Hinton's team achieved remarkable results using deep neural networks[6], deep neural networks were gradually valued and used in various fields of computer vision. In 2014, DeepFace[7] and DeepID[8] achieved the best recognition rate at that time in the LFW dataset[9] and surpassed the performance of humans in unconstrained situations for the first time. This marked that in face recognition (in unconstrained situations), machines have surpassed humans. This exciting research result has led researchers to shift their focus in facial recognition research to deep learning-based methods.
The deep learning-based face recognition algorithm applied in this project mainly includes two key technologies: face detection and face recognition. Face detection involves selecting faces in an image or video, requiring the selected location to be as close as possible to the actual location and minimizing the probability of non-face areas being selected. Face recognition refers to comparing the features of the test face with pre-stored faces to determine the identity of the test face. Face recognition is typically judged by its recognition rate. Given several pairs of face images (half from the same person, half from different people), the algorithm predicts whether these image pairs belong to the same person; the proportion of correctly identified image pairs out of the total number of image pairs is the recognition rate. Currently, most mainstream face recognition algorithms use deep convolutional networks to extract facial features and determine whether image pairs belong to the same person based on the distance between features.
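The recognition-rate protocol described above can be sketched in code. The following is an illustrative NumPy sketch, not the system's actual implementation: the function name, the use of cosine distance, and the threshold are our assumptions.

```python
import numpy as np

def verification_accuracy(feats_a, feats_b, same_person, threshold):
    """Decide 'same person' when the cosine distance between the two
    embeddings falls below a threshold, and report the fraction of
    pairs judged correctly (the recognition rate described above)."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    cos_dist = 1.0 - np.sum(a * b, axis=1)   # 0 means identical direction
    predicted_same = cos_dist < threshold
    return np.mean(predicted_same == same_person)
```

With well-separated embeddings and a sensible threshold, all pairs are judged correctly and the recognition rate is 1.0.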
2.2 Face Recognition Algorithm Based on MarginLoss
In face recognition technology, the quality of feature extraction directly determines the accuracy of recognition and classification. The key to feature extraction is to constrain the faces of the same person in the feature space to cluster together, while the faces of different people are far apart, so as to prevent misjudgment when using distance to determine whether a face pair comes from the same person. Our research group proposes a new loss function for face recognition, namely MarginLoss[13]. Based on the existing network structure, the trained network can increase the distance between face classes, thereby improving the accuracy of recognition.
2.2.1 Algorithm Principle
First, let's introduce the two loss functions, SoftmaxLoss and CenterLoss, in detail.
SoftmaxLoss is used extremely widely in convolutional neural networks. Suppose we have a $k$-class classification problem with a training set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i$ is the deep feature of the $i$-th sample and $y_i \in \{1, \dots, k\}$ is its label. SoftmaxLoss can then be defined as follows:

$$L_S = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{k} s(y_i = j)\, \log \frac{e^{\mathbf{W}_j^{T}\mathbf{x}_i + b_j}}{\sum_{l=1}^{k} e^{\mathbf{W}_l^{T}\mathbf{x}_i + b_l}} \qquad (1)$$

Here, $\mathbf{W}$ and $\mathbf{b}$ are the parameters of the classification layer, and $s(\cdot)$ denotes the indicator function: if the statement $X$ is true, then $s(X) = 1$; otherwise, $s(X) = 0$.
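As a concrete illustration, the softmax cross-entropy of equation (1) can be computed over a batch of logits as follows. This is a minimal NumPy sketch (names and shapes are our assumptions); the max-shift is a standard numerical-stability trick, not part of the definition.

```python
import numpy as np

def softmax_loss(logits, labels):
    """Mean cross-entropy over a batch: minus the log of the softmax
    probability assigned to each sample's true class (Eq. 1)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()
```

For confident, correct logits the loss approaches zero; for a uniform two-class output it equals log 2.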
CenterLoss, proposed by Wen et al. [19], is a discriminative feature learning method that minimizes intra-class distances. It is defined as follows:

$$L_C = \frac{1}{2} \sum_{i=1}^{N} \left\| \mathbf{x}_i - \mathbf{c}_{y_i} \right\|_2^2 \qquad (2)$$

where $\mathbf{c}_{y_i}$ is the center of the sample features of class $y_i$.
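A minimal NumPy sketch of equation (2); we average over the batch rather than summing, an assumed normalization that does not change the minimizer.

```python
import numpy as np

def center_loss(features, labels, centers):
    """Half the mean squared distance from each feature to its
    class center (batch-mean variant of Eq. 2)."""
    diffs = features - centers[labels]          # x_i - c_{y_i}
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))
```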
Based on SoftmaxLoss and CenterLoss, we propose a new loss function, MarginLoss. We aim to design a loss function that decreases the distance between samples within a class while increasing the distance between samples of different classes, as shown in Figure 2.
Figure 2 Loss Improvement Motivation
Our loss function design mainly considers the following factors:
(1) In the face recognition problem, each sample should be as close as possible to its center (less intra-class difference) and as far away from the center of other classes (greater inter-class distance).
(2) During the training phase, samples whose inter-class distances are already sufficiently large or whose intra-class distances are already sufficiently small should be excluded; otherwise the training process becomes unstable and converges slowly. Selecting training samples is therefore crucial. Combining the above two points, MarginLoss is defined as follows:
When the label of $x_i$ is $j$, $I_{ij} = 1$; otherwise $I_{ij} = -1$, and $m$ is the defined margin. If $I_{ij} = -1$, MarginLoss includes only the samples $x_i$ that still violate the margin, i.e. whose distance to the other-class center is not yet large enough; if $I_{ij} = 1$, it includes only the samples $x_i$ whose distance to their own class center is not yet small enough. In this way, MarginLoss is applied to the samples that are harder to train on.
In theory, class centers should be updated as deep features change during training. However, updating class centers during training on the entire training set is impractical and ineffective [19]. Therefore, when using MarginLoss, we choose to use mini-batch for class center updates. In each iteration, the class centers are updated based on the samples in the mini-batch. The parameter changes during the update are as follows:
$$\Delta \mathbf{c}_j = \frac{\sum_{i=1}^{n} s(y_i = j)\,(\mathbf{c}_j - \mathbf{x}_i)}{1 + \sum_{i=1}^{n} s(y_i = j)} \qquad (5)$$

where $n$ is the mini-batch size and $s(\cdot)$ is the indicator function.
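The mini-batch center update can be sketched as follows. The update form follows the standard CenterLoss-style rule of Wen et al. [19]; the damping rate `alpha` and its default value are assumptions for illustration.

```python
import numpy as np

def update_centers(centers, features, labels, alpha=0.5):
    """Mini-batch class-center update: each center moves toward the
    batch samples of its class, damped by a learning rate alpha."""
    new_centers = centers.copy()
    for j in range(len(centers)):
        mask = labels == j
        if mask.any():
            # delta_c_j = sum(c_j - x_i) / (1 + count), Eq. (5)
            delta = (centers[j] - features[mask]).sum(axis=0) / (1 + mask.sum())
            new_centers[j] = centers[j] - alpha * delta
    return new_centers
```

With `alpha=1` and a single sample at distance 2 from its center, the center moves halfway toward the sample, as the `1 +` in the denominator implies.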
Finally, we employ three loss functions for joint supervision: SoftmaxLoss, CenterLoss, and MarginLoss. Our loss function is expressed as follows:
$$L = \lambda_1 L_S + \lambda_2 L_C + \lambda_3 L_M \qquad (6)$$
Here, $\lambda_i$ denotes the weight of each loss function. To evaluate the effectiveness of our method, we combine the three loss functions into six different configurations and compare them (see Table 2) to verify the effect of MarginLoss.
Table 2 Specific parameter settings for loss function combination
2.2.2 Algorithm Experiment Comparison and Result Analysis
We used the LFW[9] and YTF[10] databases to compare the face recognition rate of our algorithm with other mainstream algorithms. We also trained on the Webface[11] and VGGFace[12] databases using the same network and the same data while varying only the loss function, and compared the resulting recognition rates. The results are shown in Tables 3 and 4.
Table 3 Comparison Experiment of Algorithm with Mainstream Algorithms
Where (c) indicates that the algorithm uses cosine distance for calculation, and (e) indicates that Euclidean distance is used for calculation.
Table 4 Comparison of different loss function algorithms with the same network.
(a) indicates that the image has been aligned.
Experimental results show that our algorithm is competitive with current mainstream algorithms using less training data (0.46 million) and a single network. When comparing different losses using the same network and data, Table 3 shows that our algorithm outperforms simply using Softmax Loss (e.g., 99.09% vs. 96.62% in LFW) and improves Center Loss to some extent (e.g., from 98.23% to 99.09% in LFW). Table 4 shows that S+C+M shows some improvement in recognition rate compared to S+C (e.g., from 68.8% to 72.3% in VGGFace(a)).
3. Finger vein recognition
3.1 Overview of Finger Vein Recognition
Compared with other biometric traits, hand-based recognition has several advantages. First, owing to the special structure of the hand, this region offers relatively rich surface textures, wrinkles, and intricate blood vessel patterns, and these rich features effectively reduce the difficulty of feature extraction and recognition. In addition, hand biometric traits are easy to acquire, enjoy relatively high user acceptance, require relatively low-cost acquisition equipment, and yield small images that are convenient for computer storage and computation. Among the many biometric traits, hand features have therefore been studied and applied comparatively widely. Finger vein features in particular, thanks to their ease of acquisition, anti-counterfeiting properties, and relatively good performance, have long been a focus of experts and scholars. As early as 2004, Miura N, Nagasaka A, and Miyatake T of Hitachi, Japan, first proposed a repeated line tracking algorithm for vein images based on the grayscale difference between veins and other parts of the finger under near-infrared light[27]. As shown in Figure 3(a), the method relies on the gray-level difference between a vein position and its two sides and, starting from random initial points, repeatedly tracks the vein position; it achieved a good recognition rate and equal error rate, and its robustness was confirmed. In 2007, Miura N et al. introduced the mathematical concept of curvature into vein recognition[28]: as shown in Figure 3(b), they exploited the local maximum curvature of the vein cross-section to extract the position of the vein centerline, achieving a better recognition rate and equal error rate than repeated line tracking. In 2009, Yang Jinfeng et al. of the State Key Laboratory of Signal Processing in Tianjin further broadened finger vein recognition by introducing Gabor filters and tuning the filter parameters to different vein positions, directions, and widths to optimize the filtering results[29]. In 2012, Peng Jianjiang et al. of Harbin Institute of Technology were the first to introduce SIFT features into vein recognition[30], using the good translation and rotation invariance of SIFT descriptors to better extract finger vein features. Our research group, working with the self-collected finger vein and finger back print database THU-FV[34], proposed a competitive coding method based on cross-point enhancement.
Figure 3 shows several typical hand feature acquisition devices and the acquired images.
3.2 Finger vein recognition algorithm design
3.2.1 Preprocessing
The preprocessing module for finger veins and finger back prints mainly consists of extracting the region of interest (ROI), i.e. locating the finger, and normalizing illumination and image size. Because fingers differ greatly between subjects in shape, and in the brightness variations caused by finger thickness, effectively overcoming these difficulties is the foundation and prerequisite for feature extraction and matching recognition.
(1) Extraction of the finger region of interest
Because the near-infrared LED light source illuminates the finger, the brightness at the finger position is higher than that of the background. The finger edge can therefore be located with an extended Sobel edge detector, as shown in Figure 4.
Figure 4. ROI extraction based on extended Sobel detector.
In the actual ROI extraction process, we move the extended Sobel edge detector, as shown in Figure 4, from the center line of the image to both sides. When the convolution value of the pixel block region and the extended Sobel template is greater than the preset threshold, we consider that the edge region of the finger has been found.
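The outward scan can be sketched as follows. For brevity this NumPy sketch uses a simple row-gradient as a stand-in for the extended Sobel template, so the edge measure, the threshold, and the function name are all assumptions rather than the actual detector.

```python
import numpy as np

def finger_roi_rows(img, thresh):
    """Scan from the image centre line outward (the finger is brighter
    than the background) and return the first row index above and below
    whose summed edge response exceeds `thresh`."""
    # per-row edge strength: absolute vertical gradient summed across columns
    grad = np.abs(np.diff(img.astype(float), axis=0)).sum(axis=1)
    mid = len(grad) // 2
    top = next((r for r in range(mid, -1, -1) if grad[r] > thresh), 0)
    bot = next((r for r in range(mid, len(grad)) if grad[r] > thresh), len(grad) - 1)
    return top, bot
```

On a synthetic image with a bright horizontal band, the scan stops at the band's upper and lower boundaries.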
(2) Normalization of size and illumination
To facilitate subsequent processing, we first apply bicubic interpolation to the extracted finger regions of interest to normalize all images to 100×200 pixels. Since significant brightness differences remain between images, we then normalize their illumination intensity, as shown in equation (7).
$$I'(i,j) = m' + \frac{\sigma'}{\sigma}\,\bigl(I(i,j) - m\bigr) \qquad (7)$$

where $I(i,j)$ and $I'(i,j)$ are the gray values at the corresponding position before and after normalization, $m$ and $\sigma$ are the mean and standard deviation of the gray values of the original image, and $m'$ and $\sigma'$ are the target mean and standard deviation after illumination normalization.
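The mean-variance illumination normalization of equation (7) translates directly into code; the target mean and standard deviation used below are assumed example values.

```python
import numpy as np

def normalize_illumination(img, m_target=128.0, s_target=40.0):
    """Shift and scale the gray values so every image shares a target
    mean and standard deviation (Eq. 7)."""
    m, s = img.mean(), img.std()
    return (img - m) * (s_target / s) + m_target
```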
3.2.2 Feature Extraction
In image processing and pattern recognition, extracting appropriate descriptive attributes, or image features, from processed images is a crucial step. The most important aspect of feature extraction is ensuring that the extracted feature regions contain separable information while removing redundant and noisy information. The quality of feature extraction often determines the performance of subsequent classifier classification, thus requiring special attention. During feature extraction, we aim to minimize the impact of interference and maintain high similarity for the same finger after different acquisitions (despite noise and redundancy). For different samples, i.e., different fingers, feature extraction should maximize the display of their differences. That is, we aim to minimize intra-class differences and maximize inter-class differences. To achieve this, during feature extraction, we should extract as many unique features as possible that distinguish a sample from others, while removing as many common features as possible. This maximizes the accuracy and efficiency of subsequent classification, thereby improving system performance to the greatest extent possible.
Specifically, in biometric recognition based on finger veins and finger back prints, feature extraction algorithms can be broadly divided into two categories: spatial domain-based (image domain-based) and transform domain-based, each with its own advantages and disadvantages. Spatial domain methods operate directly on pixel blocks and are therefore simple, intuitive, and easy to understand, but they are susceptible to local extrema and relatively sensitive to noise. Transform domain methods are less sensitive to local extrema and noise, but because they first transform the original image, they are less intuitive to understand.
Many experts and scholars have conducted in-depth and detailed research on biometric extraction algorithms for finger veins and finger back prints. Local Binary Pattern [31], Local Maximum Curvature [32] and Gabor Competitive Coding [33] are all classic and commonly used algorithms for finger vein feature extraction. Among these algorithms, Local Binary Pattern and Local Maximum Curvature are operations directly in the image domain, while Gabor Competitive Coding is an operation in the transform domain after performing Gabor transform on the original image using a Gabor filter.
3.2.3 Competitive Coding Method Based on Cross-Point Enhancement
The Gabor competitive coding method considers only the direction with the largest filter response, which is very effective for images containing only finger veins: Gabor filters respond strongly to dark, vein-like textures but only weakly to the relatively bright finger back print texture. For a database based on the novel combined finger vein and finger back print features[34], simply taking the direction with the largest response therefore discards the directional information of the back print texture. To exploit this information more effectively in the multimodal images, we propose a competitive coding method based on cross-point enhancement. In Gabor competitive coding, Gabor filters in six directions are convolved with the original image and the direction with the largest response is selected as the final coding direction. To make use of the low-amplitude back print texture as well, we identify the intersections of finger vein and finger back print textures and encode their "direction" as 7. The identification rule is given by the following formula:
$$\mathrm{code}(x,y) = 7, \quad \text{if } G_{\max}(x,y) - G_{\min}(x,y) > T \qquad (8)$$

Here, $G_{\max}(x,y)$ denotes the maximum response obtained by filtering the original image with the Gabor filters in six directions, and $G_{\min}(x,y)$ the minimum response. $\alpha$ is a manually set parameter, and $T$ is the final threshold determined from it. When the difference between the maximum and minimum filter responses at a pixel exceeds $T$, the pixel is considered an intersection of finger vein and finger back print textures, which allows the back print texture information to be used more effectively.
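The coding rule can be sketched as follows. The exact form of the threshold T is not specified above, so deriving it as alpha times the mean response spread is an assumption of this sketch.

```python
import numpy as np

def cross_point_code(responses, alpha):
    """Competitive coding with cross-point enhancement (sketch).
    `responses` holds the six Gabor filter responses, shape (6, H, W).
    Pixels whose max-min response spread exceeds the threshold T are
    coded 7 (vein / back-print cross points); the rest take the index
    of the strongest direction (0..5)."""
    g_max = responses.max(axis=0)
    g_min = responses.min(axis=0)
    spread = g_max - g_min
    T = alpha * spread.mean()          # assumed form of the threshold
    code = responses.argmax(axis=0)    # winning direction, 0..5
    code[spread > T] = 7               # cross point of vein and back print
    return code
```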
Figure 5 Feature extraction based on IGDC
As shown in Figure 5, the brightest points (highest gray values) mark the intersections of finger veins and finger back prints identified by equation (8). Such points carry information from both finger veins and finger back prints and are therefore more representative; paying more attention to them during identification helps reduce intra-class differences and increase inter-class differences. We therefore propose a matching method that emphasizes these points to further improve system performance.
3.2.4 Matching and Recognition
After the image acquisition, preprocessing and feature extraction steps described above, the input image to be identified is compared with the images in the database for the final matching and recognition step. Let the two feature maps to be matched be $R$ and $T$, both of size $m \times n$. Their matching score $S(R,T)$ is given by equations (9) and (10):

$$S(R,T) = \max_{w,h} \frac{1}{m n} \sum_{x=1}^{m}\sum_{y=1}^{n} \Delta_{w,h}(x,y) \qquad (9)$$

$$\Delta_{w,h}(x,y) = \begin{cases} 1, & R(x+w,\, y+h) = T(x,y) \\ 0, & \text{otherwise} \end{cases} \qquad (10)$$

In equation (9), $R(x,y)$ and $T(x,y)$ are the values of feature maps $R$ and $T$ at $(x,y)$, $w$ and $h$ are the translations of the two feature maps in the $x$ and $y$ directions during matching, and $\Delta_{w,h}(x,y)$ flags whether the two codes match. Equation (10) gives the calculation of $\Delta_{w,h}(x,y)$ used for the Local Binary Pattern (LBP), Local Maximum Curvature (LMC), and Gabor competitive coding methods introduced in the previous section. For our proposed competitive coding method based on intersection enhancement, in order to give more weight to the identified intersections of finger veins and finger back prints, we modify the matching score as shown in equations (11) and (12):

$$S'(R,T) = \max_{w,h} \frac{1}{2 m n} \sum_{x=1}^{m}\sum_{y=1}^{n} \Delta'_{w,h}(x,y) \qquad (11)$$

$$\Delta'_{w,h}(x,y) = \begin{cases} 2, & R(x+w,\, y+h) = T(x,y) = 7 \\ 1, & R(x+w,\, y+h) = T(x,y) \neq 7 \\ 0, & \text{otherwise} \end{cases} \qquad (12)$$

In equation (12), successfully matched intersection points receive double weight, and to keep the final score normalized, the denominator of equation (9) is doubled in equation (11). After the matching score is computed, we use the nearest neighbor classifier: the class of the sample in the database with the largest matching score is output as the class of the input test image.
3.2.5 Experimental Comparison and Result Analysis
To verify and analyze the rationality and effectiveness of the algorithms introduced and proposed above, we conducted experiments on the THU-FV[34] database. We compared the results of Local Binary Pattern (LBP)[31], Local Maximum Curvature (LMC)[32], Gabor Competitive Coding (GCC)[33], and Repeated Line Tracking (RLT)[27] with the cross-point-enhanced competitive coding (IGDC) that we proposed specifically for the novel multimodal finger vein and finger back print features, as shown in Table 5.
Table 5 Comparison of Experimental Results
As can be seen, our results outperform mainstream algorithms on the dataset.
4. Summary and Outlook
This paper introduces an intelligent security system based on biometric recognition. The system mainly includes an access control system based on facial recognition and a door lock system based on finger vein recognition, which improves community security while being convenient to use. The paper then focuses on facial recognition and finger vein recognition technologies, summarizing their basic concepts and development, and presenting the research group's improved algorithms for these technologies.
In the future, biometric recognition technology will be widely used in many fields such as finance, public security, border inspection, government, aerospace, power, factories, education, healthcare, and numerous enterprises and institutions. How to implement this technology will be the future direction of the industry, and improving recognition rates and speeds are also key areas we should focus on.