
A Review of Deep Learning Methods for Text Sentiment Analysis (Part 1)

2026-04-06

Abstract: Text sentiment analysis aims to mine and analyze the opinions and emotions contained in text, thereby improving performance in applications such as personalized services, recommendation systems, public opinion monitoring, and product research. From a machine learning perspective, text sentiment analysis can generally be transformed into a classification problem. The key to its processing lies in text representation, feature extraction, and classifier model building, with the most crucial aspect of traditional methods being the construction of a sentiment feature dictionary.

In recent years, deep learning methods have made remarkable progress in many fields such as image and speech recognition. Compared with traditional machine learning methods, the biggest advantage of deep learning is that it can automatically learn rich and effective features from a large number of data samples, thereby achieving better results. Existing research shows that at the text representation level, word vector representation methods can acquire semantic, syntactic, and structural information of text, providing a solid foundation for sentiment analysis research and becoming a current research hotspot in this field. This paper first introduces the concept and problem classification of text sentiment analysis, reviews the relevant work of deep learning in text sentiment analysis, discusses in detail the text representation methods and deep learning models in text sentiment analysis, introduces the current problems of deep learning in text sentiment analysis applications, and looks forward to future research directions and trends in this field.

1 Introduction

In the recently concluded, historic Go match between human and machine, Google's AlphaGo defeated Go master Lee Sedol 4-1. While people marveled at the progress of artificial intelligence, even more attention went to the core algorithm behind AlphaGo: deep learning. Indeed, in recent years deep learning has achieved revolutionary performance improvements in tasks such as image processing, speech recognition, and machine translation, sparking a widespread craze for deep learning, and AlphaGo's success brought that craze to its peak. One must ask: does deep learning also play a unique role in sentiment analysis, a capacity most distinctive to humans? This article does not attempt to answer that question directly, but rather reviews and looks ahead to the application of deep learning in text sentiment analysis from the perspective of technological progress.

Sentiment analysis, also known as opinion mining, is usually defined as the computational study of opinions, emotions, and attitudes expressed in text, such as joy, anger, sorrow, happiness, criticism, and praise. It is a sub-problem of the field of affective computing [67][91]. With the application of mobile Internet technology in all areas of life, text information is becoming increasingly rich and diverse and contains enormous commercial, political, and academic value, so sentiment analysis has gradually become a research hotspot in both academia and industry.

Looking back at the history of text sentiment analysis research, work abroad (especially on English) started earlier, and many methods have been proposed; research in China on this problem is still comparatively insufficient, particularly regarding sentiment analysis of Chinese texts. Because of some unique characteristics of Chinese, many methods developed for English text cannot be applied directly. Summarizing and reviewing the current progress in text sentiment analysis methods is therefore especially meaningful and necessary for Chinese-language sentiment analysis.

Modern machine learning technology can be said to be the core tool of text sentiment analysis. In recent years, deep learning methods have performed outstandingly on many tasks and are gradually becoming the most powerful and popular tool in artificial intelligence; their application in text sentiment analysis has become one of the most cutting-edge and active areas. In fact, the concept of multi-layer neural networks underlying deep learning was proposed much earlier, but for various reasons deep network models did not achieve good results and were not widely recognized. For the development history of deep learning, see Hu Xiaolin's summary in "Artificial Intelligence Communications" [104]. The ideas and methods of deep neural networks were revived in 2006 by Hinton et al. [33] under the name "deep learning". Since then, deep learning methods have achieved remarkable results in speech recognition, image recognition, speech synthesis, text translation, and other fields, and many researchers have devoted themselves to deep learning research in models, training methods, and application scenarios. For text sentiment analysis, many researchers have likewise tried deep learning methods and achieved significant improvements. Summarizing the deep learning methods for text sentiment analysis can help clarify the current research dynamics in this field.

Therefore, this paper mainly reviews the relevant concepts and general methods of text sentiment analysis, and provides a detailed introduction and summary of text sentiment analysis methods based on deep learning. For ease of presentation, Table 1 lists the English abbreviations used in the article. The remainder of the article is organized as follows: Section 2 summarizes the current research status, relevant concepts, and main methods of text sentiment analysis; Section 3 introduces text representation methods commonly used in text processing, with a focus on the continuous word vector representations currently adopted by many researchers; Section 4 introduces the deep neural network models widely used by researchers, focusing on how these models are applied to text sentiment analysis; Section 5 summarizes the deep learning methods applied to text sentiment analysis, elaborates on their advantages and disadvantages, and discusses possible future research directions and development trends.

Table 1. English Abbreviations Comparison Table

2. Text Sentiment Analysis

2.1 Problem Definition and Classification

The goal of text sentiment analysis is usually to mine the views and emotions expressed in the text. It can be divided into topic-related sentiment analysis and topic-independent sentiment analysis. Topic-related sentiment analysis refers to extracting relevant topics from the text in addition to obtaining the sentiment polarity of the text, focusing on what kind of views and evaluations are held about a certain event or item. Topic-related sentiment analysis is also called attribute-based sentiment analysis. Liu B et al. [48][99] gave a good problem definition and method summary for attribute extraction and topic-related sentiment analysis. Topic-independent sentiment analysis simply judges the sentiment polarity of a document or sentence without considering the topic or attribute to which the sentiment is directed [94]. At present, most methods are for topic-independent text sentiment analysis. The subsequent method review and deep learning method introduction in this paper are all for topic-independent text sentiment analysis.

In addition to classifying based on relevance to the topic, there are many other classification methods for text sentiment analysis. From the perspective of the granularity of sentiment classification, at the coarse level, it can be divided into text subjectivity judgment and text orientation judgment. Subjectivity focuses on judging whether the text contains subjective emotions, while orientation focuses on analyzing the positive or negative emotions contained in the text. At a finer level, text sentiment analysis can classify the emotions contained in the text into subtle emotion categories, such as the seven basic human emotions—anger, disgust, fear, happiness, like, sadness, and surprise. In Plutchik's sentiment model [68], human emotions are further classified into eight basic emotions and eight complex emotions, as shown in Figure 1. From the perspective of the granularity of corpus processing, text sentiment analysis can be divided into the text level, sentence level, and word level, that is, the emotions contained in a document, a sentence, and a word. From a machine learning perspective, text sentiment analysis is a binary or multi-class classification problem; therefore, it is often referred to as sentiment classification. Figure 2 summarizes the relevant concepts and classification systems of text sentiment analysis, with bolded text indicating more research and applications.

The application scenarios for text sentiment analysis generally involve extracting and analyzing the sentiment contained in news articles, blogs, online forums, microblogs, and reviews of products and services on e-commerce websites, providing foundational data for applications such as public opinion monitoring, event tracking, news recommendation, and product evaluation. Compared with other text data, microblogs and product reviews are large in volume, highly timely, rich in information, and of great potential value. At the same time, their inherent characteristics (short text length, non-standard language, and a large number of newly emerging popular terms) make sentiment analysis of these texts academically and technically more challenging. Research on sentiment analysis methods for these two types of text therefore has greater academic and practical value.

2.2 Typical Methods of Text Sentiment Analysis

For text sentiment analysis, the goal is to transform unstructured target sentiment text into structured text that is easily recognized and processed by computers. This requires identifying and judging meaningful information units to obtain information about the evaluators and their opinions. Common methods for obtaining evaluator opinion information can be divided into dictionary-based and rule-based methods, general machine learning-based methods, and deep learning-based methods. The first two are based on the construction of a sentiment dictionary, and the quality of the sentiment dictionary directly determines the subsequent sentiment judgment.

The dictionary- and rule-based approach generally uses existing knowledge resources, such as WordNet, to construct a sentiment dictionary, and then uses the sentiment dictionary to construct rules for sentiment judgment [43][96][44]. A simple rule can be constructed as follows: count the number of positive and negative sentiment words in the text, and judge the sentiment polarity according to the rules in Table 2.

Table 2. Emotional Polarity Judgment Based on Simple Rules

This simple rule treats all sentiment words as having the same intensity. If instead the dictionary assigns words different intensities, say five levels each for positive and for negative sentiment so that each word's intensity lies in [-5, 5], then the intensity values of the sentiment words in the text can be summed, and the overall sign and magnitude of the total determine the text's sentiment orientation.
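The intensity-summing rule above can be sketched in a few lines of Python; the tiny lexicon here is a hypothetical stand-in for a real sentiment dictionary:

```python
# Minimal sketch of dictionary- and rule-based polarity scoring.
# INTENSITY is a tiny hypothetical lexicon; a real system would use a
# resource such as a WordNet-derived sentiment dictionary.
INTENSITY = {                       # word -> intensity in [-5, 5]
    "excellent": 4, "good": 2, "fine": 1,
    "bad": -2, "awful": -4,
}

def polarity(text):
    """Sum the intensities of known sentiment words; the sign of the
    total gives the overall orientation of the text."""
    score = sum(INTENSITY.get(tok, 0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A production rule system would also need negation handling (e.g., flipping the sign after "not"), which plain counting misses.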

The machine learning-based approach was first proposed by Pang et al. [63]. In their method, a sentiment dictionary was used to construct the feature representation of the text, and then Naive Bayes (NB), Support Vector Machine (SVM), and Maximum Entropy (ME) models were used to classify the sentiment as positive or negative. After Pang, many people began to try to use machine learning methods for text sentiment analysis and proposed many methods [4][19][22][95]. The machine learning-based approach regards text sentiment analysis as a supervised or semi-supervised classification problem. The classifiers are generally SVM, NB, and ME. The main work is on how to construct and learn more representative features.
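As a minimal sketch of this supervised setup, the following combines bag-of-words features with a Naive Bayes classifier in scikit-learn; the four labelled sentences are hypothetical placeholders for a real corpus such as movie reviews:

```python
# Sketch of the classic supervised pipeline (in the spirit of Pang et al.):
# bag-of-words features + Multinomial Naive Bayes. Training data is a
# hypothetical toy corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great wonderful movie", "terrible boring plot",
         "wonderful acting", "boring and terrible"]
labels = ["pos", "neg", "pos", "neg"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["a wonderful great plot"])[0])
```

Swapping `MultinomialNB` for `LinearSVC` or `LogisticRegression` (maximum entropy) reproduces the other two classifiers mentioned above without changing the feature side of the pipeline.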

For machine learning methods, a major challenge is obtaining training data. Training samples can be obtained through manual annotation, but this method is labor-intensive and cannot obtain a large amount of labeled data. For text such as Weibo posts and comments, emoticons in the text can be used to annotate the text [1][62][100]. This annotation method introduces some noise, but it can easily obtain a large amount of training data and still achieve good results.
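A sketch of such emoticon-based distant supervision follows, with small hypothetical emoticon lists; the emoticons are stripped after labelling so a downstream classifier cannot simply memorize them:

```python
# Sketch of distant supervision: label a post by the emoticons it
# contains, then remove them from the text. The emoticon lists are
# small hypothetical examples.
POS, NEG = [":)", ":D"], [":(", ":'("]

def weak_label(post):
    """Return (cleaned_text, label), or None when no emoticon is found."""
    label = None
    if any(e in post for e in POS):
        label = "pos"
    elif any(e in post for e in NEG):
        label = "neg"
    if label is None:
        return None
    for e in POS + NEG:
        post = post.replace(e, "")
    return post.strip(), label
```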

Deep learning-based text sentiment analysis methods mainly refer to text modeling, feature extraction, and sentiment classification based on constructed deep network models. Deep neural networks have strong data feature representation capabilities: owing to multiple nonlinear hidden layers, a multi-layer neural network can learn data features with almost arbitrary distributions [6]. The biggest advantage of deep networks is that they can automatically learn multi-level feature representations; these representations are learned layer by layer and become closer to semantics at higher levels. Automatic feature learning saves a great deal of labor-intensive manual feature engineering and yields feature representations with a wide range of applications. According to their network structure, the deep networks currently used in text sentiment analysis can be divided into feedforward neural networks (FNNs), recursive neural networks (RecursiveNNs), convolutional neural networks (CNNs), and recurrent neural networks (RecurrentNNs). Section 4 of this paper introduces the model structures of these networks and how they are applied to text sentiment analysis in detail.

2.3 Relationship between Text Representation, Deep Networks, and Sentiment Analysis

Text sentiment analysis can generally be transformed into a binary or multi-class classification problem. Similar to other machine learning problems, text sentiment analysis typically involves the following steps: text representation, feature extraction, and classification model selection. A key feature of deep learning is its ability to learn and mimic human cognitive habits, closely linking feature extraction and classification modeling. Feature selection can be automatically obtained through network parameter learning. A typical processing framework is shown in Figure 3.

Text representation and feature extraction are key steps in text sentiment analysis. Text representation involves representing the abstract text symbols to be processed in a form that computers can "understand"; common methods are detailed in Section 3.

In traditional machine learning algorithms, feature extraction is primarily performed manually. This can be intuitively explained from two perspectives. First, for computers, all data is ultimately a binary string of 0s and 1s; machines struggle to grasp the abstract meaning the data expresses, while humans can interpret data and assign it concrete meaning. Second, feature extraction can be viewed as a gradual transformation of the input data. For example, a human easily reads the string "20151010" as a date, but for a machine it is simply a sequence of digits or a large integer. Mapping "20151010" to the date "October 10, 2015" is a non-linear transformation that requires prior knowledge. It is difficult for machines to learn such a non-linear mapping automatically, whereas humans, when examining data, perform complex non-linear transformations automatically based on years of accumulated prior knowledge. Manual feature extraction, however, has clear limitations: it essentially searches for a reasonable non-linear mapping based on prior knowledge, and human search efficiency is relatively low; moreover, different problems require different feature representations, so each specific problem demands extensive manual feature engineering, which is inefficient.

It is difficult to make machines imitate human feature extraction, but machines can proceed in their own way: searching, with large amounts of data and computation, through a space of transformation functions represented by a reasonable model, as long as the resulting features are effective. We know that a deep neural network with more than three layers can represent almost any data distribution and transformation [6]. By searching the transformation function space determined by a deep network model (each distinct setting of the network's parameters represents one transformation function, and the search process is in fact the training of those parameters), effective feature transformation functions can be found. The deep neural networks currently used in text sentiment analysis can learn effective feature representations without overfitting. Although a learned representation may only be a local optimum, letting the machine learn effective features automatically still allows the related problems to be solved better.

Text representation and deep learning each offer many different methods and models, and it is hard to say which text representation method or which deep network is best. In deep learning approaches to text sentiment analysis, different deep networks and text representation methods are chosen depending on the target problem. Moreover, the text representation method is closely tied to the structural characteristics of the deep network. For FNNs, Bag-of-Words (BOW) and Vector Space Models (VSM) are generally used for the initial document-level representation. RecursiveNNs and CNNs typically break text down into individual words and use low-dimensional dense vectors of these words for the initial representation, often initialized with word vectors trained by other methods; RecursiveNNs then organize the words according to the syntactic structure of the sentence, while CNNs tend to represent a sentence as a matrix of word vectors for convolution operations. RecurrentNNs generally treat text as a sequence of words, represent each word as a vector, and then learn vector representations of both the words and the text.

2.4 Challenges of Text Sentiment Analysis

As we all know, text sentiment analysis ultimately needs to return to the basic syntax and grammar of language, as well as human cognitive issues. Before computers can understand text sentiment, we need a sufficient number of samples to train our machines. This requires a large enough corpus, especially labeled samples, for different languages. Although text sentiment analysis has been studied for over a decade, many challenges remain.

First, there is a shortage of high-quality, labeled sentiment text corpora. Traditional dictionary- and rule-based methods rely on well-developed sentiment dictionaries. WordNet effectively mitigates this problem, but it's still far from sufficient. Furthermore, authoritative sentiment dictionaries are even scarcer in Chinese applications, posing a significant obstacle to dictionary-based sentiment assessment. For statistical machine learning methods, success depends even more on high-quality, labeled sentiment text corpora; statistical learning methods lacking sufficient data scale are meaningless. Deep learning models often have tens of thousands of parameters, the selection of which depends on a sample size hundreds of times larger. Therefore, the larger the corpus, the less likely the trained model is to overfit, which is more beneficial for model learning. The difficulty in quantifying sentiment also makes obtaining labeled sentiment databases even more challenging.

Second, the impact of diversified sentiment expression in text (emoticons, punctuation marks). Text sentiment analysis covers long texts such as news reports as well as the increasingly prevalent short texts of social network communication, in which emoticons play an ever more important role. In online communication and product reviews, these emoticons, composed of characters, graphics, and images, mimic human facial expressions of emotion. Humans can quickly decode an emoticon, an approving exclamation, or an ellipsis into an assessment of the sender's emotional state; for machines, however, integrating these emoticons with the textual content to understand the expressed emotion remains a significant challenge.

Third, understanding the sequential nature of language structure. We also know that the bag-of-words (BOW) model for text representation is often criticized by researchers because it ignores the sequential nature of words in a sentence. Recurrent Neural Networks (RNNs), with their superior ability to process sequential data, can mitigate this problem. RNNs and Long Short-Term Memory (LSTM) models have recently achieved state-of-the-art performance in text sentiment analysis tasks. Nevertheless, the reasonable representation and understanding of the grammatical and syntactic levels of language itself remains a significant challenge for natural language processing.

Fourth, parameter tuning and optimization of deep learning models. Deep learning models are constrained by the scarcity of labeled sentiment corpora, and their inherently large number of parameters, while allowing them to approximate almost any function, also makes parameter tuning difficult. To achieve good generalization we want as much data as possible, but training on datasets often measured in terabytes (TB) is itself a significant hurdle.

Nevertheless, thanks to the groundbreaking progress of deep learning in various research directions of natural language processing, deep learning has made full use of its expressive capabilities at the character, word, sentence, and even paragraph levels, achieving remarkable results in the field of text sentiment analysis and attracting more researchers to devote themselves to it.

3. Text Representation

For text sentiment analysis, the first consideration is text representation. Text, as a highly abstract form of information, must be represented in a way that computers can understand before it can be further processed. Based on the granularity of text representation methods, we categorize them into document-level, word-level, and character-level methods. To clarify the problem, the following explanations are provided: a document is not just an article in the ordinary sense; depending on the granularity of the corpus being studied, articles, paragraphs, sentences, and queries can all be treated as documents. A word is not just a single word; every term in the dictionary set generated from the document set is considered a word, which can be a single word or an n-gram phrase. A character is any character that appears frequently in text, such as those in the ASCII set; for non-alphabetic languages such as Chinese and Japanese, the text is first converted into a character representation (e.g., Chinese Pinyin).

3.1 Document Level

The most commonly used document-level text representation methods are the Bag-of-Words (BOW) model [41] and the Vector Space Model (VSM) [74]. Both represent a document as a sparse vector whose size equals that of the document set dictionary, but they differ in how each dimension's value is calculated. The BOW model treats a document as a bag containing words: if a word occurs in the bag, the value of the corresponding dimension is that word's frequency, and all other dimensions are 0. The VSM treats a document as a vector in space; the value of each dimension is related to the distribution of the corresponding word over the document set, and is generally calculated using the TF-IDF method.

For the text sentiment classification problem, many of the machine learning algorithms mentioned in 2.2 use VSM to represent the features of the text, then train the classifier, and then perform sentiment classification. Before performing VSM feature learning, the establishment of the document set dictionary is a key step, which can generally be handled as follows: first, obtain all the terms in the document set, then filter them according to certain rules (such as filtering low-frequency words and high-frequency words, merging synonyms), and then combine them with the existing sentiment dictionary (such as WordNet) to obtain the final document set dictionary. Regarding terms, the unigram model is generally used, and only words in the known dictionary (such as the Oxford Dictionary) are taken as terms. Sometimes the N-gram model [15] is also used to expand the terms.

BOW and VSM can conveniently represent text and extract text features. They have good effects in information retrieval, document classification and text sentiment analysis. However, these two representation methods also have some drawbacks: they ignore the order of words in the document and lose contextual information; they cannot obtain the part-of-speech and semantic information of words; they have large dimensions, strong data sparsity and high computational complexity. When the data scale is large and the problem to be processed is complex (such as sentiment analysis of short text), the performance of the system using these two representation methods will become very poor in terms of accuracy and time complexity. Due to these problems, BOW and VSM are often used for the initial representation of text, and then other methods are used for further processing, such as latent semantic analysis (LSA) based on singular value decomposition (SVD) [17][54].
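As a sketch of the SVD-based follow-up step mentioned above, `TruncatedSVD` (the usual LSA implementation in scikit-learn) compresses the sparse TF-IDF matrix into a dense low-dimensional one; the four documents are hypothetical:

```python
# Sketch of LSA: TruncatedSVD reduces the sparse high-dimensional VSM
# representation to a small dense one, easing the sparsity and
# dimensionality problems noted above. Documents are hypothetical.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["good phone great battery", "bad phone poor battery",
        "great screen good camera", "poor screen bad camera"]

X = TfidfVectorizer().fit_transform(docs)   # sparse VSM matrix
lsa = TruncatedSVD(n_components=2, random_state=0)
Z = lsa.fit_transform(X)                    # dense 4x2 "latent" matrix
print(Z.shape)
```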

3.2 Word Level

Word-level text representation generally uses continuous word vectors to represent each word in the document set dictionary. These are low-dimensional continuous value vectors. Continuous word vectors have strong text feature expression capabilities and can obtain part-of-speech and semantic information of words. There are three main text sentiment analysis methods based on continuous word vector representation: one is to learn continuous word vectors by combining them with the research problem, constructing a model and optimizing the problem, and then obtaining the feature representation of the text [56][89]; another is to use continuous word vectors to cluster or expand sentiment words, optimize the original sentiment dictionary, and then use dictionary-based and machine learning methods to perform text sentiment analysis [87][93][98]; and the third is to use word vector representation as the initial input of the model, use deep neural networks to further extract features, classify text sentiment, and combine continuous word vector representation with many deep learning methods mentioned later.

The most widely used continuous word vector training method is word2vec, proposed by Mikolov et al. [57]. This method maps each word in the document set dictionary to a unique low-dimensional vector, and the cosine distance between words in this vector space can represent semantic and grammatical relationships between words [24][61]. For example, given the word vector representations of "king", "man", "queen", and "woman", the following relationship approximately holds: vec("king") - vec("man") ≈ vec("queen") - vec("woman").
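The vector-offset relation can be illustrated with toy 3-dimensional vectors (the values are hypothetical; real word2vec vectors typically have 100-300 dimensions):

```python
# Toy illustration of the word-analogy property: with these hypothetical
# vectors, "king" - "man" + "woman" is cosine-nearest to "queen".
import numpy as np

vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
    "apple": np.array([0.2, -0.6, 0.3]),
}

def nearest(target, exclude=()):
    """Cosine-nearest vocabulary word to `target`, skipping `exclude`."""
    cos = lambda a, b: float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vec if w not in exclude),
               key=lambda w: cos(vec[w], target))

analogy = vec["king"] - vec["man"] + vec["woman"]
print(nearest(analogy, exclude={"king", "man", "woman"}))
```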

word2vec includes two models: the continuous bag-of-words (CBOW) model and the continuous Skip-gram model, as shown in Figure 4. The CBOW model predicts the distribution of the current word from its context, while the Skip-gram model predicts the distribution of the context words from the current word. The CBOW model is a simplification of the Neural Network Language Model (NNLM) [7], replacing the multi-node hidden layer of the three-layer network used in NNLM with a perceptron [71], and it adopts many of NNLM's processing methods: in the input layer, each word is mapped to a word vector, and the vectors of the input words are concatenated as the input of the network; the word vectors are randomly initialized as parameters of the model and continuously optimized during training; following the statistical language model, the conditional probability distribution of the current word is estimated from the preceding words and is assumed to depend only on the few adjacent words, that is:

P(w_t | w_1, w_2, ..., w_{t-1}) ≈ P(w_t | w_{t-n+1}, ..., w_{t-1})

In the CBOW model, words following the current word are also taken into account, and the weights between the input layer and the hidden layer are shared across all words. Skip-gram adopts an approach similar to CBOW but constructs a different objective, aiming to predict the distribution of the words in the context of the current word. Since the document set dictionary V is generally very large, training the above models directly is computationally very expensive [9][55][57]. The implementation of word2vec uses two alternative methods to accelerate training: importance sampling, proposed by Bengio et al. [9], and hierarchical softmax, proposed by Hinton et al. [55].
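How Skip-gram derives its training examples can be sketched as follows: every (current word, context word) pair within a window becomes one prediction target, whereas CBOW would instead group the context words as input and the center word as the target. The sentence and window size below are hypothetical:

```python
# Sketch of Skip-gram training-pair generation: each (center, context)
# pair within the window is one training example for predicting context
# words from the current word.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"], window=1))
```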


Figure 4. The CBOW and Skip-gram models used in word2vec.

There are many other word vector learning methods based on statistical language models, similar to NNLM and word2vec. The basic idea is to first map words to word vectors (the parameters to be learned), use the word vectors as input to the model, establish an objective function based on a statistical language model, and train the model. Training generally uses stochastic gradient descent (SGD) with back-propagation (BP) [38]. The main difference lies in the network structure used, for example word vector learning methods based on RecurrentNNs [58][97] and LSTM (Long Short-Term Memory) networks [26][75]. In addition, there are other methods for learning continuous word vectors, such as GloVe [65], which takes into account both global and local information of the document, and word vector learning tailored to sentiment analysis [56][89].

Compared with VSM and BOW models, word vectors have stronger feature representation capabilities, can obtain semantic and syntactic information of words, and can represent this information based on the position distribution of words in the vector space. Word vectors are a type of low-dimensional non-sparse vector. When the vector representation of all words in the entire document set dictionary is obtained, the vector representation of the word set (such as phrases and sentences) can be easily obtained [60], thus obtaining low-dimensional non-sparse text features.
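One of the simplest composition schemes implied above is to average the word vectors of a phrase or sentence; a sketch with hypothetical 2-dimensional vectors:

```python
# Sketch of the simplest word-set composition: a sentence vector as the
# mean of its word vectors. The 2-d vectors are hypothetical; unknown
# words are skipped.
import numpy as np

word_vec = {"good":  np.array([0.8, 0.2]),
            "movie": np.array([0.1, 0.7]),
            "very":  np.array([0.3, 0.3])}

def sentence_vec(tokens, dim=2):
    known = [word_vec[t] for t in tokens if t in word_vec]
    return np.mean(known, axis=0) if known else np.zeros(dim)

v = sentence_vec(["very", "good", "movie"])
```

More elaborate composition (weighted averages, recursive or convolutional combination) is exactly where the deep models of Section 4 come in.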

When using word vector representation, we also need to be aware of some of its limitations: after training, each word is mapped to only one vector, which mainly expresses the most commonly used semantic and grammatical meaning of the word in the document set. However, some words have different semantic and grammatical meanings in different contexts, meaning that the polysemy of words is difficult to represent. For training word vector models, especially when using the word2vec method, a large amount of training text in the same domain is required. For example, if our model ultimately aims to perform sentiment analysis on Weibo, then when learning word vectors, we need to prepare a large amount of Weibo text and perform some preprocessing, such as filtering emojis, @ tags, # tags, URLs, etc., to make the training text cleaner.
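The preprocessing described above can be sketched with a few regular expressions; the patterns are illustrative and far from a complete Weibo tokenizer:

```python
# Sketch of pre-training text cleaning: strip URLs, @-mentions,
# hashtag markers, and bracketed emoji codes such as [smile].
# The patterns are illustrative examples, not a full tokenizer.
import re

def clean(text):
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\S+", " ", text)           # @ mentions
    text = re.sub(r"#([^#]+)#?", r"\1", text)   # hashtags: keep the topic word
    text = re.sub(r"\[[^\]]+\]", " ", text)     # emoji codes like [smile]
    return " ".join(text.split())

print(clean("@alice loved it [smile] #movies# http://t.cn/abc"))
```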
