Research on Big Data-Driven Process Optimization Technology in the Chemical Industry

2026-04-06

Introduction

Distillation columns are among the most common and important separation equipment in chemical production processes. Investment in distillation column equipment in chemical plants is substantial, accounting for approximately 30% to 40% of the total equipment investment in a chemical project. Their performance directly affects the investment, capacity, quality, energy consumption, and cost of the production unit. The control of the distillation process has always been an important research topic in the field of control engineering.

Existing research on distillation columns suffers from drawbacks such as complex model construction, often incomplete mathematical models, poor control performance, delayed diagnosis, and inability to predict future trends. Therefore, research on energy-saving optimization control of distillation columns in the chemical industry is mostly in the experimental demonstration stage, with limited practical application in actual plants. In the development of automation in the chemical industry, process monitoring of distillation column equipment operation is relatively mature, having accumulated a large amount of industrial big data, including equipment IoT data, production and operation data, and external data. This enables the collection and processing of large-capacity industrial big data covering the entire distillation column production process and encompassing different scales across both temporal and spatial dimensions.

Industrial big data, as a resource, is widely regarded as a crucial driving force for the transformation of the chemical industry from "Made in China" to "Intelligent Manufacturing in China." Therefore, effectively building big data-driven models based on historically accumulated data from distillation columns in the chemical industry, and optimizing distillation column production based on effective modeling, can solve current industry challenges, contribute to improving product quality and reducing energy consumption, and is an inevitable requirement for the intelligent transformation and upgrading of the chemical industry.

Analysis of traditional distillation column modeling methods

Traditional modeling methods for distillation columns fall into three groups: theoretical modeling, classical system identification, and data-driven intelligent modeling, of which the first two are considered the traditional approaches and still account for most modeling work in the chemical industry. Theoretical modeling, also known as mechanistic analysis or the "white box" method, analyzes the inherent operating laws of a process, applying known principles, laws, and theorems, refined through long-term practice, to establish a process model. A typical mechanistic model describes a linear or nonlinear, continuous or discrete, deterministic or stochastic system using algebraic, difference, and differential equations.

Existing research on distillation columns mainly analyzes their actual operation and uses process simulation software to propose numerous distillation column models. These models aim to balance maximizing product quality and output against minimizing the energy consumed per unit of product under current conditions, and to predict abnormal conditions. Many of these studies depend on precise mathematical models.

One approach designs optimal controllers with the Linear Quadratic Regulator (LQR) method based on an accurate fifth-order model of the column; however, it neglects changes in the model parameters. Some industry experts have proposed analyzing distillation column performance with theoretically designed controllers, but such controllers are designed for parameter perturbations within a small range, and their design philosophy introduces unnecessary conservatism, ignoring cases where the model parameters vary over a large range. These studies typically rely on extensive expert knowledge to model a specific column accurately, which makes the modeling cycle long and difficult. Moreover, because the model is fixed in advance, it struggles to cope with dynamically changing control requirements when the system configuration changes.

Theoretical modeling methods are based on theoretical analysis and typically require a deep understanding of the internal workings of the research object. However, the internal structures of most systems are quite complex, making it difficult to fully summarize their inherent laws from a mechanistic perspective. Furthermore, due to the complexity of model construction, the resulting mathematical models are often imperfect, exhibiting shortcomings such as diagnostic lag and the inability to predict in advance.

Therefore, most research on energy-saving optimization control of distillation columns in the chemical industry is still at the experimental demonstration stage, with limited application on actual equipment. The main reasons are model uncertainty, constraints on the control and controlled variables, unreliable implementation of the control algorithm, and the complexity of executing the optimization control strategy when an actuator fails.

Research on Distillation Column Model Construction Technology Based on Big Data

In recent years, with the development of industrial automation and DCS technologies, distillation column equipment has initially achieved production automation and collected a large amount of data during operation. However, the in-depth utilization of this big data is insufficient, and there is a lack of methods and basis for modeling or evaluating the operation of distillation column equipment based on big data. This paper studies the model construction method, model testing, model evaluation, and system deployment method of distillation column systems based on big data-driven modeling.

The overall architecture of the model for offline learning and online prediction is shown in Figure 1.

Figure 1. Overall architecture diagram of the model's offline learning and online prediction.

Big Data-Driven Distillation Column Model Construction and Operation

For distillation column systems, data mining is performed on historical control system data collected over many years to analyze the inherent coupling relationships among up to 68 variables and identify their patterns of change. These learned patterns are then applied to the control of the distillation column system to improve efficiency and reduce energy consumption while ensuring output quality and meeting equipment constraints. The main approach is to model the historical operating data with deep learning based on time series analysis, constructing a relationship model between the system parameters, and then using the model to predict the outputs.

Data preprocessing

Because the distillation column produces a large volume of data, many data points are missing or abnormal. First, an anomaly identification method is employed, using a physical model to filter and reconstruct the abnormal data. Owing to physical characteristics and laws, certain coupling relationships exist between different states of the equipment, and an anomaly in one sensor value typically does not shift all coupled values simultaneously. This characteristic can therefore be used to identify abnormal data.

Ignoring outliers is risky, and including them in the analysis without proper handling can degrade model building. Distribution plots of the variables can be drawn, and the mean and median used to describe the central tendency of the data, which reflects its overall level to some extent.
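The coupling-based screening idea can be sketched as follows. The variables, thresholds, and interpolation rule are illustrative assumptions, not taken from the actual plant: a temperature reading that jumps while a physically coupled pressure stays flat is flagged as anomalous and reconstructed from its valid neighbours.

```python
# Hypothetical sketch: flag a reading as anomalous when it jumps away
# from the last valid value while a physically coupled variable stays
# flat, then reconstruct flagged points from their valid neighbours.
# The jump/flat thresholds are illustrative assumptions.

def detect_anomalies(primary, coupled, jump=5.0, flat=0.5):
    """Indices where `primary` jumps but `coupled` does not move with it."""
    bad, last_p, last_c = [], primary[0], coupled[0]
    for i in range(1, len(primary)):
        if abs(primary[i] - last_p) > jump and abs(coupled[i] - last_c) < flat:
            bad.append(i)                 # inconsistent with the coupling
        else:
            last_p, last_c = primary[i], coupled[i]
    return bad

def reconstruct(series, bad):
    """Replace each flagged point with the mean of its valid neighbours."""
    clean = list(series)
    for i in bad:
        nxt = i + 1
        while nxt in bad:
            nxt += 1
        clean[i] = (clean[i - 1] + clean[nxt]) / 2 if nxt < len(clean) \
            else clean[i - 1]
    return clean

temps = [80.1, 80.3, 95.0, 80.4, 80.2]   # spurious spike at index 2
press = [1.01, 1.02, 1.02, 1.03, 1.02]   # coupled pressure stays flat

bad = detect_anomalies(temps, press)
clean = reconstruct(temps, bad)
```

A production version would use a physical balance model rather than a single coupled variable, but the principle is the same: anomalies rarely move all coupled measurements together.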

Feature selection and processing

Because the distillation column exhibits time delays, the model is built on a feature set that integrates recent, medium-term, and long-term data.

Feature parameter selection methods based on sensitivity analysis determine the importance of each feature parameter by examining how sensitive the model's output is to it, and thereby eliminate redundant feature parameters. Depending on how the sensitivity coefficient is calculated, these methods fall into two categories: selection based on statistical random sensitivity and selection based on partial-derivative sensitivity.

When there are many influencing factors, using the mean impact value (MIV) method to select a subset of feature parameters as modeling inputs simplifies the model. When there are few influencing factors, or only some feature parameters are selected, the mean impact values can serve as weight coefficients for a weighted summation of the features, further improving modeling accuracy.
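The mean-impact-value idea can be sketched as follows. The toy linear model, sample values, and ±10% perturbation are illustrative assumptions standing in for the trained network and real plant data: each feature is perturbed up and down, and features are ranked by the mean absolute change in the prediction.

```python
# Illustrative MIV sketch: perturb each input feature by +/-10% around
# the recorded samples, pass both perturbed sets through the model, and
# rank features by the mean absolute change in the output. The linear
# `model` below is a hypothetical stand-in for the trained network.

def model(x):                      # toy surrogate, not the real model
    return 3.0 * x[0] + 0.2 * x[1] + 1.0 * x[2]

def mean_impact_values(samples, predict, delta=0.1):
    n_feat = len(samples[0])
    mivs = []
    for j in range(n_feat):
        diffs = []
        for x in samples:
            up = list(x); up[j] *= 1 + delta   # feature j raised 10%
            dn = list(x); dn[j] *= 1 - delta   # feature j lowered 10%
            diffs.append(abs(predict(up) - predict(dn)))
        mivs.append(sum(diffs) / len(diffs))   # mean impact of feature j
    return mivs

samples = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
miv = mean_impact_values(samples, model)
# feature 0 dominates, feature 1 is nearly redundant and can be dropped
```

Features with small MIV scores are candidates for elimination; normalized scores can also serve directly as the weighting coefficients mentioned above.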

Because the distillation column data span several time scales, the features at the current moment are divided into three categories. The first (recent) comprises field values in the adjacent minutes-scale interval, within which the system changes smoothly: readings from the previous few minutes reflect characteristics similar to the current moment. The second (near) comprises values sampled at 1-hour intervals back from the current moment, up to 5 hours, beyond which they are no longer considered. The third (distant) comprises the same data at a more remote time: the data distribution was found to be roughly periodic with a 1-day cycle, so the value at the same time of day one day earlier is included as part of the feature set.
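The three-scale feature set above can be sketched as follows, assuming one sample per minute; the window sizes (5 recent minutes, 5 hourly lags, 1 daily lag) follow the text, while the sampling rate is an assumption.

```python
# Sketch of the three-scale (recent / near / distant) feature set,
# assuming minute-resolution sampling. For index t it gathers the last
# few minute-scale values, hourly values back to 5 hours, and the value
# one day earlier.

MIN = 1              # samples per minute (assumed)
HOUR = 60 * MIN
DAY = 24 * HOUR

def build_features(series, t, recent=5, hours=5):
    feats = [series[t - k] for k in range(1, recent + 1)]         # recent
    feats += [series[t - h * HOUR] for h in range(1, hours + 1)]  # near
    feats.append(series[t - DAY])                                 # distant
    return feats

series = list(range(3000))        # toy minute-resolution signal
f = build_features(series, 2000)  # 5 recent + 5 hourly + 1 daily = 11
```

In practice one such feature vector is built per tag, and the vectors for all selected tags are concatenated into the network's input layer.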

Model building

Because there are many input and output parameters and the prediction must be multivariate and multi-output over time series data, this is a relatively complex system analysis problem that would otherwise demand considerable expertise. To mitigate this limitation and learn latent patterns among the variables, a deep learning approach is proposed, using a 5-layer neural network to obtain a system model relating the controllable and uncontrollable variables to target concentration, water content, and power consumption.

The models share the same 5-layer architecture: one input layer, three hidden layers, and one output layer; models 1 and 2 differ only in their input layers. The hidden layers contain 3000, 2000, and 100 neurons respectively. An activation function must also be chosen: without one, each layer's output is a linear function of the previous layer's input, so no matter how many layers are stacked the network remains a linear combination of its inputs, equivalent to having no hidden layers at all, i.e. a single-layer perceptron.

For the activation function, ReLU is preferable to the commonly used tanh and sigmoid. The sigmoid function is relatively expensive to compute, and in deep networks it is prone to vanishing gradients during backpropagation, which can stall training entirely. In contrast, ReLU sets the output of some neurons to zero, yielding a sparse network structure that reduces interdependence between parameters and mitigates overfitting during prediction.

Meanwhile, because deep networks contain a large number of neurons, training is computationally lengthy, and an optimization algorithm is generally used to accelerate learning. This study uses the Nadam algorithm: Adam with a Nesterov momentum term (effectively RMSprop with momentum, dynamically adjusting each parameter's learning rate using first- and second-moment estimates of the gradient). It imposes stronger constraints on the learning rate and affects the gradient update more directly.
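The architecture described above can be sketched as a bare NumPy forward pass. The hidden sizes (3000/2000/100) and ReLU activations follow the text; the 68 inputs and 3 outputs (target concentration, water content, power consumption) are taken from the surrounding description, while the random initialization is an assumption. Actual training with Nadam would be done in a deep learning framework; this only shows the shape of the model.

```python
import numpy as np

# Sketch of the 5-layer network: input -> 3000 -> 2000 -> 100 -> output,
# ReLU on hidden layers, linear output. Weights are randomly initialized
# here for illustration; in practice they come from Nadam training.

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

sizes = [68, 3000, 2000, 100, 3]   # input, three hidden layers, output
weights = [rng.normal(0.0, 0.01, (a, b)) for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

def forward(x):
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ w + b)                    # hidden layers use ReLU
    return h @ weights[-1] + biases[-1]        # linear output layer

y = forward(rng.normal(size=68))               # one prediction, shape (3,)
```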

Research on Distillation Column Process Operation Optimization Technology Based on Big Data-Driven Model

During industrial optimization control, changes to the current state caused by the current control strategy affect the characteristics of subsequent time series data, so the affected feature variables must be updated from the changed state variables. An equipment model is constructed that relates the controllable variables (time, reboiler temperature, reboiler pressure, feed rate, top reflux, middle temperature, and reboiler output, forming the input layer of the deep learning network) to standard concentration, moisture content, and power consumption (the output layer). This improves the model's reusability.

Regarding the selection of control strategies: since many parameter combinations may satisfy the required system output, a strategy selection algorithm based on a genetic algorithm is proposed. A large number of candidate control strategies are generated as the initial population, and the best individual is selected through crossover, mutation, fitness evaluation, and recording of the best individual, yielding the optimal strategy for the current moment. The changed controllable parameter values are then propagated to subsequent moments for feature construction, and model 3 generates the initial population for that moment, achieving time-series-based optimization control. An encoding mechanism and a fitness function for the optimal control are proposed; because the problem carries constraints, the fitness function incorporates them into the objective via a penalty term [2].

The genetic algorithm mainly consists of four sub-processes: gene selection, gene crossover, gene mutation, and fitness evaluation. Genes represent solutions to the problem, and the expression of solutions follows a specific encoding method. Figure 2 shows the encoding forms of the solutions.

Figure 2. Encoding format of solutions in the genetic algorithm.
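The GA loop above can be sketched as follows. The bounds, toy surrogate model, purity constraint, and penalty weight are all illustrative assumptions standing in for the trained deep-learning model and the real objective: individuals encode two controllable variables, fitness minimizes predicted power consumption, and a penalty term enforces a purity constraint.

```python
import random

# Hypothetical GA sketch for strategy selection: individuals encode
# controllable variables (here a reboiler temperature and pressure,
# with assumed bounds); fitness minimizes predicted power consumption
# with a penalty for violating an assumed purity constraint. The toy
# `model` stands in for the trained network.

random.seed(0)

BOUNDS = [(90.0, 110.0), (1.0, 2.0)]       # assumed variable ranges

def model(ind):
    temp, pres = ind
    purity = 0.90 + 0.004 * (temp - 90.0)  # toy surrogate predictions
    power = 0.5 * temp + 10.0 * pres
    return purity, power

def fitness(ind):
    purity, power = model(ind)
    penalty = 1000.0 * max(0.0, 0.95 - purity)  # constraint: purity >= 0.95
    return -(power + penalty)                   # maximize => minimize power

def crossover(a, b):
    cut = random.randrange(1, len(a))           # one-point crossover
    return a[:cut] + b[cut:]

def mutate(ind):
    j = random.randrange(len(ind))              # resample one gene in bounds
    lo, hi = BOUNDS[j]
    out = list(ind)
    out[j] = random.uniform(lo, hi)
    return out

pop = [[random.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(30)]
best = max(pop, key=fitness)
for _ in range(50):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                          # selection
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(20)]
    best = max(pop + [best], key=fitness)       # elitist best-so-far record

best_purity, best_power = model(best)
```

The penalty term turns the constrained problem into an unconstrained fitness maximization, matching the penalty-in-objective approach described above.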

Real-time optimization of distillation column equipment

The process operation is monitored in real time and the operating point continuously adjusted to overcome disturbances while meeting all constraints, so that the process always achieves optimal economic benefit. "Online" means the entire optimization loop, from data acquisition and model correction to the optimization calculation, is automated. Implementing real-time optimization requires reliable measurement and transmission instruments, reliable conventional control systems, reliable advanced control technology, and reliable real-time optimization models and algorithms; the whole system is a highly integrated hardware and software architecture. Simply put, when the process is in a steady state, data from the distributed control system are reconciled and the steady-state model is corrected accordingly. Under the given constraints, the model is used for optimization calculations, and the resulting optimal setpoints are sent to the lower-level control system. When the process reaches a new steady-state operating point, the next round of data reconciliation, model correction, and optimization begins, and the cycle repeats continuously.

Real-time optimization can effectively integrate the management and control layers, enabling comprehensive automation and optimization of the factory from top-level planning management to bottom-level equipment control, while achieving the following benefits:

(1) Increase output and improve product quality, keeping production at its best operating condition;

(2) Reduce the consumption of raw materials and energy;

(3) Extend the operating cycle of the equipment;

(4) Respond promptly to changes in market supply and demand;

(5) Deepen understanding of the process technology and operation, helping improve the process and adjust the operating strategy.

Real-time optimization of the distillation column unit is based on a distillation column optimization control model constructed with big data. The online real-time optimization consists of several steps, including data acquisition, data correction, steady-state verification, model parameter update, model optimization calculation, and control action output, as shown in Figure 3.

Figure 3. Online Real-Time Optimization System Structure

Data collection

This includes process quantities (temperature, pressure, flow rate, etc.) that are directly measured by field instruments and have been centralized in the DCS, as well as process quantities (components, etc.) that cannot be directly measured by instruments.

Data correction

The collected measurement data inevitably contain two types of error: gross errors and random errors. Such raw data cannot be used directly for optimization calculations and must first be refined to eliminate the false. First, screening identifies which data contain gross errors. Gross errors fall into two categories: instrument-related errors (instrument malfunctions, failed sensing elements, etc.) and process-related errors (such as leaks). Data with gross errors must be deleted and excluded from the correction calculation; otherwise those large errors propagate into otherwise good measurements. Second, random-error correction is performed. Data with gross errors removed can be assumed to contain only random errors, which usually follow a normal distribution, so the least squares method can be used for data reconciliation, and the reconciled values can substitute for the deleted gross-error values. The processed data then satisfy the material balance and energy balance requirements.
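The least-squares reconciliation step can be sketched on a toy flow balance. The flow values, variances, and the single balance constraint are illustrative assumptions: feed F1 splits into top product F2 and bottoms F3, so the reconciled values must satisfy F1 − F2 − F3 = 0.

```python
import numpy as np

# Hypothetical data reconciliation sketch: adjust measured flows by
# weighted least squares so they satisfy the mass balance A x = 0.
# Gross errors are presumed already removed; measurement variances
# are assumed.

m = np.array([100.5, 60.2, 38.4])   # measured F1, F2, F3 (t/h), assumed
V = np.diag([1.0, 0.8, 0.8])        # assumed measurement variances
A = np.array([[1.0, -1.0, -1.0]])   # balance constraint: F1 - F2 - F3 = 0

# Closed-form weighted least squares: x = m - V A' (A V A')^-1 A m
x = m - V @ A.T @ np.linalg.solve(A @ V @ A.T, A @ m)

residual = float((A @ x)[0])        # balance now closes (near zero)
```

Measurements with larger assumed variance absorb more of the correction, which is exactly the weighting behaviour a reconciliation package provides.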

Steady-state test

Current real-time optimization techniques are all based on steady-state simulation models, so they apply only when the process operates at steady state. The system must therefore first verify steady-state operation at its entry point. Statistical analysis is performed on the measured values of key process variables of the distillation column unit; if the variation of these measurements falls below a set threshold, the process is considered statistically steady. Otherwise it is deemed unfit for real-time optimization, and the program enters a waiting loop, re-checking the steady state at fixed intervals.
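The steady-state check can be sketched as follows. The variable names, window length, and thresholds are illustrative assumptions: the unit is declared steady only when every key variable's standard deviation over a recent window stays below its threshold.

```python
from statistics import stdev

# Hypothetical steady-state test: declare the unit steady when, over a
# recent window, every key variable's standard deviation is below its
# assumed threshold. Variable names and limits are illustrative.

THRESHOLDS = {"top_temp": 0.5, "reflux": 2.0}   # assumed limits

def is_steady(windows):
    """`windows` maps variable name -> recent list of measurements."""
    return all(stdev(w) < THRESHOLDS[name] for name, w in windows.items())

steady = is_steady({"top_temp": [85.1, 85.2, 85.1, 85.0],
                    "reflux":  [120.0, 120.5, 119.8, 120.2]})
upset = is_steady({"top_temp": [85.1, 87.9, 83.0, 86.5],   # temp upset
                   "reflux":  [120.0, 120.5, 119.8, 120.2]})
```

When the test fails, the optimizer would sleep for a fixed interval and re-evaluate, matching the waiting loop described above.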

Update of steady-state model parameters

The process simulation stage yields a steady-state simulation model based on a rigorous mechanistic model. This steady-state simulation model requires several further processing steps to become a usable model for online real-time optimization.

Data calibration model. Evaluates the internal consistency of the DCS measurement data through a set of simultaneous (equation-based) heat and mass balance calculations. Its ultimate goal is to provide a set of measurements fully consistent in heat and mass balance, which is then fed into the accounting model.

Operating-condition accounting model. Using the calibrated data, laboratory analysis data, and other manually entered data as inputs, the material and energy balance of the entire unit is calculated, yielding the performance parameters of those unit equipment models that drift with operating time.

Basic operating-condition model. The variable parameters in this model have been calibrated to current values that reflect the present state of the equipment; it is used to predict equipment performance after operating conditions change.

Solution study. Based on the operating-condition model, specified product-quality targets are added as inputs, and various operating variables can be changed to study alternative solutions.

Operating-condition study. Based on the accounting model, the variables under study are changed to obtain the trends of the corresponding outputs, revealing the relationship between operating variables and product indicators, for example, the relationship between a column's reflux ratio and throughput and the composition of its product.

Optimization model. Based on the accounting model, decision variables and constraints are added, for instance taking maximization of the distillation column unit's output value as the objective function, and the unit's operating conditions are optimized online.

Optimization calculation

Given an objective function (maximizing profit, maximizing output, or minimizing cost, etc.) and external market economic data (raw material costs, product prices, and unit prices of water, electricity, and steam, etc.), select an optimization algorithm and calculate the optimal operation plan using a computer.

Advanced process control

The optimal operating point obtained from real-time optimization calculations is used as the setpoint for advanced process control, enabling the distillation column unit to reach the optimal operating point via the optimal path.

In summary, this research analyzes the core requirements of big data-driven equipment modeling and constructs an equipment model through a multi-step process comprising data feature analysis, feature engineering and extraction, model selection, training and optimization, and evaluation. On top of this equipment model it builds equipment state analysis and state time-series transition and evaluation, realizing optimization of equipment process control. The research has the following advantages:

(1) Big data-driven modeling and control optimization of distillation column equipment can be achieved without an accurate mathematical model of the target system, and can tolerate a certain amount of erroneous data. It is little affected by a few anomalies and can be continuously improved and optimized.

(2) Applying big data technology to deeply analyze and mine the massive historical data of distillation column equipment makes it possible to extract valuable information quickly and to form transferable methods for equipment modeling and control optimization.

Ultimately, in actual projects, the following technical specifications can be achieved for distillation column equipment:

Modeling metrics:

• Temperature measurements are modeled to an accuracy of 0.1 ℃;

• Pressure measurements are modeled to an accuracy of 0.1 kPa;

• Flow measurements are modeled to an accuracy of 0.2%;

• Component contents are modeled to an accuracy of 0.5%;

Control and optimization indicators:

• The fluctuation range of the target extract content is reduced by 30%;

• Impurity content is reduced by 60% compared to industry standards;

• Moisture content is less than 0.05%;

• Increase the average product yield by 0.5%.

This research optimizes the process operation of distillation columns in the chemical industry based on big data-driven modeling. It offers users benefits such as improved product quality, reduced energy consumption, and green development, and enables the chemical industry to apply big data more deeply. By modeling key equipment with big data, the operating status of critical equipment can be understood more clearly, raising the level of equipment operation, maintenance, and optimization. This will significantly enhance labor efficiency, automation, informatization, and intelligence in the chemical industry, achieving unmanned or minimally manned operation in some process stages. Equipment process optimization based on industrial cloud and industrial big data platforms not only gives equipment self-adjustment, self-optimization, and self-diagnosis capabilities that respond promptly to changes in production demand, but also allows real-time interaction with other intelligent modules over industrial networks, recombining production processes and layouts to meet the chemical industry's needs for intelligent modeling, management, and optimization across the entire process.

