There are two types of DCS crashes: HMI (human-machine interface) crashes and controller crashes. The former cause less damage than the latter. Controller crashes are caused by too many I/O cards or insufficient memory capacity: once the controller program reaches a certain point, it stops there and cannot proceed, and even a restart does not help.

HMI crashes (mainly at operator stations) occur to varying degrees in all DCS systems; only the frequency differs, and it is also related to how long the DCS has been in service. Crashes with a hardware cause leave the display unrecoverable; for example, low power supply voltage at the operator station can easily cause a crash. Crashes caused by unsuitable software or communication link speeds can recover after a period of time. Poor compatibility between the operator station's operating system and the monitoring software can also cause crashes that maintenance personnel cannot fix; this is called a primary crash. Crashes caused by unreasonable configurations that lead to network congestion or insufficient memory can recover on their own after a few minutes.

I. DCS HMI Crashes

Recently, because the MIS system needs to read real-time production data from the DCS... When connecting a dynamic data server to the network, some sites use the Dynamic Data Exchange (DDE) method. DDE consumes a lot of resources, so when large amounts of data are read (e.g., 3000 points), network congestion becomes frequent and the HMI nodes crash severely. The severity of the congestion depends on several factors. When data is retrieved from the operator station through DDE, the problem is minor if the data volume is below 500 points, but with several thousand points operator station crashes are almost inevitable. A better approach is to connect the data server as a node on the DCS communication network and retrieve the data from the network interface.
The controller sends data to the interface, and the HMI must follow the network communication protocol when reading data from the network interface. There are two physical network structures, ring and bus; bus networks are logically rings. Star networks are used only for small systems (up to 100 I/O points). The commonly used communication protocol is broadcast: each node continuously broadcasts its data onto the network, and the nodes that need the data receive it. Broadcast-protocol networks also support a second approach, in which a node queries other nodes for data. If the other nodes do not have the data, the node keeps querying until it retrieves it, so if the data does not exist anywhere on the network, the repeated queries cause network congestion.

To familiarize operators with the DCS workstation, a simulation system can be used to learn the DCS keyboard and reduce crashes caused by misoperation; it also helps operators recognize workstation crash scenarios.

When a DCS has been running for a long time, configurations tend to be added but never removed, so some configured points are no longer connected to real I/O and are irrelevant to control. Such points remain in the user application on the engineering station. When a dynamic data server connects and tries to read all data points from the DCS, the large number of invalid points can cause network congestion and crash the HMI. In this case, the configuration in the controller can be read back to the engineering station and compared with the configuration held on the engineering station for download, and the invalid points can be deleted to avoid network congestion. Also note that when connecting a dynamic data server, the software versions of all interfaces should be checked for consistency; otherwise, data transmission will be affected.
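The comparison of the engineering-station configuration with the configuration read back from the controller can be sketched as a simple set difference. This is only an illustration: the tag names and the function `find_invalid_points` are hypothetical, and a real DCS would export its point lists through vendor-specific tools.

```python
def find_invalid_points(engineering_tags, controller_tags):
    """Return tags that exist in the engineering-station configuration
    but are absent from the configuration read back from the controller."""
    return sorted(set(engineering_tags) - set(controller_tags))


# Hypothetical point lists, for illustration only.
engineering_tags = ["FIC101.PV", "TI205.PV", "LI310.PV", "PI999.PV"]
controller_tags = ["FIC101.PV", "TI205.PV", "LI310.PV"]

# Candidates for deletion before a data server is allowed to read all points.
stale_points = find_invalid_points(engineering_tags, controller_tags)
print(stale_points)
```

Each candidate should still be reviewed by hand before deletion, since a point may be missing from the controller for reasons other than being obsolete.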
The third way to relieve network congestion, if exception reporting is used to reduce network traffic, is to widen the exception-reporting band. Exception reporting means that a field point sends data to the network only when its value changes. So that a failed point does not go unnoticed by the HMI, a report is also sent after a set time even if the point has not changed. Appropriately increasing both exception-reporting parameters (the change band and the maximum report interval) reduces the data volume on the network.

Recently launched general-purpose operator stations mostly run Windows NT/2000, and their monitoring software is also general-purpose, such as FIX and InTouch. Because this software sells in large volumes, it has fewer problems, and the general-purpose operator station is very open, which greatly reduces crashes. Poor driver software can still cause crashes, however. Importantly, maintenance costs and spare parts procurement are not tied to the DCS manufacturer. A firewall is essential to keep out hackers and viruses.

II. Crashes in C/S-Structure Human-Machine Interfaces

1. C/S-Structure DCS Human-Machine Interfaces

After ordinary PCs and Windows operating systems were adopted, C/S (client/server) structures were used to increase the number of human-machine interfaces. As long as the controller is connected to the server through an interface, and the server and clients are cabled together through network interface adapters, the result is a C/S structure. Clients share the server's resources. The server is usually installed in a secure location and stores the most valuable field production data sent by the DCS controller. The computer acting as the server may serve several or even a dozen clients at once, so it needs a faster processor, more memory, and more storage space than the clients.
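Returning to the exception-reporting remedy for network congestion described above, the effect of its two parameters can be illustrated with a minimal sketch. The class `ExceptionReporter`, the parameter names `deadband` and `max_interval_s`, and the sample signal are assumptions made for illustration; they do not correspond to any particular DCS vendor's implementation.

```python
class ExceptionReporter:
    """Decide whether a point should send a report to the network."""

    def __init__(self, deadband, max_interval_s):
        self.deadband = deadband              # minimum change that triggers a report
        self.max_interval_s = max_interval_s  # report at least this often
        self.last_value = None
        self.last_time = None

    def should_report(self, value, now_s):
        if self.last_value is None:
            send = True                       # first sample is always reported
        elif abs(value - self.last_value) > self.deadband:
            send = True                       # value left the deadband
        elif now_s - self.last_time >= self.max_interval_s:
            send = True                       # periodic report proves the point is alive
        else:
            send = False
        if send:
            self.last_value = value
            self.last_time = now_s
        return send


def count_reports(samples, deadband, max_interval_s):
    """Count how many (time, value) samples would be sent to the network."""
    reporter = ExceptionReporter(deadband, max_interval_s)
    return sum(reporter.should_report(value, t) for t, value in samples)


# Hypothetical signal: one sample per second, cycling through 500, 503, 506, 509.
samples = [(t, 500 + 3 * (t % 4)) for t in range(60)]

print(count_reports(samples, deadband=2, max_interval_s=30))
print(count_reports(samples, deadband=10, max_interval_s=30))
print(count_reports(samples, deadband=10, max_interval_s=60))
```

With the narrow deadband every sample is sent; widening the deadband and lengthening the maximum interval each cut the number of reports, which is the traffic reduction the text describes. The trade-off is that the HMI sees coarser and less frequent updates.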
Clients are standard PCs running Windows. They communicate only with the server, not with other clients, and have their own software packages. When used as DCS workstations, the server and clients can run the same monitoring software to share server resources. The difference in installed software is that the server has the DCS controller driver software, while the clients do not. Another arrangement is to install an OPC server with the monitoring software on the server and an OPC client with the monitoring software on each client, so that the clients can access the server. If field equipment operation does not need to be viewed graphically on the server, monitoring software need not be installed on the server, and only an OPC server is needed; a monitor is then required only for troubleshooting.

A C/S architecture saves on the expensive dedicated network interfaces of a DCS system. For reliable operation, dual-server redundancy (dual machine, dual network) is used, so a system needs at most two dedicated interfaces. End users have nevertheless reported that a multi-master station structure with multiple interfaces works better, because a poor arrangement of servers and clients can easily lead to crashes. For example, older versions of INFI90's Conductor NT and Honeywell's GUS are prone to crashing.

2. Causes of C/S Architecture Crashes

The two hardware components of a C/S network are usually called client PCs and servers. Client PCs are located in the central control room, while servers are placed in a secure location. The causes of DCS operator station crashes are complex. This article focuses on crashes caused by the C/S architecture, in two scenarios.

The first is the connection between the controller, server, and clients. The DCS controller has an Ethernet interface, and the three are connected through an Ethernet switch.
The switch has multiple ports, and each port can run at a different data rate. The number of ports is determined by the number of computers connected. The main specifications of a switch are its backplane bandwidth and memory size. Ethernet variants such as 10BASE and 100BASE transmit at different speeds, and the cable must match: Category 3 cable supports only 10BASE-T, whereas 100BASE-TX requires Category 5. In Figure 2, the Ethernet connection is a star topology. A separate cable connects each computer to a central connection point, often called a network hub. Because each computer has its own cable, a connection failure affects only that computer, while the others continue to operate. If all machine adapters have the same speed, a star Ethernet connection typically uses 10BaseT cables.

The controller sends information to the server, and clients read and write data on the server. Because there are many clients, putting them all on the same cabling can lead to system crashes. The data each client reads from the server can be the same or different; alternatively, all clients can display the same content, so that multiple operator stations operate redundantly. If each operator station exchanges a lot of data with the server, a 100BASE port can be used; if the server and controller exchange less data, a 10BASE port should be used; and if clients exchange very little data, a 10BASE port can also be used. If port speeds are not matched to traffic in this way, client machine crashes are highly likely.

The second scenario is inappropriate client PC and server configuration. The client PC configuration depends on the operating system to be used. For example, DOS and Windows 3.1 require only 8MB of RAM, Windows 9x requires at least 16-32MB, and Windows NT requires at least 32MB and ideally 64MB; Windows 2000 requires 64MB. Besides RAM, Windows 2000 also needs a faster processor and a larger hard drive.

This article is from http://www.jdzyjs.com.
When selecting client PCs, although they can be slightly less powerful than the server, their RAM must exceed the requirements above, because in addition to running the operating system they also read and write data on the server. The server's CPU and hard drive handle all service requests from clients on the network. Servers require a large amount of memory, much more than the clients. It is best to know the number of memory slots on the motherboard and the maximum RAM supported, and to check how the memory supplied with the DCS server is configured: a machine with a single 64MB DIMM is easier to upgrade than one with four 16MB modules filling all four slots. Servers are best equipped with ECC memory. Used with a motherboard chipset that supports ECC, it can correct single-bit memory errors and detect multi-bit errors.

Hard drives are also crucial for servers, because networked computers share the server's files. The drives should be durable, reliable, and suited to serving multiple users at once, so SCSI hard drives are more suitable. SCSI drives are intelligent and spin faster, and with UltraWideSCSI they offer very high data transfer rates from drive to system. Both IDE and SCSI hard drives can use RAID technology for more secure data storage and improved server quality. A SCSI bus can connect multiple hard drives; one 9GB hard drive is less efficient than nine 1GB SCSI hard drives.

Disk arrays are mass storage products designed specifically for servers. An array can hold a large number of drives and provides fault tolerance through disk mirroring or RAID, automatically storing redundant copies of server data across different hard drives. When a drive fails, all data remains available to the user. Some arrays are even hot-swappable, so a faulty drive can be replaced while the machine is running.
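One way an array keeps data available after a single drive failure is XOR parity, the idea used by parity-based RAID levels such as RAID 5. The sketch below is a simplified illustration only: it keeps parity on one dedicated "drive" rather than rotating it across drives as RAID 5 does, and the block contents and the helper `xor_blocks` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "drives", each holding one equal-length block (illustrative data).
data_blocks = [b"FLOW", b"TEMP", b"PRES"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data_blocks)

# Simulate losing the second drive: XOR of the survivors and the parity
# reproduces the missing block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

Because XOR is its own inverse, any one missing block can be recovered this way, but two simultaneous failures cannot, which is why hot-swapping the failed drive promptly matters.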
The best RAID level for servers is RAID 5, which is supported by all current versions of client/server network operating systems. RAID 5 stripes data and parity across multiple SCSI drives, so the contents of a single failed drive can be reconstructed from the information on the other drives in the array. Combined with a server that has hot-swappable drives and power supplies, RAID 5 enables near-continuous 24/7 operation. If the DCS does not use a client/server architecture, so that each workstation operates independently and stores only part of the data, and redundancy is already high enough for reliability, RAID technology may not be necessary.

For DCS workstations used for file backup, a SCSI tape drive can be installed on the workstation. For client/server architectures, it is best to install the tape drive on the server to copy the user-configured application software, so that the application can be reloaded after a workstation or server failure. Tape drive products also change quickly: tape drives and tapes for DCS systems imported around 1990 are no longer readily available as spares, and old and new tape drives and tapes are incompatible.

The S9000 system controller consists of two parts: a 3C905 card for analog control and an LPM620-0072 PLC host with an Ethernet port. The system is configured with Ethernet connections, forming a client/server (C/S) architecture. There are two methods of Ethernet connection. A hub connection is used when there are many client machines. If there are only 1-2 operator stations and 1-2 controllers, so that few devices are on the network, all of them can be connected to a common cable, and T-connectors can be used.

Since the server contains more drives than the client machines, its power supply capacity and reliability must be considered. Power supply quality is easily overlooked: the stability of the DC output, noise, stray signals, spikes, and surges.
The power supply is one of the components most prone to failure. Electronic circuitry uses +3.3V or +5V, while hard drives and fans use +12V.

Network adapters other than Ethernet are available, but none is as widely used as Ethernet, hence their higher price. Using Ethernet can reduce DCS costs.

III. A System Example Prone to Crashes

Consider a system whose operator stations use a client/server (C/S) architecture, as shown in the diagram: for example, 8 operator stations, 2 servers, and 4 S9000 controllers, with 2000 I/O points to be displayed on the operator stations. The controllers and servers communicate over Ethernet. The controllers, servers, and operator stations are physically connected through a 16-port switching hub whose default speed is 100BASE. If the adapters in the servers and operator stations are 10BASE, and both server and client memory are 64MB, the system crashes severely. Replacing one of the server's adapters with a 100BASE adapter, keeping the client adapters at 10BASE, and increasing the server memory to 256MB and the client memory to 128MB improves the situation significantly.

In practice, crashes of the server and operator stations are closely related to the server's memory capacity. 64MB is the minimum, and the required capacity depends on the number of operator stations: operating experience indicates that each additional operator station requires at least 10-30MB more server memory. The server has two network adapters, one for communication with the controllers (10BASE) and one for communication with the operator stations (100BASE). The operator stations can use 10BASE adapters. Standard cables are used for cabling. When the server and operator station software is FIX (the original S9000 operator stations used this software under Windows 3.2), system crashes are not severe.
Currently, the S9000 system monitoring software is proprietary and may have some bugs. Combined with inadequate network and memory configurations, this makes system crashes very serious. One server should be configured as the master and the other as the slave; otherwise, if one server fails, the other will not function properly. Correct configuration is even more critical if there are three servers.
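The server memory rule of thumb from the example (a 64 minimum plus 10-30 per operator station) can be turned into a rough sizing sketch. The unit is assumed here to be MB, since the operating systems discussed need tens of megabytes, and it is also an assumption that the minimum covers the server alone with the per-station allowance added on top. The function name `server_memory_range_mb` is hypothetical.

```python
def server_memory_range_mb(operator_stations, base_mb=64, per_station_mb=(10, 30)):
    """Return a (low, high) estimate of server memory in MB.

    base_mb and per_station_mb follow the article's figures; treating them
    as MB and as additive is an assumption, not a vendor specification.
    """
    low_step, high_step = per_station_mb
    low = base_mb + low_step * operator_stations
    high = base_mb + high_step * operator_stations
    return low, high


# The example system has 8 operator stations.
print(server_memory_range_mb(8))
```

For 8 operator stations the estimate brackets the 256 figure that the example reports as a significant improvement, which suggests the rule of thumb and the example are at least mutually consistent.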