Author ORCID Identifier

https://orcid.org/0009-0009-7464-2640

Semester

Spring

Date of Graduation

2025

Document Type

Dissertation

Degree Type

PhD

College

Statler College of Engineering and Mineral Resources

Department

Chemical and Biomedical Engineering

Committee Chair

Debangsu Bhattacharyya

Committee Co-Chair

Yuhe Tian

Committee Member

Xin Li

Committee Member

Srinivas Palanki

Committee Member

Fernando V. Lima

Abstract

Within the last decade, a distinct gap has developed in the field of process control between the technological advancements of today and the well-established theory that precedes them. This is most evident where novel developments and structures outpace the guarantees and strong foundations required for widespread adoption. The goal of this work, therefore, is to integrate these novel advanced control methods into the field of process control in a way that increases their applicability and encourages implementation.

The first portion of this work explores the nature of switched systems, which often impose unconventional design criteria for which standard approaches are limited. Most existing work requires either multi-objective optimization or the introduction of discrete variables, and either method demands manipulation and tuning of the problem. There is a defined gap in the literature for a method of continuous transition between operational regions with disparate control objectives. Therefore, the first portion of this work examines estimation-based model predictive control (E-MPC) to create an objective prioritization algorithm in which distinct objectives may be defined for mutually exclusive operational regions. The algorithm is built from logical conditions that define regions of operation and are incorporated into the objective function, allowing a smooth transition between a single objective or a bank of objectives. The control objective prioritization is cast in the framework of a model predictive controller coupled with an extended Kalman filter for estimation of critical yet unmeasured state and performance variables. This is applied to the superheater-reheater system of a natural gas combined cycle (NGCC) power plant.
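
As a concrete illustration of this kind of construction, the minimal sketch below blends two region-specific tracking objectives into a single smooth stage cost using a sigmoid weighting; the weighting form, variable names, and threshold are illustrative assumptions and not taken from the dissertation.

    import numpy as np

    def region_weight(x, threshold, steepness=50.0):
        """Smooth logical condition: ~0 below the threshold, ~1 above it."""
        return 1.0 / (1.0 + np.exp(-steepness * (x - threshold)))

    def prioritized_stage_cost(y, y_sp_low, y_sp_high, threshold):
        """Blend two mutually exclusive tracking objectives into one stage cost.

        w ~ 0 keeps the low-region objective active, w ~ 1 hands over to the
        high-region objective, and intermediate w gives a smooth transition.
        """
        w = region_weight(y, threshold)
        cost_low = (y - y_sp_low) ** 2    # tracking objective for the low operating region
        cost_high = (y - y_sp_high) ** 2  # tracking objective for the high operating region
        return (1.0 - w) * cost_low + w * cost_high

Summing such a stage cost over the prediction horizon yields an MPC objective whose active term changes continuously as the measured (or EKF-estimated) variable crosses the region boundary.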

For the remainder of this work, the focus is on the gap separating the continued advancements of reinforcement learning (RL) and the field of process control. Specifically, the aim is to streamline the implementation of RL in control applications for complex dynamic systems in such a way that performance and safety criteria are met or exceeded. Within the last decade, advanced computational techniques have allowed the use of RL in conjunction with continuous systems, but these developments have failed to accommodate the often stringent learning and performance requirements necessary for control of a plant environment. This work seeks to bridge that gap, establishing RL as a viable control method while also ensuring the safety and performance expected of conventional process control.

One approach considered in this work is to use direct RL in parallel with a conventional mode of process control that may already be in place. In this way, the RL agent may learn online while being supported by the existing controller, gradually assuming control of the system. In the event of degraded performance, the guarantees of safe operation provided by the existing control (under the assumption that the existing control does guarantee safety) remain in place. This approach has been little explored in the open literature, and that gap is what this work seeks to fill. The parallel implementation of RL alongside more conventional process control (CPC) allows the RL algorithm to learn from the CPC. The past performance of both methods is assessed on a continuous basis, allowing a transition from CPC to RL and, if needed, a transition back from RL to CPC. This allows the RL algorithm to slowly and safely assume control of the process without significant degradation in control performance. It is shown that the RL can derive a near-optimal policy even when coupled with a suboptimal CPC. It is also demonstrated that the coupled RL-CPC algorithm learns faster than traditional RL exploration methods while its performance does not deteriorate below that of the CPC, even when exposed to an unknown operating condition. This is applied to a benchmark nonlinear continuously stirred tank reactor (CSTR) as well as a flowsheet of a solid oxide fuel cell (SOFC).
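
A minimal sketch of one possible performance-based supervisory rule is shown below, assuming a moving-average comparison of recent returns; the window size, margin, and function names are illustrative and do not reproduce the dissertation's exact switching logic.

    import numpy as np

    def select_controller(rl_returns, cpc_returns, window=20, margin=0.0):
        """Pick the controller for the next interval from recent average returns.

        rl_returns / cpc_returns hold the returns observed while each controller
        was acting. RL takes (or keeps) control only once its recent performance
        matches or exceeds that of CPC; otherwise CPC stays in charge so that
        its safety and performance guarantees remain in force.
        """
        if len(rl_returns) < window:
            return "CPC"  # not enough evidence yet: keep learning from CPC
        if np.mean(rl_returns[-window:]) >= np.mean(cpc_returns[-window:]) - margin:
            return "RL"
        return "CPC"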

A differing approach is then presented that investigates the integration of RL with existing model predictive control (MPC) in order to provide a constrained policy for the RL while also creating an adaptable objective for the MPC. RL and MPC possess an inherent synergy in the manner in which they function, and the selection of MPC for combination with RL is not arbitrary. Two specific aspects of MPC are advantageous for such a combination: the use of a value function and the use of a model. The model in MPC is particularly useful since, by solving for the optimal trajectory, a projected view of the expected reward is gained. While this information can be inaccurate given the current value function, it can accelerate learning. By combining this with a correction for state transitions, an MPC formulation is derived that obeys the constraints set forth but can adapt to changing dynamics and correct for plant-model mismatch without a required discrete update, an advantage over standard MPC formulations. We propose two algorithms for the value-function model predictive controller (VFMPC): one denoted VFMPC(0), in which the one-step return is used to learn the cost function, and the other denoted VFMPC(n), in which the optimal trajectory is used to learn the n-step return subject to the dynamics of the process model. An artificial neural network (ANN) model is introduced into VFMPC(n) to improve controller performance under slowly changing dynamics and plant-model mismatch. The developed algorithms are applied to two applications: a double integrator and a selective catalytic reduction (SCR) unit.
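
For the one-step-return idea behind VFMPC(0), the sketch below shows a generic TD(0)-style semi-gradient update of a parametric value function that could serve as the MPC terminal cost; the parameterization, learning rate, and names are assumptions rather than the dissertation's formulation.

    def vfmpc0_update(V, grad_V, theta, s, s_next, stage_cost, gamma=0.99, lr=1e-3):
        """One-step-return (TD(0)-style) semi-gradient update of V(s; theta).

        The target is the realized stage cost plus the discounted value of the
        next state; theta is moved to reduce the squared difference between
        V(s; theta) and that target. In a VFMPC-like scheme, the learned V
        would appear as the terminal cost of the MPC optimization.
        """
        target = stage_cost + gamma * V(s_next, theta)
        td_error = V(s, theta) - target
        return theta - lr * td_error * grad_V(s, theta)  # gradient step on 0.5 * td_error**2

An n-step variant would replace the single-step target with the cost accumulated along the MPC's predicted trajectory under the process (or ANN) model.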

Finally, the issue of sample inefficiency in the training of RL is approached by leveraging unsupervised machine learning (ML). This problem is two-fold in that both the quality of data and its selection for training are often difficult to determine quantitatively. The goal of this work is to leverage unsupervised learning tools, such as Gaussian mixture models, to contribute to both the exploration policy and the training/learning done by the algorithm. Because data accumulate over the course of the training period, such ML algorithms are used to evaluate the data, both to form a prediction for the current state of the system and to sort data by quality for learning. Such a method allows the screening of potentially risky actions, without the need for an internal prediction model within the RL algorithm, while also accelerating learning.
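
The sketch below illustrates one way a Gaussian mixture model could support both action screening and data selection, here using scikit-learn; the component count, likelihood threshold, and data shapes are illustrative assumptions rather than the dissertation's settings.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Fit a GMM to previously visited state-action data and use its
    # log-likelihood to (a) flag unlikely candidate actions as potentially
    # risky and (b) rank stored samples for training.
    history = np.random.randn(500, 3)  # stand-in for logged (state, action) data
    gmm = GaussianMixture(n_components=2, random_state=0).fit(history)

    def screen_action(candidate, threshold=-10.0):
        """Reject a candidate state-action pair the GMM considers very unlikely."""
        return gmm.score_samples(candidate.reshape(1, -1))[0] >= threshold

    def rank_for_training(batch):
        """Order a batch of samples by GMM log-likelihood, highest first."""
        return batch[np.argsort(gmm.score_samples(batch))[::-1]]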
