Semester

Summer

Date of Graduation

2022

Document Type

Dissertation

Degree Type

PhD

College

Statler College of Engineering and Mineral Resources

Department

Chemical and Biomedical Engineering

Committee Chair

Debangsu Bhattacharyya

Committee Co-Chair

Fernando V. Lima

Committee Member

Stephen E. Zitney

Committee Member

Benjamin Omell

Committee Member

Xin Li

Abstract

Reinforcement learning (RL) is a machine learning method that has recently seen significant research activity owing to its successes in the areas of robotics and game playing (Silver et al., 2017). However, significant challenges exist in the extension of these control methods to process control problems, where state and input signals are nearly always continuous and more stringent performance guarantees are required. The goal of this work is to explore ways that modern RL algorithms can be adapted to handle process control problems; avenues for this work include using RL with existing controllers such as model predictive control (MPC) and adapting cutting-edge actor-critic RL algorithms to find policies that meet the performance requirements of process control. Systems of special interest in this work come from energy production, particularly supercritical pulverized coal (SCPC) power production. This work also details the development of advanced models and control systems to solve specific problems in this setting.

To study the SCPC system, a plantwide model with sufficient detail to develop controllers that are effective over a wide operating range is needed. Starting with a custom unit model for the steam turbine, a plantwide flowsheet is synthesized in Aspen Plus Dynamics with due consideration for the modeling of the balance of the plant (including appropriate sizing for key unit items) and the regulatory control layer. A custom, high-fidelity boiler model is also integrated into the flowsheet, allowing detailed studies to be conducted. This model is validated against operating data from the plant for which equipment sizing data were available, with parameters estimated where necessary to achieve a good fit to the data. Using this model, advanced model predictive control strategies are investigated for boiler control under load changes.

Using a high-fidelity model of the selective catalytic reduction (SCR) unit from the SCPC plant as a testbed, the focus of the work shifts to RL-based controllers. The first of these is an RL approach for online tuning of an underlying MPC acting to regulate the plant. Configured in this way, the joint controller works to regulate the plant (MPC) while improving performance (RL). An approximate state-action-reward-state-action (SARSA) algorithm is used to select prediction and control horizons for the MPC problem, and the controller is developed in a two-stage approach, wherein the RL agent is first trained offline on a reduced-order model of the SCR to alleviate poor performance when controlling the true plant. This learning is carried out in experimental episodes, where excitation of the system is in the hands of the control developer, before moving to the online system, where the controller must learn from the system as it operates in real time. Performance improvements are shown for this case with respect to step disturbance rejection and in a load-following scenario, where the plant is changing operating points and several disturbances are simultaneously active. Results are compared to the industrial standard in SCR control. Results are also presented for nonlinear (N-)MPC for SCR regulation, alongside results for a novel soft-constrained formulation that simultaneously bounds NOx emissions while minimizing ammonia injection to the unit.
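
As a minimal illustration of the horizon-selection idea, the sketch below applies tabular SARSA to choose between candidate prediction and control horizons. A toy first-order plant and a crude horizon-dependent controller stand in for the SCR model and the underlying MPC; all names and numerical values are illustrative placeholders, not the formulation used in this work.

    import numpy as np

    rng = np.random.default_rng(0)

    class ToyPlant:
        """Toy first-order plant standing in for the SCR model."""
        def __init__(self):
            self.x = 1.0
        def step(self, u):
            self.x = 0.9 * self.x + 0.1 * u + 0.01 * rng.standard_normal()
            return self.x

    def controller(x, horizon_pair):
        # Placeholder for the MPC solve: the effective gain grows with the
        # prediction horizon, mimicking more aggressive regulation.
        n_pred, _n_ctrl = horizon_pair
        return -min(0.1 * n_pred, 2.0) * x

    candidates = [(10, 2), (20, 4), (30, 6)]   # candidate (prediction, control) horizons
    n_bins = 10
    Q = np.zeros((n_bins, len(candidates)))    # action values over a discretized state
    alpha, gamma, eps = 0.1, 0.95, 0.1         # SARSA hyperparameters

    def discretize(x):
        return int(np.clip((x + 2.0) / 4.0 * n_bins, 0, n_bins - 1))

    def choose(s):
        # epsilon-greedy selection among the candidate horizons
        if rng.random() < eps:
            return int(rng.integers(len(candidates)))
        return int(np.argmax(Q[s]))

    plant = ToyPlant()
    s = discretize(plant.x)
    a = choose(s)
    for _ in range(500):
        u = controller(plant.x, candidates[a])
        x_next = plant.step(u)
        r = -(x_next ** 2) - 0.01 * (u ** 2)   # reward penalizes offset and input effort
        s_next = discretize(x_next)
        a_next = choose(s_next)
        # on-policy SARSA update of the horizon-selection policy
        Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
        s, a = s_next, a_next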

The next controller developed is also a novel combination of RL and MPC, now using the MPC directly as the control policy for the RL agent; this approach is named value function MPC (VFMPC). By approximating the stage cost of the MPC, the RL agent is able to learn weights that directly control the tradeoff between input moves and output performance based on the real-time performance of the system, automatically finding an approximation of the control policy that minimizes tracking error. Two approaches are proposed, VFMPC(0) and VFMPC(n). VFMPC(0) learns on the one-step reward seen from the system to update the agent's weights. However, an important realization is that, at each timestep, an on-policy projection is solved within the MPC subject to the current weights known by the agent; leveraging this fact, VFMPC(n) learns on the n-step return calculated across the MPC trajectory, allowing for faster learning. Results are shown for a standard process control example, and the effect of n, the search depth of the policy rollout, is studied. Surprisingly, owing to error in the value function in the early stages of learning, values of n near the prediction horizon of the controller lead to poor performance, and the fastest learning is found with intermediate values that balance this error against the knowledge gained in the model projection.
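
A minimal sketch of the n-step return at the heart of VFMPC(n) is given below: the MPC's predicted trajectory serves as an on-policy rollout whose discounted stage rewards, plus a bootstrapped tail value, form the learning target. The linear features, toy model, and fixed feedback law used here are illustrative assumptions, not the dissertation's formulation.

    import numpy as np

    gamma = 0.95

    def features(x):
        # simple quadratic features of a scalar state for the value function
        return np.array([1.0, x, x * x])

    def value(x, w):
        return float(features(x) @ w)

    def stage_reward(x, u):
        # negative of an (approximated) MPC stage cost
        return -(x ** 2) - 0.1 * (u ** 2)

    def predict_trajectory(x0, horizon):
        # Placeholder for the MPC's on-policy projection: roll a toy linear model
        # forward under a fixed feedback law and return the predicted trajectory.
        xs, us = [x0], []
        for _ in range(horizon):
            u = -0.5 * xs[-1]
            us.append(u)
            xs.append(0.9 * xs[-1] + 0.1 * u)
        return xs, us

    def n_step_target(xs, us, n, w):
        # Discounted stage rewards along the MPC prediction, bootstrapped with the
        # current value estimate at step n; this forms the VFMPC(n)-style target.
        g = sum(gamma ** i * stage_reward(xs[i], us[i]) for i in range(n))
        return g + gamma ** n * value(xs[n], w)

    w = np.zeros(3)      # value-function weights
    alpha = 0.05
    n = 3                # search depth of the policy rollout (the "n" studied above)
    x = 1.0
    for _ in range(200):
        xs, us = predict_trajectory(x, horizon=10)
        target = n_step_target(xs, us, n, w)
        # semi-gradient update of the value function toward the n-step target
        w += alpha * (target - value(x, w)) * features(x)
        x = xs[1]        # apply the first predicted input and advance one step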

The final proposed approach addresses the problem of offset-free control under an actor-critic RL structure. This approach builds on the deep deterministic policy gradient (DDPG) approach (Lillicrap et al., 2015), in which deep neural networks are used to approximate the RL value function and policy. Such approaches, however, do not guarantee that the state will be returned to zero deviation from the desired value, only that performance with respect to the reward will be maximized. To ensure that offset is removed in finite time, the state space of the system is partitioned into two regions: far from the origin, a completely RL-based action is taken to drive the state into a neighborhood of zero; near this point, a structured policy is used wherein a second RL agent learns actions that amount to feedback gains. Under the second policy and mild assumptions, the state will then return to the origin with optimal performance. Cases with linear and nonlinear dynamics are taken from the process control literature to show the efficacy of the proposed approach.
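
The switching structure described above can be sketched as follows; the class name, neighborhood threshold, placeholder actor, and gain values are illustrative assumptions rather than the exact implementation. Outside a small neighborhood of the origin the RL actor acts freely, and inside it the learned feedback gains take over to remove the remaining offset.

    import numpy as np

    class SwitchingPolicy:
        """Two-region policy: an RL actor far from the origin and a
        learned-feedback-gain policy inside a small neighborhood of the origin."""
        def __init__(self, actor, gains, radius=0.1):
            self.actor = actor          # e.g. a trained DDPG actor network
            self.K = np.asarray(gains)  # feedback gains learned by the second agent
            self.radius = radius        # size of the neighborhood (deviation variables)

        def act(self, x):
            x = np.asarray(x, dtype=float)
            if np.linalg.norm(x) > self.radius:
                return self.actor(x)    # RL action drives the state toward the origin
            return -self.K @ x          # structured feedback removes remaining offset

    # Usage with a placeholder actor; a real DDPG actor would be a neural network.
    policy = SwitchingPolicy(actor=lambda x: -0.8 * x, gains=1.5 * np.eye(2))
    print(policy.act([1.0, -0.5]))   # outside the neighborhood: actor action
    print(policy.act([0.02, 0.01]))  # inside the neighborhood: feedback-gain action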

Embargo Reason

Publication Pending

Comments

Attached is the revised version of my dissertation.
