Author ORCID Identifier

https://orcid.org/0000-0002-8805-8252

Semester

Spring

Date of Graduation

2024

Document Type

Thesis

Degree Type

College

Statler College of Engineering and Mineral Resources

Department

Mechanical and Aerospace Engineering

Committee Chair

Ali Baheri

Committee Member

Piyush Mehta

Committee Member

Yu Gu

Abstract

Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades. Yet deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety. Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process. This reliance on predefined constraints is limiting in dynamic and unpredictable real-world settings, where such constraints may not be available or sufficiently adaptable. To bridge this gap, we propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment. Starting from a parametric signal temporal logic (pSTL) safety specification and a small labeled dataset, we frame the problem as a bilevel optimization task. This task integrates constrained policy optimization, using a Lagrangian variant of the twin delayed deep deterministic policy gradient (TD3) algorithm, with Bayesian optimization (BO) of the parameters of the given pSTL safety specification. Through experiments in comprehensive case studies, we validate the efficacy of this approach across varying forms of environmental constraints, consistently yielding safe RL policies with high returns. Furthermore, our findings indicate successful learning of the STL safety constraint parameters, which conform closely to the true environmental safety constraints. The performance of our model closely mirrors that of an ideal scenario with complete prior knowledge of the safety constraints, demonstrating its proficiency in accurately identifying environmental safety constraints and learning safe policies that adhere to them.
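
The two levels described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the specification "always (x < theta)", the toy trajectories and labels, the function names, and the use of a plain grid search in place of Bayesian optimization are all assumptions made for illustration. The TD3 policy update itself is omitted; only a Lagrangian-style penalty on constraint violation is shown.

```python
from typing import Sequence


def robustness_always_below(signal: Sequence[float], theta: float) -> float:
    """Robustness of the pSTL formula G (x_t < theta): min over t of (theta - x_t).

    Positive means the trajectory satisfies the formula for this theta.
    """
    return min(theta - x for x in signal)


def label_agreement(theta: float,
                    trajectories: Sequence[Sequence[float]],
                    labels: Sequence[bool]) -> float:
    """Fraction of labeled trajectories whose robustness sign matches the safe/unsafe label."""
    hits = sum(
        (robustness_always_below(traj, theta) > 0) == is_safe
        for traj, is_safe in zip(trajectories, labels)
    )
    return hits / len(labels)


def penalized_reward(reward: float, robustness: float, lam: float) -> float:
    """Lagrangian-style objective term: reward minus lam times the amount of violation."""
    return reward - lam * max(0.0, -robustness)


# Hypothetical labeled dataset: True = safe, False = unsafe.
trajectories = [
    [0.1, 0.4, 0.7],
    [0.2, 0.9, 1.4],
    [0.3, 0.5, 0.8],
    [0.6, 1.2, 1.1],
]
labels = [True, False, True, False]

# Parameter search: a grid search stands in for Bayesian optimization (assumption).
candidates = [0.5, 0.75, 1.0, 1.25, 1.5]
best_theta = max(candidates, key=lambda th: label_agreement(th, trajectories, labels))

print("best theta:", best_theta)
print("agreement:", label_agreement(best_theta, trajectories, labels))
```

In a full bilevel loop, the policy would be retrained against the penalty for each candidate parameter value and new rollouts would be labeled and added to the dataset. How the thesis schedules those steps is not stated in this record.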

Recommended Citation

Yifru, Lunet Abiye, "JOINT LEARNING OF UNKNOWN SAFETY CONSTRAINTS AND CONTROL POLICIES IN REINFORCEMENT LEARNING" (2024). Graduate Theses, Dissertations, and Problem Reports. 12401.
https://researchrepository.wvu.edu/etd/12401

Included in

Computational Engineering Commons, Other Computer Engineering Commons, Robotics Commons

DOI

http://doi.org/10.33915/etd.12401

Graduate Theses, Dissertations, and Problem Reports

JOINT LEARNING OF UNKNOWN SAFETY CONSTRAINTS AND CONTROL POLICIES IN REINFORCEMENT LEARNING
