Title
Applied Machine Learning on Safety-Critical Cyber-Physical Systems
Author
Selim, Mahmoud Ahmed.
Committee
Researcher / Mahmoud Ahmed Selim Ahmed Mohamed
Supervisor / Hazem Mahmoud Abbas
Supervisor / Mohamed Watheq Ali Kamel El-Kharashi
Examiner / Hazem Mahmoud Abbas
Publication Date
2023.
Number of Pages
135 p.
Language
English
Degree
Master's
Specialization
Systems and Control Engineering
Approval Date
1/1/2023
Place of Approval
Ain Shams University - Faculty of Engineering - Computer and Systems Engineering
Contents
Only 14 pages (out of 135) are available for public view.

Abstract

Reinforcement learning (RL) algorithms can achieve state-of-the-art performance in decision-making and continuous control tasks. However, applying RL algorithms to safety-critical systems is not yet well justified because of the exploratory nature of many RL algorithms. To address this challenge, and to justify the widespread deployment of RL algorithms, robots must respect safety constraints without sacrificing performance. In this work, we propose three data-driven methods for enforcing safety constraints on RL algorithms. The first method is a Black-box Reachability-based Safety Layer (BRSL) with three main components: (1) data-driven reachability analysis for a black-box robot model, (2) a trajectory rollout planner that predicts future actions and observations using an ensemble of neural networks trained online, and (3) a differentiable polytope collision check between the reachable set and obstacles that enables correcting unsafe actions. The second and third methods both use data-driven predictive control to build a safety layer that acts as a filter for unsafe actions. The second method enforces safety for unknown linear systems: it solves an online optimization problem that enforces the safety constraints while remaining as minimally invasive as possible to the RL agent's chosen action.
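To illustrate this "minimally invasive" filtering idea, the sketch below shows one generic way such a safety layer can be posed; it is an assumption made for exposition, not the thesis implementation. The one-step predictor (A, B), the polytopic safe-set description (H, h), the function name safety_filter, and the infeasibility fallback are all illustrative.

    # Illustrative sketch only (not the thesis code): a minimally invasive
    # safety filter. It returns the action closest to the RL agent's proposal
    # whose predicted next state stays in the polytopic safe set {x : H x <= h}.
    # The one-step predictor (A, B) is assumed to be identified from data.
    import cvxpy as cp

    def safety_filter(u_rl, x, A, B, H, h):
        u = cp.Variable(u_rl.shape[0])
        x_next = A @ x + B @ u                        # data-driven one-step prediction
        cost = cp.Minimize(cp.sum_squares(u - u_rl))  # deviate from the RL action as little as possible
        prob = cp.Problem(cost, [H @ x_next <= h])    # predicted state must remain safe
        prob.solve()
        # Fallback choice (illustrative only): pass the raw action through if infeasible.
        return u.value if prob.status == cp.OPTIMAL else u_rl

The thesis methods additionally construct the predictor and reachable sets from data and handle nonlinear time-varying dynamics; the sketch only conveys the structure of filtering an action while changing it as little as possible.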
The third method adopts the technique of the second to enforce safety for unknown nonlinear time-varying systems, again while remaining as minimally invasive as possible to the RL agent's chosen action. Moreover, we mathematically reformulate the data-driven reachability analysis to allow for large data matrices: we eliminate the computationally costly pseudo-inverse of the data matrix and replace it with more efficient matrix-vector multiplications. This allows the third method to work with large-scale data matrices for more complex systems and to change the data matrix online, so safety constraints for time-varying systems can be incorporated as well. We show that the third method indeed enforces safety constraints for unknown nonlinear time-varying systems while utilizing GPUs for real-time operation. We evaluate the three proposed methods in three different high-fidelity simulators, on two types of tasks: (1) robot navigation and (2) robot control. In both types of environments, we define the safe region and the safety constraints that must not be violated for safe operation of the system, and we compare the proposed methods against state-of-the-art safe reinforcement learning algorithms. We show that the proposed methods outperform state-of-the-art safe RL methods and a vanilla RL agent in terms of reward and safety violations for navigation on a TurtleBot 3 in Gazebo and a quadrotor in Unreal Engine 4 (UE4), for the control of a cheetah running as fast as possible in MuJoCo, and for a path-following hexarotor under wind as an external disturbance, with the unsafe set adjacent to the area of highest reward.
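The reformulation described above, dropping the data-matrix pseudo-inverse in favor of matrix-vector multiplications, can be conveyed by a generic sketch. The snippet below is an assumption about one standard way to achieve this effect, not the derivation used in the thesis: when a quantity of the form pinv(D) @ v is needed and the tall data matrix D has full column rank, it equals the solution of the normal equations, which an iterative solver can compute using only products with D and its transpose; such products scale to large matrices and map well onto GPUs.

    # Illustrative sketch only: compute pinv(D) @ v without materializing pinv(D).
    # Assumes D is tall with full column rank, so pinv(D) @ v = (D^T D)^{-1} D^T v.
    # Conjugate gradients on the normal equations needs only matrix-vector
    # products with D and D.T, avoiding the costly explicit pseudo-inverse.
    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def apply_pinv(D, v):
        n = D.shape[1]
        normal_op = LinearOperator((n, n), matvec=lambda z: D.T @ (D @ z))
        x, info = cg(normal_op, D.T @ v, atol=1e-10)
        return x  # approximately np.linalg.pinv(D) @ v

    # Example call with illustrative sizes: a 100000 x 50 data matrix is far
    # cheaper to handle this way than by forming its pseudo-inverse explicitly.
    # D = np.random.randn(100000, 50); v = np.random.randn(100000)
    # x = apply_pinv(D, v)

This is only a sketch of the general idea; the thesis performs the corresponding elimination analytically inside the data-driven reachability computation, which is what allows the data matrix to grow and to be updated online.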