Title
Intelligent traffic signal controllers based on cooperative multi-agent framework /
Author
Nawar, Mahmoud Ramadan.
Preparation Committee
Researcher / Mahmoud Ramadan Nawar
Supervisor / عبدالوهاب كامل السماك
Supervisor / أحمد حسن فارس
Examiner / هاله حلمي زايد
Examiner / مني فاطمة محمد مرسي
Subject
Intelligent traffic signal.
Publication Date
2020.
Number of Pages
77 p.
Language
English
Degree
Master's
Specialization
Electrical and Electronic Engineering
Date of Approval
15/2/2020
Place of Approval
Benha University - Faculty of Engineering at Shoubra - Electrical Engineering
Table of Contents
Only 14 of 91 pages are available for public view.

Abstract

Traffic congestion affects our daily activities; each Cairo resident wastes 112 hours yearly because of congestion. The consequences of congestion will become more severe in the future because of increasing migration to urban areas. Ineffective signal timing is a major cause of congestion at intersections, as controllers may run outdated plans and often fail to handle unplanned events such as accidents or unusual demand. Therefore, efforts to build adaptive signal timing control systems are rising in the relevant research communities, including computer science and transportation groups. This thesis addresses the problem of building an adaptive and cooperative signal timing system that reduces congestion in urban areas using model-free methods, responds to unexpected traffic conditions, and is cooperative; i.e., the controllers of different intersections cooperate to reduce network-wide measures. In this thesis, the traffic signal timing problem is modeled as a Markov decision process in which intersections are controlled by model-free reinforcement learning agents that coordinate their timing plans. Firstly, a multi-agent framework is proposed based on coordination graphs, where the global objective of the traffic network is decomposed into a linear sum of local edge-based functions. This edge-based decomposition scales linearly with the number of edges. Further, a novel combination of the max-plus joint action selection algorithm with two cooperative variants of the model-free Q-learning algorithm, namely sparse cooperative Q-learning and relative sparse cooperative Q-learning, is utilized to control multi-intersection networks. The proposed sparse cooperative multi-agent framework is developed based on a reinforcement learning agent that adopts a junction-based state definition and a multi-objective reward function that reduces both the number of waiting cars and the sum of waiting times. Extensive experiments are carried out, and their results demonstrate the effectiveness of the proposed framework.
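The edge-based decomposition and sparse cooperative update described above can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual implementation: all names (`edges`, `q_edge`, `alpha`, `gamma`), the two-intersection topology, and the omission of the state from the Q-tables are simplifying assumptions.

```python
# Illustrative sketch: global Q decomposed into a linear sum of
# edge-based terms, updated by a sparse cooperative TD rule.
# Names and the tiny topology are assumptions, not the thesis's code.
import itertools

# Two intersections joined by one edge; each agent picks phase 0 or 1.
edges = [(0, 1)]
actions = [0, 1]

# One Q-table per edge: Q_ij[(a_i, a_j)] -> value (states omitted for brevity).
q_edge = {e: {a: 0.0 for a in itertools.product(actions, actions)}
          for e in edges}

def global_q(joint_action):
    """Global value = linear sum of local edge-based terms."""
    return sum(q_edge[(i, j)][(joint_action[i], joint_action[j])]
               for (i, j) in edges)

def best_joint_action():
    """Exhaustive maximisation; max-plus replaces this on larger graphs."""
    return max(itertools.product(actions, repeat=2), key=global_q)

def sparse_update(joint_action, reward, next_action, alpha=0.1, gamma=0.9):
    """Sparse cooperative Q-learning: each edge absorbs a share of the TD error."""
    td = reward + gamma * global_q(next_action) - global_q(joint_action)
    for (i, j) in edges:
        q_edge[(i, j)][(joint_action[i], joint_action[j])] += alpha * td / len(edges)

# One update step after observing joint action (0, 1) with reward 1.0.
sparse_update((0, 1), reward=1.0, next_action=(0, 0))
```

On a two-agent graph the exhaustive `best_joint_action` is exact; the point of max-plus is to approximate this maximisation by message passing when enumerating joint actions becomes infeasible.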
In comparison with independent Q-learning agents, the proposed framework achieves superior performance in terms of vehicle trip time, waiting time, and jam length; it reduces these measures by up to 8%, 37%, and 43%, respectively. In addition, the reported results show that the proposed relative sparse cooperative Q-learning outperforms sparse cooperative Q-learning in avoiding vehicle teleports, which leads to better driver satisfaction. Secondly, a deep multi-agent reinforcement learning framework is developed for signal timing. A deep reinforcement learning agent is used to control the signal lights at a single intersection. The proposed agent uses a deep convolutional neural network to extract the crucial features from the environment state, which is described by raw traffic information, i.e., vehicle positions, speeds, and waiting times. Besides, the agent utilizes a multi-objective reward that reduces several traffic measures, e.g., waiting time and speed delay. Moreover, the proposed agent utilizes one of the state-of-the-art deep reinforcement learning algorithms, the Rainbow agent, which provides further room for enhancement over the conventional deep Q-network agent. Furthermore, the Rainbow agent is integrated with the max-plus algorithm to build a deep multi-agent traffic signal control framework: the Rainbow agent is trained on small subproblems, then these small models are transferred to larger problems where max-plus is used to select the joint action. Extensive experiments illustrate that the proposed deep framework outperforms the baseline under a number of settings and traffic measures, including trip time, waiting time, fuel consumption, and stability. For example, the proposed framework reduces trip time and waiting time by 52% and 83% in a single-intersection network and by 60% and 70% in a four-intersection network.
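A multi-objective reward of the kind described above can be sketched as a negative weighted sum of per-vehicle penalties. The weights, field names, and the snapshot below are illustrative assumptions; the thesis's exact reward definition is not reproduced here.

```python
# Illustrative sketch of a multi-objective reward combining waiting time
# and speed delay. Weights and field names are assumptions, not the
# thesis's actual definition.

def multi_objective_reward(vehicles, w_wait=0.5, w_delay=0.5):
    """Negative weighted sum: less waiting and less delay -> higher reward."""
    total_wait = sum(v["waiting_time"] for v in vehicles)
    # Speed delay: how far each vehicle is below its allowed speed.
    total_delay = sum(max(0.0, v["max_speed"] - v["speed"]) for v in vehicles)
    return -(w_wait * total_wait + w_delay * total_delay)

# Hypothetical snapshot of two vehicles near the intersection.
snapshot = [
    {"waiting_time": 12.0, "speed": 3.0, "max_speed": 13.9},
    {"waiting_time": 0.0, "speed": 13.9, "max_speed": 13.9},
]
reward = multi_objective_reward(snapshot)
```

Because the reward is a weighted combination, the relative weights directly shape which traffic measure the agent prioritises; as the abstract notes, this choice strongly affects the stability of deep reinforcement learning controllers.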
This thesis shows that controllers could manage traffic based on image-like data and demonstrates that reward definition is very critical for the stability of controllers that utilize deep reinforcement learning techniques. Further, it emphasizes the significance of the extensions to deep Q-network.