Author: Mohamed, Abdallah Reda Abdallah. / Title: Improving Image Segmentation Using Deep Learning-based Approaches

Title

Improving Image Segmentation Using Deep Learning-based Approaches

Author

Mohamed, Abdallah Reda Abdallah.

Preparation Committee

Researcher / Abdallah Reda Abdallah Mohamed

Supervisor / محمود إبراهيم خليل

Examiner / محمد حامد صدقي

Examiner / هدي قرشي محمد

Publication Date

2024

Number of Pages

107 p.

Language

English

Degree

Master's

Specialization

Systems and Control Engineering

Date of Approval

1/1/2024

Place of Approval

Ain Shams University - Faculty of Engineering - Department of Computer and Systems Engineering

Abstract

This thesis presents an in-depth exploration of panoptic segmentation, a central task in computer vision that combines instance and semantic segmentation to provide a unified, pixel-wise classification of an image. Panoptic segmentation plays a crucial role in numerous applications, from autonomous driving to medical imaging, by enabling a comprehensive understanding of complex scenes. Despite this potential, achieving real-time performance while maintaining high accuracy remains a significant challenge. This research addresses that challenge by optimizing a pioneering panoptic segmentation model, the “You Only Segment Once” (YOSO) architecture.
A novel adaptation, referred to as the “Real-Time YOSO” (RT-YOSO) model, introduces substantial modifications to the YOSO architecture. The RT-YOSO model aims to enhance real-time performance without sacrificing the accuracy of panoptic segmentation. To achieve this, the proposed approach replaces the original Residual Networks (ResNet) backbone with the more computationally efficient Short-Term Dense Concatenation (STDC) networks. Furthermore, it incorporates an instance-aware cropping mechanism, which significantly contributes to the model’s real-time performance. These modifications aim to strike a balance between inference time and panoptic quality (PQ), the metric used to assess panoptic segmentation accuracy.
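To make the panoptic quality (PQ) metric mentioned above concrete, the following is a minimal sketch rather than the thesis's evaluation code. It assumes the commonly used definition, in which a predicted and a ground-truth segment of the same class are matched when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by TP + 0.5·FP + 0.5·FN. The function name `panoptic_quality` and the segment representation (dictionaries of pixel-index sets) are hypothetical, chosen only for illustration. Standard benchmarks compute PQ per class and then average; this sketch pools all classes for brevity.

```python
def panoptic_quality(pred_segments, gt_segments):
    """Pooled PQ over all classes (illustrative simplification).

    Each argument maps segment_id -> (class_id, frozenset of pixel indices).
    """
    matched_pred, matched_gt = set(), set()
    iou_sum = 0.0
    for g_id, (g_cls, g_px) in gt_segments.items():
        for p_id, (p_cls, p_px) in pred_segments.items():
            # Only unmatched segments of the same class can be paired.
            if p_cls != g_cls or p_id in matched_pred:
                continue
            union = len(g_px | p_px)
            iou = len(g_px & p_px) / union if union else 0.0
            # With non-overlapping segments, IoU > 0.5 makes the match unique.
            if iou > 0.5:
                iou_sum += iou
                matched_pred.add(p_id)
                matched_gt.add(g_id)
                break
    tp = len(matched_gt)                        # matched pairs
    fp = len(pred_segments) - len(matched_pred)  # unmatched predictions
    fn = len(gt_segments) - len(matched_gt)      # unmatched ground truth
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0
```

In this form, PQ penalizes both poorly localized matches (through the IoU sum) and missed or spurious segments (through the denominator), which is why it is a natural counterpart to inference time when weighing speed against accuracy.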
The thesis is organized into five chapters:
1. Chapter 1 introduces the background, motivations, objectives, contributions, and structure of the thesis, and outlines the main challenges of image segmentation.
2. Chapter 2 presents the background and problem statement, covering preliminaries in computer vision with a focus on image classification and segmentation. It distinguishes between the various segmentation tasks and outlines their historical and recent advancements.
3. Chapter 3 covers the theoretical foundations: the building blocks of panoptic segmentation, performance measures for computer vision models, and the datasets used in this field. A particular focus is placed on the STDC backbone architecture.
4. Chapter 4 describes the methodology, examining YOSO in detail along with the Feature Pyramid Aggregator (FPA) and the Separable Dynamic Decoder (SDD). It then introduces RT-YOSO, highlighting the enhancements and modifications that set it apart, and outlines the experimental setup and implementation details.
5. Chapter 5 presents the conclusions and suggests directions for future work in panoptic segmentation.
In conclusion, the thesis presents a significant contribution to the field of real-time panoptic segmentation by proposing a novel and efficient adaptation of the YOSO architecture. The RT-YOSO model offers a promising solution to the challenges of real-time panoptic segmentation, demonstrating the potential of deep learning-based approaches for improving image segmentation in real-world, time-sensitive applications.
Keywords: Image Segmentation, Panoptic Segmentation, Deep Learning, Real-time Processing, Scene Understanding.