Search In this Thesis
   Search In this Thesis  
العنوان
Efficient Support of Textual Similarity Queries \
المؤلف
Ahmed, Magdy Abdel Naby Abdel Sayed.
هيئة الاعداد
باحث / مجدي عبد النبي عبد السيد أحمد
مشرف / يسرى ابراهيم طه عثمان
مشرف / أحمد محمد السيد حسن
مناقش / عمرو احمد عبد المنعم المصرى
elmasry@es.rutgers.edu
مناقش / أيمن عادل عبد الحميد
الموضوع
Computer Engineering.
تاريخ النشر
2020.
عدد الصفحات
57 p. :
اللغة
الإنجليزية
الدرجة
ماجستير
التخصص
هندسة النظم والتحكم
تاريخ الإجازة
29/10/2020
مكان الإجازة
جامعة الاسكندريه - كلية الهندسة - هندسة الحاسب و النظم
الفهرس
Only 14 pages are availabe for public view

from 72

from 72

Abstract

Finding relevant objects in a large repository is a fundamental research problem occurring in many applications, such as data cleaning, data integration, web search, and information retrieval. Instant type-ahead fuzzy search, where a user types her query character by character and find the top-k relevant objects based on their text similarity, has become widely involved in many applications because it provides the users with rapid response results and improves the user’s experience. However, inaccurate or delayed results may have a negative effect on user experience. Moreover, incorporating words with other attributes, such as keyword frequency and user domain of expertise, would provide a better user experience. The state-of-the-art algorithms for Instant type-ahead fuzzy search are generally inefficient due to two limitations: (1) their breadth-based search nature results in repeated computations; (2) they deal only with string similarity metrics, such as edit distance, and do not include other attributes in calculating the top-k. To this end, we propose a novel depth-oriented instant type-ahead fuzzy search algorithm, named Depth-Based Fuzzy Auto-Completion (DFA), which largely avoids repeated computations. Then, we introduce another variant of DFA, named Multiple-Attributes Depth-Based Fuzzy Auto-Completion (MDFA), that efficiently includes other (static) strings attributes in calculating the top-k. The efficiency and effectiveness of the proposed approaches are empirically demonstrated using real-world datasets. Experimental results show that DFA is 5-10 times faster than state-of-the-art approaches that use only edit distance, and MDFA is 10-100 times faster than naive approaches that use multiple attributes.