![]() | Only 14 pages are availabe for public view |
Abstract Finding relevant objects in a large repository is a fundamental research problem occurring in many applications, such as data cleaning, data integration, web search, and information retrieval. Instant type-ahead fuzzy search, where a user types her query character by character and find the top-k relevant objects based on their text similarity, has become widely involved in many applications because it provides the users with rapid response results and improves the user’s experience. However, inaccurate or delayed results may have a negative effect on user experience. Moreover, incorporating words with other attributes, such as keyword frequency and user domain of expertise, would provide a better user experience. The state-of-the-art algorithms for Instant type-ahead fuzzy search are generally inefficient due to two limitations: (1) their breadth-based search nature results in repeated computations; (2) they deal only with string similarity metrics, such as edit distance, and do not include other attributes in calculating the top-k. To this end, we propose a novel depth-oriented instant type-ahead fuzzy search algorithm, named Depth-Based Fuzzy Auto-Completion (DFA), which largely avoids repeated computations. Then, we introduce another variant of DFA, named Multiple-Attributes Depth-Based Fuzzy Auto-Completion (MDFA), that efficiently includes other (static) strings attributes in calculating the top-k. The efficiency and effectiveness of the proposed approaches are empirically demonstrated using real-world datasets. Experimental results show that DFA is 5-10 times faster than state-of-the-art approaches that use only edit distance, and MDFA is 10-100 times faster than naive approaches that use multiple attributes. |