Author: حسن، محمد صابر عبد الفتاح/ Title: أسلوب كفء لإدارة البيانات المعتمدة على رسومات.

Search In this Thesis

العنوان

أسلوب كفء لإدارة البيانات المعتمدة على رسومات.

الناشر

محمد صابر عبد الفتاح حسن

المؤلف

حسن، محمد صابر عبد الفتاح

هيئة الاعداد

باحث / محمد صابر عبد الفتاح حسن

مشرف / مصطفى محمود عارف

مشرف / طارق فؤاد غريب

تاريخ النشر

2011

عدد الصفحات

63ص؛

اللغة

الإنجليزية

الدرجة

ماجستير

التخصص

Information Systems

تاريخ الإجازة

1/1/2011

مكان الإجازة

اتحاد مكتبات الجامعات المصرية - Computer and Information Sciences

الفهرس

Only 14 pages are availabe for public view

from

Abstract

Data management is the process of storing data efficiently besides supporting efficient processing of operations on the stored data. The current applications that depend on data modeled by graphs are ubiquitous such as bioinformatics and semantic web applications. Processing graph queries over large database is a performance bottleneck for these applications and hence efficient query processing techniques are important. Sub-graph queries and super-graph queries are two important query types having a lot of applications. Both of them require finding sub-graph isomorphism of a given graph. In this research work, we presented a technique for processing structural graph queries. This technique has two different versions; a version for processing sub-graph queries named eSubIndex and another one for processing super-graph queries named eIndex.
Structural graph queries require sub-graph isomorphic tests which are known to be NP-complete. For having an efficient solution to the sub-graph query problem, the number of the sub-graph isomorphic tests in the filtering phase was reduced by using linear time algorithms. A polynomial time post-filtering phase is added. The idea behind this post filtering phase is to take into consideration the full structure of the database graphs generated from eSubIndex scanning. Higher pruning power is achieved due to using this post-filtering phase. Feeding fewer candidate graphs into the verification phase enhances the overall solution performance and makes processing queries over large graph databases more feasible. eSubIndex was compared to both gIndex and C-Tree. The results show that eSubIndex has more pruning power in its filtering phase due to taking the full structure of the database graphs into consideration. The total processing time of eSubIndex is shown to be less than that of gIndex and C-Tree. The reduced time comes from reducing the number of graphs being feed into the verification phase.
The same line of reasoning applies for our enhanced super-graph processing technique known as eIndex. The number of the NP-Complete operations is reduced by a filtering phase which tries to use a polynomial time algorithm (sub-sequence matching) before trying sub-graph isomorphic tests. Sub-sequence matching could be done in linear time; so, this linear time matching algorithm could reduce the number of the exponential time graph sub-graph isomorphism testing operations without increasing the total processing cost. Also, a post filtering stage is added which takes into consideration the full structure of the database graphs. This post filtering stage increases the pruning power of the filtering phase. Increasing the pruning power of the filtering phase yields fewer number of the candidate graphs which are fed into the verification phase. The verification phase uses the NP-Complete sub-graph isomorphism testing algorithm for each candidate graph. So, reducing the number of candidate graphs generated from the filtering phase results in reducing the total processing time of a given query. The experimental results show that the total processing time of eIndex is less than that of cIndex. Also, it is shown that building the index of eIndex takes less time than building the index used by the cIndex technique.
As a summary for this research work, we have the following conclusions:
• Using polynomial time algorithms in the filtering phase can increase the pruning power of the filtering phase.
• Considering the full structure of the graphs in the filtering phase increases the pruning power of the filtering phase.
• Increasing the pruning power of the filtering phase yields fewer candidate graphs.
• Having fewer candidate graphs enhances the total performance of graph query processing as the verification phase runs fewer number of sub-graph isomorphic tests.
The area of data management in general and that of graph data management in specific are active research areas that have a lot of future research work. The following points could be future work that extends the work presented in this thesis:
• Efficient Secondary Storage of Graph Database: reducing the storage space is one of the main points of data management; for huge graph databases, storing the graphs efficiently on the secondary storage leads to scalable solution.
• Processing Queries over Graphs on Secondary Storage: The work in this thesis assumes that the graphs exist in the main memory. Huge databases cannot be loaded completely in the main memory and hence it is critical to have a solution for processing queries over graphs on the secondary storage to have a scalable solution.
• Pre-computing data calculated by pseudo-sub-graph isomorphism: The pseudo-sub-graph isomorphism is used heavily in this research work. Pseudo-sub-graph isomorphism requires computing data related to the connectivity of each vertex in the graph. Computing this data takes time; this time could be reduced by pre-computing this data during the index construction phase. Pre-computing data for the pseudo-sub-graph isomorphism will decrease the time taken during the filtering phase and hence the total time of the overall query processing will be reduced.
• Innovating other approximate algorithms to reduce the number of sub-graph isomorphism: This research work proved that using approximate algorithms in the filtering phase reduces the total processing time. Developing innovative approximate algorithms for the NP-Complete sub-graph isomorphism will lead to enhanced query processing. These approximate algorithms should have small processing time in addition to have high pruning power.
• Representing complex graphs in main memory using efficient and compact representation: graphs are needed to be loaded in the main memory while doing isomorphic comparisons. Representing complex graphs efficiently in the main memory in a compact form will allow us to load more graphs in the main memory which will increase both the throughput and the scalability