Title
Mining Structural Patterns for Automatic Protein Function Prediction /
Author
Amin, Huda Amin Maghawry.
Committee
Researcher / Huda Amin Maghawry Amin
Supervisor / Mostafa Gadal-Haqq M. Mostafa
Supervisor / Mohamed Hashem Abdel Aziz
Examiner / Tarek Fouad Gharib
Publication date
2014.
Number of pages
158 p.
Language
English
Degree
Doctorate
Specialization
Information Systems
Date of approval
1/1/2014
Place of approval
Egyptian Universities Libraries Consortium - Information Systems
Only 14 of 158 pages are available for public view

Abstract

Predicting protein function is one of the challenging problems in bioinformatics. Function is the main key used to classify proteins, and the analysis of proteins and their functions is an important research area that affects many applications, such as clarifying the mechanisms of the living body, treating diseases, and the drug industry. Protein function can be inferred experimentally, with very low throughput and high cost, or computationally, with very high throughput and lower cost. Many protein sequences and structures are available with no knowledge of their function, so methods for protein function prediction are in high and continuous demand. Computational methods are based on protein sequences or structures. Because protein function is highly related to structure, structure-based protein representation plays an important role in the prediction process.
This thesis presents a modification to an existing protein representation approach that uses distance patterns between protein residues together with a maximum cutoff. The modified representation considers the whole protein instead of applying a cutoff. A comparative analysis was carried out to evaluate the proposed representation against the existing one. The aspect of protein function considered is enzyme activity. The results show that the proposed representation outperforms the existing one, with prediction accuracies of 90.12% and 80.27% at the superfamily and family levels, respectively, an accuracy improvement of about 5% on average.
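The difference between a cutoff-based and a whole-protein distance representation can be illustrated with a minimal sketch. The abstract does not specify how the distance patterns are encoded, so the normalized histogram below, the function name `distance_histogram`, and the parameters `bin_width` and `n_bins` are assumptions made for illustration only, not the method used in the thesis.

```python
import math
from itertools import combinations


def distance_histogram(coords, bin_width=2.0, n_bins=10, cutoff=None):
    """Normalized histogram of pairwise residue distances (illustrative only).

    coords: list of (x, y, z) tuples, one point per residue.
    cutoff: if given, pairs farther apart than this value are ignored;
            if None, every residue pair in the protein is counted.
    Distances beyond the last bin edge are placed in the last bin.
    """
    counts = [0] * n_bins
    for a, b in combinations(coords, 2):
        d = math.dist(a, b)
        if cutoff is not None and d > cutoff:
            continue  # cutoff-based variant: skip distant pairs
        counts[min(int(d // bin_width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


# Toy coordinates (hypothetical, not taken from any real structure).
toy = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
whole_protein = distance_histogram(toy)            # all pairs counted
with_cutoff = distance_histogram(toy, cutoff=4.0)  # the 5.0 pair is dropped
```

With `cutoff=None` every residue pair contributes to the feature vector, which corresponds to the whole-protein idea described above; passing a cutoff reproduces the restricted variant.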
This thesis also presents a new structure-based protein representation for efficient protein function prediction. The new representation is based on three-dimensional patterns of protein residues. It uses the atomic coordinates of protein residues, including angle and distance patterns. The proposed representation relies on protein structure only, with no need for any sequence information or prior alignment. The aspects of protein function considered, using different datasets, are predicting enzyme family and superfamily and classifying enzyme versus non-enzyme proteins. Across various classification methods, the prediction accuracy of the proposed representation exceeds that of a recently introduced representation based only on distance patterns. The results show that the proposed representation achieved a prediction accuracy of 98.3% in predicting superfamily, 91% in predicting family, and 79.25% in predicting enzyme proteins, an improvement of about 10% on average.
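As a rough illustration of how angle and distance patterns can be derived from residue coordinates alone, the sketch below computes, for each run of three consecutive residues, the two inter-residue distances and the angle at the middle residue. The choice of consecutive triplets and the helper names `angle_deg` and `triplet_patterns` are assumptions for this example; the abstract does not state which residues or atoms the thesis actually combines.

```python
import math


def angle_deg(a, b, c):
    """Angle at point b, in degrees, between the vectors b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for coincident points")
    cos_t = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
    return math.degrees(math.acos(cos_t))


def triplet_patterns(coords):
    """One (distance, distance, angle) tuple per consecutive residue triplet.

    coords: list of (x, y, z) tuples, one point per residue, in chain order.
    """
    return [
        (math.dist(a, b), math.dist(b, c), angle_deg(a, b, c))
        for a, b, c in zip(coords, coords[1:], coords[2:])
    ]


# Toy chain of four points (hypothetical coordinates).
chain = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]
patterns = triplet_patterns(chain)
```

Features of this kind depend only on coordinates, so they require neither sequence information nor a prior alignment, which matches the properties claimed for the representation above.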
Finally, the thesis presents a study of different protein-derived sequence, physicochemical, and structure features. The objective was to enhance the prediction of DNA-binding proteins and their classes. This is achieved by finding efficient protein representations that are able to predict whether a protein is a DNA-binding protein, and by analyzing how well protein-derived representations predict each of the DNA-binding protein classes. The protein features achieved an accuracy improvement of about 7% on average for the prediction of DNA-binding proteins. The proposed representation, when combined with other features, improved accuracy by about 7% and 12% on average for the prediction of DNA-binding proteins and DNA-binding protein classes, respectively.