Abstract Cancer is a dangerous disease that causes death worldwide. Discovering a few genes relevant to a given cancer can lead to effective treatments. The challenge with microarray datasets is their high dimensionality: the number of features is huge compared with the modest number of samples. Recent research efforts have attempted to reduce this dimensionality using different feature selection techniques. This thesis presents two ensemble feature selection techniques based on the t-test and the genetic algorithm: the nested genetic algorithm (NestedGA) and the ensemble feature pool approach (EFPA). After the data are preprocessed with a t-test, the two proposed approaches are used to obtain the optimal subset of features by combining data from two different microarray datasets. NestedGA consists of two nested genetic algorithms (outer and inner) that run on two different kinds of datasets. The outer genetic algorithm (OGA-SVM) works on a microarray gene expression dataset, whereas the inner genetic algorithm (IGA-NNW) runs on a DNA methylation dataset. NestedGA is applied to a colorectal cancer dataset with 5-fold cross-validation. After NestedGA is applied, the incremental feature selection (IFS) strategy is used to obtain the smallest optimal gene subset. The gene subset has been validated on an independent dataset, resulting in 99.9% classification accuracy the non-nested GAs
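The t-test filtering and incremental feature selection steps mentioned in the abstract can be illustrated with a minimal sketch. It does not reproduce the thesis's NestedGA or EFPA. Several things here are assumptions made only for illustration: the Welch variant of the t statistic, a nearest-centroid classifier in place of the SVM and neural network, leave-one-out evaluation in place of 5-fold cross-validation, all function names, and the made-up toy data.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples (assumed variant of the t-test)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def rank_features(X, y):
    """Return feature indices sorted by decreasing |t| between class 0 and class 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(a, b)), j))
    # Ties are broken by the lower feature index.
    return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]


def nearest_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in classifier)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [statistics.mean(r[j] for r in rows) for j in features]
        point = [X[i][j] for j in features]
        pred = min(centroids, key=lambda c: math.dist(point, centroids[c]))
        correct += pred == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked):
    """Add ranked features one at a time; keep the smallest prefix with the best accuracy."""
    best_acc, best_subset = -1.0, []
    for k in range(1, len(ranked) + 1):
        acc = nearest_centroid_accuracy(X, y, ranked[:k])
        if acc > best_acc:  # strict '>' keeps the smaller subset on ties
            best_acc, best_subset = acc, ranked[:k]
    return best_subset, best_acc


# Made-up toy data: 6 samples, 3 features; only feature 0 separates the classes.
X_toy = [
    [1.0, 5.0, 0.3],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.1],
    [3.0, 5.5, 0.5],
    [3.1, 4.5, 0.7],
    [2.9, 5.0, 0.2],
]
y_toy = [0, 0, 0, 1, 1, 1]

ranked = rank_features(X_toy, y_toy)
subset, acc = incremental_feature_selection(X_toy, y_toy, ranked)
print(ranked, subset, acc)
```

On the toy data the ranking puts feature 0 first, and the incremental step stops growing the subset as soon as extra features no longer improve accuracy, which is the sense in which IFS yields the smallest subset among equally accurate ones.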