Abstract
High dimensionality in real-world datasets is an obstacle to data analysis. Unbalanced datasets contain a huge number of features but only a small number of samples. Analyzing such large datasets raises issues such as noisy, irrelevant, and redundant data. Datasets of this kind can negatively affect any later decision and can lead to poor classification performance. Medical datasets are a clear example of huge datasets that need simplification. Microarrays represent gene samples used to diagnose cancer cases. They work by analyzing a huge number of genes and investigating which genes are activated and responsible for the cancer. A gene's expression level in the microarray is the main key for evaluating how much that gene is involved in causing the disease. Feature selection is considered a solution to the problem of high dimensionality, and it helps simplify any kind of dataset. In feature selection, a dataset is refined and reduced to a small subset containing the most informative features/genes of the original dataset. Selecting the optimal features/genes can improve the classification process by reducing running time and memory storage. The feature selection process is evaluated by applying a classifier to the resulting small subset of data.
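The workflow described in the abstract (reduce the genes to a small informative subset, then judge the subset by classifying with it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the method used in this work: the synthetic data from `make_toy_data`, the class-mean-difference filter score in `score_gene`, and the nearest-centroid classifier are hypothetical stand-ins chosen only because they need nothing beyond the Python standard library.

```python
import random
from statistics import mean, pstdev


def make_toy_data(n_samples=40, n_genes=200, n_informative=5, seed=0):
    """Synthetic stand-in for a microarray: few samples, many genes (assumed data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate the two classes
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        # Only the first n_informative genes shift with the class label.
        for g in range(n_informative):
            row[g] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def score_gene(values, labels):
    """Filter score: absolute difference of class means over the summed class spreads."""
    a = [v for v, l in zip(values, labels) if l == 0]
    b = [v for v, l in zip(values, labels) if l == 1]
    spread = (pstdev(a) + pstdev(b)) or 1e-12  # guard against zero spread
    return abs(mean(a) - mean(b)) / spread


def select_top_genes(X, y, k):
    """Return the indices of the k highest-scoring genes."""
    n_genes = len(X[0])
    scores = [score_gene([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, genes):
    """Classify test samples by the closest class centroid, using only the selected genes."""
    centroids = {}
    for label in set(y_train):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroids[label] = [mean(r[g] for r in rows) for g in genes]
    correct = 0
    for row, label in zip(X_test, y_test):
        pred = min(
            centroids,
            key=lambda c: sum((row[g] - m) ** 2 for g, m in zip(genes, centroids[c])),
        )
        correct += pred == label
    return correct / len(y_test)


X, y = make_toy_data()
X_train, y_train, X_test, y_test = X[:30], y[:30], X[30:], y[30:]
selected = select_top_genes(X_train, y_train, k=5)  # selection sees training data only
acc = nearest_centroid_accuracy(X_train, y_train, X_test, y_test, selected)
print("selected genes:", selected)
print("test accuracy on the reduced subset:", acc)
```

The genes are scored on the training samples only, so the held-out accuracy is not biased by the selection step. Comparing that accuracy, and the running time, against a run that uses all 200 genes is one simple way to evaluate a selected subset.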