Abstract The volume of text data grows every day, and we routinely handle vast quantities of it on the Internet and on our own computer systems. A method for organizing this data is therefore useful, and clustering text documents has become an important research topic. For many years, researchers have sought the best algorithm for text clustering. Many algorithms have been applied to the task, such as Naïve Bayes, Support Vector Machines (SVMs), and Self-Organizing Maps (SOMs). Research papers are a special type of text document because they contain specific expressions and scientific keywords. This motivated us to develop an algorithm that can cluster research papers. This thesis proposes a method for clustering research papers based on SOMs. A SOM is an unsupervised machine learning method with several parameters that must be tuned to produce the best possible solution. These parameters are usually set manually or by trial and error. In this work, we use a genetic algorithm to search the parameter space and find the best values automatically; the resulting method is a SOM optimized by a genetic algorithm. First, we built the algorithm and tested it on clustering gray colors, a simple case study for measuring its efficiency. We then applied it to three different research-paper data sets. To achieve better results, we also integrated the algorithm with a pretrained Word2Vec model so that it can match different words with similar meanings. Finally, we compared our results with previous research on clustering research papers and show that our method outperforms that work, which had itself already been compared with many earlier methods.
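To make the idea of a SOM tuned by a genetic algorithm concrete, the following is a minimal sketch on the gray-color case study, not the thesis implementation. Several choices here are assumptions made only for illustration: the tuned parameters (initial learning rate `lr0` and initial neighborhood radius `sigma0`), the fitness measure (mean quantization error), the 1-D map of 8 units, and all function names.

```python
import math
import random

def train_som(data, n_units, lr0, sigma0, epochs, rng):
    """Train a 1-D SOM on scalar gray levels in [0, 1]; return unit weights."""
    weights = [rng.random() for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = 1.0 - step / total_steps    # linear decay toward 0
            lr = lr0 * frac
            sigma = max(sigma0 * frac, 1e-3)
            # Best-matching unit: the unit whose weight is closest to x.
            bmu = min(range(n_units), key=lambda i: abs(weights[i] - x))
            for i in range(n_units):
                # Gaussian neighborhood on the map grid around the BMU.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                weights[i] += lr * h * (x - weights[i])
            step += 1
    return weights

def quantization_error(data, weights):
    """Mean distance from each sample to its closest unit (lower is better)."""
    return sum(min(abs(w - x) for w in weights) for x in data) / len(data)

def fitness(genome, data, seed):
    """Assumed fitness: quantization error of a SOM trained with this genome."""
    lr0, sigma0 = genome
    weights = train_som(data, 8, lr0, sigma0, 5, random.Random(seed))
    return quantization_error(data, weights)

def genetic_search(data, pop_size=8, generations=5, seed=0):
    """Small GA with elitism, uniform crossover, and Gaussian mutation."""
    rng = random.Random(seed)
    bounds = [(0.01, 1.0), (0.1, 4.0)]  # assumed ranges for (lr0, sigma0)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda g: fitness(g, data, seed))
        parents = ranked[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [min(hi, max(lo, v + rng.gauss(0, 0.1 * (hi - lo))))
                     for v, (lo, hi) in zip(child, bounds)]
            children.append(child)
        pop = parents + children
    best = min(pop, key=lambda g: fitness(g, data, seed))
    return best, fitness(best, data, seed)

# Synthetic gray levels stand in for the gray-color data set.
gray_rng = random.Random(42)
gray_levels = [gray_rng.random() for _ in range(60)]
best_genome, best_error = genetic_search(gray_levels)
print("best (lr0, sigma0):", best_genome, "quantization error:", best_error)
```

Fixing the SOM seed inside `fitness` keeps the comparison between genomes deterministic, so differences in error come from the parameters rather than from the random initialization. For text clustering, the scalar gray levels would be replaced by document vectors, and the distance by a vector metric.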