Cascade Support Vector Machines applied to the Translation Initiation Site prediction problem
Keywords: Translation Initiation Site, Cascade SVM, Data Mining, Machine Learning
Abstract: Correctly identifying the protein-coding region is an important and still-open problem in biology. The challenge lies in the lack of deep knowledge about biological systems, specifically the conserved features of messenger ribonucleic acid (mRNA). Computational methods are therefore fundamental to discovering patterns around the Translation Initiation Site (TIS). Machine learning methods have been widely applied in Bioinformatics, the most frequently used being Support Vector Machines (SVMs), which are based on inductive inference. However, SVMs incur a high computational cost on large data sets, since their training time can grow up to quadratically with the data set size. In this study, to tackle this challenge and analyse the algorithm's behavior, we applied a Cascade SVM approach to the TIS prediction problem. This strategy aims to accelerate model training and reduce the number of support vectors. Our results show that the Cascade SVM approach significantly reduces model training time while maintaining accuracy and F-measure rates similar to those of the conventional SVM. We also identify the scenarios in which the cascade approach is most suitable for reducing training time.
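As a rough illustration of why partitioning can reduce training time, the following back-of-the-envelope derivation takes the quadratic upper bound mentioned above at face value. The symbols are assumptions introduced here for illustration and do not come from the study: $n$ is the number of training examples, $c$ a constant, $k$ the number of equal-sized partitions in the first cascade layer, and $s$ the fraction of each partition retained as support vectors. The pairwise merge in the second layer is likewise an assumed topology, not necessarily the one used by the authors.

```latex
% Assumed cost model: training one SVM on n examples costs about c n^2.
\begin{align*}
  T_{\text{single}}(n) &\approx c\,n^{2} \\[4pt]
  % First cascade layer: k independent SVMs, each on n/k examples.
  T_{\text{layer 1, total}} &\approx k \cdot c\left(\frac{n}{k}\right)^{2}
      = \frac{c\,n^{2}}{k} \\[4pt]
  % If the k sub-problems are trained in parallel, wall-clock time is one sub-problem.
  T_{\text{layer 1, parallel}} &\approx c\left(\frac{n}{k}\right)^{2}
      = \frac{c\,n^{2}}{k^{2}} \\[4pt]
  % Second layer (assumed pairwise merge): each SVM sees the support vectors
  % of two first-layer models, i.e. about 2 s n / k examples.
  T_{\text{layer 2, per model}} &\approx c\left(\frac{2\,s\,n}{k}\right)^{2}
      = 4\,s^{2}\,\frac{c\,n^{2}}{k^{2}}
\end{align*}
```

Under these assumptions, the later layers stay cheap only when $s$ is small, that is, when each sub-model keeps few support vectors. This is consistent with the abstract's remark that the cascade approach suits some scenarios better than others, although the actual gains depend on the data and on the solver.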