Cascade Support Vector Machines applied to the Translation Initiation Site prediction problem

Authors

  • Wallison Willian Guimaraes, Pontifical University Catholic of Minas Gerais
  • Cristiano Lacerda Nunes Pinto, School of Engineering of Minas Gerais
  • Cristiane Neri Nobre, Pontifical University Catholic of Minas Gerais
  • Luis Enrique Zarate, Pontifical University Catholic of Minas Gerais

Keywords:

Translation Initiation Site, Cascade SVM, Data Mining, Machine Learning

Abstract

The correct identification of the protein coding region is an important and still open problem in biology. The challenge lies in the lack of deep knowledge about biological systems, specifically the conserved characteristics of messenger Ribonucleic Acid (mRNA). Computational methods are therefore fundamental to discovering patterns within the Translation Initiation Site (TIS). In Bioinformatics, machine learning methods have been widely applied, the most frequently used being Support Vector Machines (SVM), which are based on inductive inference. However, SVM incurs a high computational cost when applied to large data sets, and its training time can scale up to quadratically with the data set size. In this study, to tackle this challenge and analyse the algorithm's behavior, we applied a Cascade SVM approach to the TIS prediction problem. This strategy aims to accelerate model training and reduce the number of support vectors. Our results show that the Cascade SVM approach significantly reduces model training time while keeping accuracy and F-measure similar to those of the conventional SVM. We also identify the scenarios in which the cascade approach is more suitable for reducing training time.

Published

2018-10-01