A Fast and Effective Strategy for Feature Selection in High-dimensional Datasets


  • Mariana Tasca, Universidade Federal Fluminense
  • Alexandre Plastino, Universidade Federal Fluminense
  • Celso Ribeiro, Universidade Federal Fluminense
  • Bianca Zadrozny, IBM Research


classification, feature selection, high-dimensional datasets


Feature subset selection (FSS) is an important preprocessing step for classification, especially for high-dimensional datasets, i.e., those with thousands of potentially predictive attributes. There is an extensive literature on FSS methods, but most of them cannot be applied to high-dimensional datasets because of their prohibitive computational cost. This paper proposes a simple feature subset selection algorithm that is suitable for such datasets. Our proposal is based on a single iteration of a constructive procedure followed by a local search strategy. We also present a multi-iteration version of the algorithm, which characterizes a GRASP implementation, and include an experimental evaluation of this strategy. The experiments were conducted over a variety of high-dimensional datasets and show that the proposed method reaches, in most cases, better accuracies than some well-known algorithms, at a much lower computational cost.
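The general shape of a constructive procedure followed by local search, optionally repeated as in GRASP, can be sketched as follows. This is only an illustrative sketch and not the authors' algorithm: the abstract does not specify the evaluation measure, the candidate list, or the neighborhood, so the restricted candidate list of size `rcl_size`, the single-feature flip neighborhood, and the `toy_score` function are all assumptions introduced here.

```python
import random
from typing import Callable, FrozenSet, Set, Tuple

# A subset-quality function; in practice this might be a filter measure or
# a classifier's validation accuracy (assumption: the abstract does not say).
Score = Callable[[FrozenSet[int]], float]


def construct(n_features: int, score: Score, rcl_size: int,
              rng: random.Random) -> Set[int]:
    """Greedy randomized construction: add one feature at a time, drawn at
    random from the rcl_size best-scoring candidates, while the score improves."""
    selected: Set[int] = set()
    best = score(frozenset(selected))
    while len(selected) < n_features:
        scored = sorted(
            ((score(frozenset(selected | {f})), f)
             for f in range(n_features) if f not in selected),
            reverse=True,
        )
        new_score, choice = rng.choice(scored[:rcl_size])
        if new_score <= best:
            break  # the drawn candidate does not improve the subset
        selected.add(choice)
        best = new_score
    return selected


def local_search(start: Set[int], n_features: int,
                 score: Score) -> Tuple[Set[int], float]:
    """First-improvement search over single-feature flips (add or remove)."""
    current = set(start)
    best = score(frozenset(current))
    improved = True
    while improved:
        improved = False
        for f in range(n_features):
            neighbor = current ^ {f}  # flip membership of feature f
            value = score(frozenset(neighbor))
            if value > best:
                current, best = neighbor, value
                improved = True
    return current, best


def grasp(n_features: int, score: Score, iterations: int = 1,
          rcl_size: int = 2, seed: int = 0) -> Tuple[Set[int], float]:
    """iterations=1 is the single-pass variant; larger values repeat
    construction plus local search and keep the best subset found."""
    rng = random.Random(seed)
    best_subset: Set[int] = set()
    best_score = float("-inf")
    for _ in range(iterations):
        subset = construct(n_features, score, rcl_size, rng)
        subset, value = local_search(subset, n_features, score)
        if value > best_score:
            best_subset, best_score = subset, value
    return best_subset, best_score


# Hypothetical toy objective: reward three "relevant" features and
# penalize subset size. It stands in for a real evaluation measure.
RELEVANT = frozenset({0, 3, 7})


def toy_score(subset: FrozenSet[int]) -> float:
    return len(subset & RELEVANT) - 0.1 * len(subset)
```

Calling `grasp(10, toy_score)` runs the single-iteration variant on the toy objective, while `iterations=5` gives a multi-start run. With a real dataset, `toy_score` would be replaced by whatever subset evaluation is appropriate, and that evaluation is what dominates the running time.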

