A Fast and Effective Strategy for Feature Selection in High-dimensional Datasets

Mariana Tasca; Alexandre Plastino; Celso Ribeiro; Bianca Zadrozny

Authors

Mariana Tasca, Universidade Federal Fluminense
Alexandre Plastino, Universidade Federal Fluminense
Celso Ribeiro, Universidade Federal Fluminense
Bianca Zadrozny, IBM Research

Keywords:

classification, feature selection, high-dimensional datasets

Abstract

Feature subset selection (FSS) is an important preprocessing step for the classification task, especially for datasets with high dimensionality, i.e., thousands of potentially predictive attributes. There is an extensive literature on methods for performing FSS, but most of them do not apply to high-dimensional datasets because of their prohibitive computational cost. This paper proposes a simple feature subset selection algorithm that is suitable for such datasets. Our proposal is based on the execution of a constructive procedure followed by a local search strategy, in just one iteration. We also present a multi-iteration version of our algorithm (which characterizes a GRASP implementation) and include an experimental evaluation of this strategy. The experiments were conducted over a variety of high-dimensional datasets and show that the proposed method can reach, in most cases, better accuracy than some well-known algorithms, at a much lower computational cost.
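
The overall shape of such a strategy (a constructive procedure followed by a local search, run once or repeated as in GRASP) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the construction rule, the neighborhood, or the evaluation measure, so everything below is an assumption made for illustration. In particular, the names `construct`, `local_search`, and `grasp_fss`, the restricted candidate list of size `rcl_size`, the single-feature flip neighborhood, and the caller-supplied `evaluate` function are all hypothetical.

```python
import random
from typing import Callable, FrozenSet, Sequence, Tuple

# Hypothetical evaluator: maps a feature subset to a quality value (higher is better).
Evaluator = Callable[[FrozenSet[int]], float]


def construct(scores: Sequence[float], k: int, rcl_size: int,
              rng: random.Random) -> FrozenSet[int]:
    """Greedy randomized construction (assumed rule): repeatedly pick, at random,
    one of the rcl_size best-scoring features that are not yet selected."""
    remaining = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    chosen = []
    while remaining and len(chosen) < k:
        pick = rng.choice(remaining[:rcl_size])  # restricted candidate list
        chosen.append(pick)
        remaining.remove(pick)
    return frozenset(chosen)


def local_search(subset: FrozenSet[int], n_features: int,
                 evaluate: Evaluator) -> Tuple[FrozenSet[int], float]:
    """First-improvement local search over single-feature flips (assumed
    neighborhood): add or remove one feature while the value strictly improves."""
    best, best_val = subset, evaluate(subset)
    improved = True
    while improved:
        improved = False
        for j in range(n_features):
            cand = best ^ frozenset({j})  # flip membership of feature j
            if not cand:
                continue  # never propose the empty subset
            val = evaluate(cand)
            if val > best_val:
                best, best_val = cand, val
                improved = True
                break
    return best, best_val


def grasp_fss(scores: Sequence[float], evaluate: Evaluator, k: int,
              rcl_size: int = 3, iterations: int = 1,
              seed: int = 0) -> Tuple[FrozenSet[int], float]:
    """iterations=1 gives one construction plus one local search;
    iterations>1 repeats both and keeps the best subset found (GRASP-style)."""
    rng = random.Random(seed)
    best: FrozenSet[int] = frozenset()
    best_val = float("-inf")
    for _ in range(iterations):
        start = construct(scores, k, rcl_size, rng)
        subset, val = local_search(start, len(scores), evaluate)
        if val > best_val:
            best, best_val = subset, val
    return best, best_val
```

In this sketch, `scores` would hold a cheap per-feature relevance measure and `evaluate` a more expensive subset-quality estimate such as cross-validated accuracy. Scanning every feature in the local search is only practical for toy inputs; a version for thousands of attributes would need a restricted neighborhood.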
