A Relevance Measure for Multivalued Attributes
Keywords: attribute selection, classification, multi-relational data mining, multivalued attributes, relevance measures
Abstract: An important step in the knowledge discovery in databases (KDD) process is attribute selection, which aims to choose a subset of attributes that represents the important information in the data. Most existing attribute selection methods can handle only simple attribute types, such as categorical and numerical. In particular, they cannot be applied to multivalued attributes, which take multiple values simultaneously for the same instance in the dataset. Multivalued attributes are nevertheless present in many real datasets; for example, the types of books owned by a person may be represented by a multivalued attribute. This article proposes a relevance measure for multivalued attributes that quantifies their importance for classification. The proposed measure takes into account the attribute's ability to determine the instance class. To evaluate the measure, experiments were conducted with several datasets submitted to multi-relational classifiers. The experiments show that the resulting accuracy values follow, in most cases, the values of the proposed relevance measure. This is evidence that the proposed measure can be a good indicator of the relevance of multivalued attributes for classification.