
Dialnet


Abstract of Cluster validity functions for categorical data: a solution-space perspective

Liang Bai, Jiye Liang

  • For categorical data, there are three widely used internal validity functions: the $k$-modes objective function, the category utility function, and the information entropy function, all of which are defined using within-cluster information only. Many clustering algorithms have been developed that use them as objective functions and search for their optimal solutions. In this paper, we study the generalization, effectiveness, and normalization of the three validity functions from a solution-space perspective. First, we present a generalized validity function for categorical data. Based on it, we analyze the commonalities and differences of the three validity functions in the solution space. Furthermore, we address the question of whether between-cluster information is ignored when these validity functions are used to evaluate clustering results. Finally, we analyze the upper and lower bounds of the three validity functions for a given data set, which can help estimate the clustering difficulty of a data set and compare the performance of a clustering algorithm across different data sets.
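
The abstract names the three validity functions without stating their formulas. As a rough illustration, the sketch below computes commonly used textbook forms of each from within-cluster value counts only. The function names, the tuple-based data layout, and the exact normalizations (for example, dividing category utility by the number of clusters) are assumptions made for this example and may differ from the definitions used in the paper.

```python
from collections import Counter
from math import log2

def split_clusters(data, labels):
    """Group rows by cluster label (hypothetical helper)."""
    clusters = {}
    for row, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(row)
    return list(clusters.values())

def value_counts(rows):
    """One Counter of category values per attribute."""
    return [Counter(col) for col in zip(*rows)]

def kmodes_objective(data, labels):
    """Total number of mismatches between objects and their cluster mode
    (simple matching dissimilarity). Lower is better."""
    total = 0
    for rows in split_clusters(data, labels):
        for counts in value_counts(rows):
            total += len(rows) - max(counts.values())
    return total

def category_utility(data, labels):
    """Assumed form: (1/k) * sum_l P(C_l) * sum_{j,v} [P(v|C_l)^2 - P(v)^2].
    Higher is better."""
    n = len(data)
    clusters = split_clusters(data, labels)
    base = sum((c / n) ** 2
               for counts in value_counts(data) for c in counts.values())
    cu = 0.0
    for rows in clusters:
        within = sum((c / len(rows)) ** 2
                     for counts in value_counts(rows) for c in counts.values())
        cu += len(rows) / n * (within - base)
    return cu / len(clusters)

def expected_entropy(data, labels):
    """Size-weighted sum of per-attribute entropies within each cluster,
    in bits. Lower is better."""
    n = len(data)
    total = 0.0
    for rows in split_clusters(data, labels):
        h = -sum((c / len(rows)) * log2(c / len(rows))
                 for counts in value_counts(rows) for c in counts.values())
        total += len(rows) / n * h
    return total

# Toy data set: four objects, two categorical attributes.
data = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
good = [0, 0, 1, 1]   # groups identical objects together
bad = [0, 1, 0, 1]    # mixes the two kinds of objects in every cluster
```

On this toy data, the three functions agree that `good` is the better partition: it has fewer mismatches, higher category utility, and lower expected entropy than `bad`.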

