Normality-based validation for crisp clustering

Lago Fernández, Luis Fernando; Corbacho Abelaira, Fernando

UAM_Biblioteca

Author

Lago Fernández, Luis Fernando; Corbacho Abelaira, Fernando

Entity

UAM. Departamento de Ingeniería Informática

Publisher

Elsevier B.V.

Date

2010-03

Citation

Pattern recognition 43.3 (2010): 782-795

ISSN

0031-3203 (print); 1873-5142 (online)

DOI

10.1016/j.patcog.2009.09.018

Funded by

This work has been partially supported with funds from MEC BFU2006-07902/BFI, CAM S-SEM-0255-2006 and CAM/UAM CCG08-UAM/TIC-4428

Project

Comunidad de Madrid. S2006/SEM-0255/OLFACTOSENSE; Gobierno de España. BFU2006-07902/BFI

Editor's Version

http://dx.doi.org/10.1016/j.patcog.2009.09.018

Subjects

Crisp clustering; Cluster validation; Negentropy; Computer science

URI

http://hdl.handle.net/10486/4572

Note

This is the author’s version of a work that was accepted for publication in Pattern Recognition. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition, 43, 3 (2010), DOI 10.1016/j.patcog.2009.09.018

Rights

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

Abstract

We introduce a new validity index for crisp clustering that is based on the average normality of the clusters. Unlike methods based on inter-cluster and intra-cluster distances, this index emphasizes cluster shape by using a higher-order characterization of each cluster's probability distribution. The normality of a cluster is characterized by its negentropy, a standard measure of the distance to normality, which evaluates the difference between the cluster's entropy and the entropy of a normal distribution with the same covariance matrix. The definition of the negentropy involves the distribution's differential entropy. However, we show that it is possible to avoid computing it explicitly by considering only negentropy increments with respect to the initial data distribution, in which all the points are assumed to belong to the same cluster. The resulting negentropy increment validity index requires only the computation of covariance matrices. We have applied the new index to an extensive set of artificial and real problems, where it provides, in general, better results than other indices, both in predicting the correct number of clusters and in the similarity between the real clusters and those inferred.
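The abstract does not state the index formula, so the following is only a minimal sketch reconstructed from its description, not the authors' published code. It assumes that the average cluster negentropy, measured relative to the single-cluster case, reduces to 0.5 * sum_i p_i * log|S_i| - 0.5 * log|S_0| - sum_i p_i * log p_i, where p_i is the fraction of points in cluster i, S_i is that cluster's covariance matrix, and S_0 is the covariance of the whole data set. Under this assumption the differential entropy of the data cancels out, which is why only covariance matrices are needed. The function name `negentropy_increment` and its arguments are hypothetical.

```python
import numpy as np


def negentropy_increment(X, labels):
    """Covariance-only negentropy increment of a crisp partition (sketch).

    Assumed form, reconstructed from the abstract rather than quoted:
        dJ = 0.5 * sum_i p_i * log|S_i| - 0.5 * log|S_0| - sum_i p_i * log(p_i)
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as n points in one dimension
    labels = np.asarray(labels)
    n = X.shape[0]

    def logdet_cov(points):
        # Log-determinant of the sample covariance; rows are observations.
        cov = np.atleast_2d(np.cov(points, rowvar=False))
        sign, logdet = np.linalg.slogdet(cov)
        if sign <= 0:
            raise ValueError("covariance matrix is singular or ill-conditioned")
        return logdet

    # Term for the initial distribution (all points in a single cluster).
    total = -0.5 * logdet_cov(X)

    # Weighted covariance and partition-entropy terms, one per cluster.
    for k in np.unique(labels):
        members = X[labels == k]
        p = members.shape[0] / n
        total += 0.5 * p * logdet_cov(members) - p * np.log(p)
    return total
```

With this sign convention, a lower value means the clusters are on average closer to normal than the undivided data, so candidate partitions would be compared by picking the smallest value. A partition that puts every point in one cluster gives exactly zero. Clusters with too few points for a non-singular covariance estimate raise an error here; how such cases should be handled is not covered by the abstract.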

Files in this item

Name

normality-based_lago-fernandez_PR_2010_ps.pdf

Size

2.035Mb

Format

PDF

This item appears in the following Collection(s)

Open-access scientific output of the UAM (Producción científica en acceso abierto de la UAM) [20456]
