Scalable pattern mining with Bayesian networks as background knowledge

Authors: Szymon Jaroszewicz, Tobias Scheffer, Dan Simovici
Source: Data Mining and Knowledge Discovery, ISSN 1384-5810, Vol. 18, No. 1, 2009, pp. 56-100
Language: English
Abstract
We study a discovery framework in which background knowledge on variables and their relations within a discourse area is available in the form of a graphical model. Starting from an initial, hand-crafted or possibly empty graphical model, the network evolves in an interactive process of discovery. We focus on the central step of this process: given a graphical model and a database, we address the problem of finding the most interesting attribute sets. We formalize the concept of interestingness of attribute sets as the divergence between their behavior as observed in the data and the behavior that can be explained given the current model. We derive an exact algorithm that finds all attribute sets whose interestingness exceeds a given threshold. We then consider the case of a very large network that renders exact inference infeasible, and a very large database or data stream. We devise an algorithm that efficiently finds the most interesting attribute sets with prescribed approximation bound and confidence probability, even for very large networks and infinite streams. We study the scalability of the methods in controlled experiments; a case study sheds light on the practical usefulness of the approach.
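The abstract does not state the interestingness formula. One way to make "divergence between observed and explained behavior" concrete, assumed here, is the largest absolute difference, over all value combinations of an attribute set, between the probability the Bayesian network assigns and the relative frequency in the data. The sketch below is a minimal illustration under that assumption. The network, its conditional probability tables, the data rows, and every function name are hypothetical. Marginals are computed by brute-force summation over the full joint distribution, and the search simply enumerates all attribute sets up to a size limit. It therefore covers neither the paper's exact pruning algorithm nor its sampling-based algorithm with approximation and confidence guarantees.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical toy Bayesian network over binary attributes: A -> B, A -> C.
# Each variable maps to (parents, CPT); the CPT maps a tuple of parent
# values to P(variable = 1 | parents).
NETWORK = {
    "A": ((), {(): 0.5}),
    "B": (("A",), {(0,): 0.2, (1,): 0.8}),
    "C": (("A",), {(0,): 0.3, (1,): 0.7}),
}
ORDER = ["A", "B", "C"]  # a topological order of the network


def joint_prob(assignment):
    """Probability of one full assignment under the network (chain rule)."""
    p = 1.0
    for var in ORDER:
        parents, cpt = NETWORK[var]
        p1 = cpt[tuple(assignment[q] for q in parents)]
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p


def model_marginal(attrs):
    """Exact marginal over attrs by summing the whole joint (tiny networks only)."""
    marg = Counter()
    for values in product((0, 1), repeat=len(ORDER)):
        assignment = dict(zip(ORDER, values))
        marg[tuple(assignment[a] for a in attrs)] += joint_prob(assignment)
    return marg


def data_marginal(rows, attrs):
    """Relative frequency of each value combination of attrs in the data."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    return {key: c / len(rows) for key, c in counts.items()}


def interestingness(rows, attrs):
    """Assumed measure: max |P_model(v) - P_data(v)| over all values v of attrs."""
    pm, pd = model_marginal(attrs), data_marginal(rows, attrs)
    return max(abs(pm.get(v, 0.0) - pd.get(v, 0.0))
               for v in product((0, 1), repeat=len(attrs)))


def interesting_sets(rows, epsilon, max_size=2):
    """Brute-force search for attribute sets whose interestingness >= epsilon."""
    found = []
    for k in range(1, max_size + 1):
        for attrs in combinations(ORDER, k):
            score = interestingness(rows, attrs)
            if score >= epsilon:
                found.append((attrs, score))
    return found


# Hypothetical data in which B and C always agree, a dependency the
# network (where B and C interact only through A) cannot fully explain.
ROWS = ([{"A": 0, "B": 0, "C": 0}] * 4 + [{"A": 1, "B": 1, "C": 1}] * 4
        + [{"A": 0, "B": 1, "C": 1}] + [{"A": 1, "B": 0, "C": 0}])

for attrs, score in interesting_sets(ROWS, epsilon=0.1):
    print(attrs, round(score, 3))  # prints: ('B', 'C') 0.19
```

In this toy data every single attribute, and the pairs involving A, match the network closely. The pair (B, C) stands out: the network predicts B = C with probability 0.62, but in the data they always agree. In the interactive process the abstract describes, such a set is what a user would inspect before revising the network.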
