Junjie Wu, Hui Xiong, Jian Chen
Abstract: Given its importance, the problem of predicting rare classes in large-scale multi-labeled data sets has attracted great attention in the literature. However, rare class analysis remains a critical challenge, because there is no natural way developed for handling imbalanced class distributions. This paper thus fills this crucial void by developing a method for classification using local clustering (COG). Specifically, for a data set with an imbalanced class distribution, we perform clustering within each large class and produce sub-classes with relatively balanced sizes. Then, we apply traditional supervised learning algorithms, such as support vector machines (SVMs), for classification. Along this line, we explore key properties of local clustering for a better understanding of the effect of COG on rare class analysis. Also, we provide a systematic analysis of time and space complexity of the COG method. Indeed, the experimental results on various real-world data sets show that COG produces significantly higher prediction accuracies on rare classes than state-of-the-art methods and the COG scheme can greatly improve the computational performance of SVMs. Furthermore, we show that COG can also improve the performances of traditional supervised learning algorithms on data sets with balanced class distributions. Finally, as two case studies, we have applied COG for two real-world applications: credit card fraud detection and network intrusion detection.
© 2001-2024 Fundación Dialnet · Todos los derechos reservados