
Dialnet


Sampling Techniques to Overcome Class Imbalance in a Cyberbullying Context

    1. IBM Research – Thomas J. Watson Research Center, Town of Yorktown, United States
    2. Technological University Dublin
  • Published in: Journal of Computer-Assisted Linguistic Research, ISSN-e 2530-9455, No. 3, 2019, pp. 21-40
  • Language: English
  • Abstract
    • The majority of datasets suffer from class imbalance, where samples of a dominant class significantly outnumber the samples available for the minority class that is to be detected. Machine learning prediction and classification models work best when there are roughly equal numbers of each class. This paper explores sampling techniques that can be used to overcome this class imbalance problem in a cyberbullying context. A newly classified cyberbullying dataset, including detailed descriptions of the criteria used in its classification, was used to examine the feasibility of applying text mining techniques to automate the detection of cyberbullying text when the dataset shows a significant class imbalance between the positive (cyberbullying) samples and the negative (not cyberbullying) samples. In this paper, we investigate whether oversampling the minority positive class or undersampling the majority negative class affects the performance of a prediction model. A compromise solution, in which the positive class is partially oversampled and the negative class is partially undersampled, is also examined. Although not strictly a class imbalance solution, sampling using the most frequently observed features was also explored.
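The three resampling strategies described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Labels are assumed to be binary (1 = cyberbullying, 0 = not cyberbullying). Oversampling is assumed to be plain random duplication of positive samples, with no synthetic generation such as SMOTE, and undersampling is assumed to be random removal of negative samples without replacement. The function names (`resample`, `oversample`, `undersample`, `hybrid`) and the use of the mean class size as the compromise target are hypothetical. The abstract does not state the sampling ratios the paper used.

```python
import random
from collections import Counter

# Hypothetical helpers illustrating random resampling; not the paper's code.
# Assumes binary labels: 1 = cyberbullying (positive), 0 = not cyberbullying.


def _resize(indices, size, rng):
    """Return `size` indices drawn from `indices`.

    Shrinking samples without replacement (random undersampling); growing keeps
    every original index and appends random duplicates (random oversampling).
    """
    if size <= len(indices):
        return rng.sample(indices, size)
    extra = [rng.choice(indices) for _ in range(size - len(indices))]
    return indices + extra


def resample(labels, n_pos, n_neg, positive=1, seed=0):
    """Return shuffled row indices with n_pos positive and n_neg negative rows."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == positive]
    neg = [i for i, y in enumerate(labels) if y != positive]
    idx = _resize(pos, n_pos, rng) + _resize(neg, n_neg, rng)
    rng.shuffle(idx)  # mix classes so they are not stored in blocks
    return idx


def oversample(labels, positive=1, seed=0):
    """Grow the minority positive class to the size of the negative class."""
    n_neg = sum(1 for y in labels if y != positive)
    return resample(labels, n_neg, n_neg, positive, seed)


def undersample(labels, positive=1, seed=0):
    """Shrink the majority negative class to the size of the positive class."""
    n_pos = sum(1 for y in labels if y == positive)
    return resample(labels, n_pos, n_pos, positive, seed)


def hybrid(labels, positive=1, seed=0):
    """Compromise: resize both classes to the mean class size (assumed target).

    With a minority positive class, this partially oversamples the positives
    and partially undersamples the negatives.
    """
    target = len(labels) // 2
    return resample(labels, target, target, positive, seed)


if __name__ == "__main__":
    texts = [f"post {i}" for i in range(100)]  # placeholder documents
    labels = [1] * 10 + [0] * 90               # 10% positive: imbalanced
    for name, fn in [("oversample", oversample),
                     ("undersample", undersample),
                     ("hybrid", hybrid)]:
        idx = fn(labels)
        train_texts = [texts[i] for i in idx]
        counts = dict(sorted(Counter(labels[i] for i in idx).items()))
        print(name, len(train_texts), counts)
    # oversample 180 {0: 90, 1: 90}
    # undersample 20 {0: 10, 1: 10}
    # hybrid 100 {0: 50, 1: 50}
```

Each function returns row indices, so the same selection can be applied to raw texts, feature vectors, or both. Resampling is normally applied only to the training split. This keeps the test set at the original class distribution, so the reported scores reflect the imbalance a deployed classifier would face.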

