


Classification of criminal news based on natural language processing and machine learning algorithms

  • Author: Camilo Ernesto Sarmiento Torres [1]
  • Affiliation: [1] Universidad del Cauca, Colombia
  • Location: RISTI: Revista Ibérica de Sistemas e Tecnologias de Informação, ISSN-e 1646-9895, No. Extra 38, 2020, pp. 117-129
  • Language: Spanish
  • Abstract
    • English

      In this work, a system for classifying criminal news from different digital press media was developed using natural language processing techniques and machine learning algorithms. First, a dataset of criminal news was built in which eight types of crime were identified. The documents were then preprocessed: stop words were removed, lemmatization was applied, and each document was represented with the bag-of-words model, with terms weighted by term frequency-inverse document frequency (tf-idf).

      In addition, eight word dictionaries were built according to the crime types and used to estimate the performance of five supervised classification algorithms. The random forest algorithm performed best in the test, with 97.22% accuracy, 98.36% precision, 98.35% sensitivity, an F1 score of 98.32%, and an MCC of 0.97.
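The preprocessing and tf-idf representation summarized in the abstract can be illustrated with a short, dependency-free Python sketch. It is not the author's implementation. The stop-word list and the example headlines are hypothetical, and lemmatization is omitted; a library such as spaCy would normally supply it. The sketch also assumes the plain weighting tf = count / document length and idf = log(N / df), since the abstract does not state which tf-idf variant was used.

```python
import math
import re
from collections import Counter

# Hypothetical Spanish stop-word list, kept tiny for illustration;
# the list used in the paper is not given.
STOP_WORDS = {
    "a", "al", "con", "de", "del", "el", "en", "la", "las",
    "los", "por", "que", "se", "un", "una", "y",
}


def tokenize(text):
    """Lowercase, keep alphabetic tokens (including Spanish accented
    letters) and drop stop words. Lemmatization is deliberately omitted."""
    tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(documents):
    """Bag-of-words tf-idf: one {term: weight} dict per document, with
    tf = count / document length and idf = log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical example headlines (not from the paper's dataset).
news = [
    "Capturan a un ladrón por el robo de un celular",
    "Homicidio en el barrio: la policía investiga el crimen",
    "Robo a mano armada en un banco del centro",
]
for vector in tfidf(news):
    # Print the three highest-weighted terms of each document.
    print(sorted(vector, key=vector.get, reverse=True)[:3])
```

A term that appears in several documents, such as "robo" here, receives a lower idf weight than terms unique to one document. This is why tf-idf tends to emphasize the words that distinguish one news item from the others.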
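The evaluation metrics named in the abstract can be computed from true and predicted labels as sketched below. The abstract does not say how precision, sensitivity and F1 were averaged over the eight crime types, so macro averaging is an assumption here. The MCC uses the standard multiclass generalization. Note that MCC is a correlation coefficient in the range [-1, 1], not a percentage. The label names are hypothetical.

```python
import math
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy, macro-averaged precision, recall (sensitivity) and F1,
    plus multiclass MCC, for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    correct = sum(t == p for t, p in pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        # One-vs-rest counts for this class.
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    # Multiclass MCC: (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum p_k^2)(s^2 - sum t_k^2)),
    # where c = correct predictions, s = samples, p_k / t_k = predicted / true counts.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    cov = correct * n - sum(pred_counts[k] * true_counts[k] for k in labels)
    denom = math.sqrt(
        (n * n - sum(v * v for v in pred_counts.values()))
        * (n * n - sum(v * v for v in true_counts.values()))
    )
    mcc = cov / denom if denom else 0.0  # 0.0 when a marginal is degenerate

    k = len(labels)
    return {
        "accuracy": correct / n,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
        "mcc": mcc,
    }


# Hypothetical labels for illustration.
y_true = ["robo", "robo", "homicidio", "homicidio"]
y_pred = ["robo", "homicidio", "homicidio", "homicidio"]
print(classification_metrics(y_true, y_pred))
```

With only two classes, this MCC formula reduces to the familiar binary form (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).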

