
Dialnet


Document Similarity by Word Clustering with Semantic Distance

    1. [1] National Institute Of Technology

      Japan

    2. [2] Advanced Institute of Industrial Technology

      Japan

  • Published in: Hybrid Artificial Intelligent Systems: 16th International Conference, HAIS 2021. Bilbao, Spain. September 22–24, 2021. Proceedings / coordinated by Hugo Sanjurjo González, Iker Pastor López, Pablo García Bringas, Héctor Quintián Pardo, Emilio Santiago Corchado Rodríguez, 2021, ISBN 978-3-030-86271-8, pp. 3-14
  • Language: English
  • Full text not available
  • Abstract
    • In information retrieval, Latent Semantic Analysis (LSA) is a method for handling large, sparse document vectors. LSA reduces the dimension of document vectors by statistically producing a set of topics related to the documents and terms. It therefore needs a certain number of words and takes no account of the semantic relations between words. In this paper, words are clustered using the semantic distances between them, so that the dimension of the document vectors is reduced to the number of word clusters. Word distance can be calculated using WordNet or Word2Vec. This method does not depend on the number of words or documents. For especially small documents, we use each word's dictionary definition to calculate the similarities between documents. To demonstrate the method in standard cases, we apply it to the classification of the BBC dataset and compare the accuracy of the document clusters produced by LSA, word clustering with WordNet, and word clustering with Word2Vec.
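
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm is used, so a simple greedy threshold clustering is assumed here. The two-dimensional `TOY_VECTORS` are hypothetical stand-ins for Word2Vec embeddings, and cosine distance is assumed as the semantic distance. All function names and the threshold value are illustrative.

```python
import math
from collections import Counter

# Hypothetical 2-D embeddings standing in for Word2Vec vectors (assumption).
TOY_VECTORS = {
    "football": (0.90, 0.10),
    "soccer": (0.85, 0.15),
    "match": (0.80, 0.30),
    "bank": (0.10, 0.90),
    "money": (0.15, 0.95),
    "loan": (0.20, 0.80),
}


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold):
    """Greedy clustering (an assumed stand-in for the paper's method).

    Each word joins the first cluster whose first member lies within
    `threshold` cosine distance; otherwise it starts a new cluster.
    Returns a mapping from word to cluster index.
    """
    clusters = []  # each cluster is a list of words
    for word in sorted(vectors):
        for members in clusters:
            distance = 1.0 - cosine_similarity(vectors[word], vectors[members[0]])
            if distance <= threshold:
                members.append(word)
                break
        else:
            clusters.append([word])
    return {w: i for i, members in enumerate(clusters) for w in members}


def document_vector(tokens, word_to_cluster, n_clusters):
    """Count how many tokens fall in each word cluster; unknown tokens are skipped."""
    counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
    return [counts.get(i, 0) for i in range(n_clusters)]


word_to_cluster = cluster_words(TOY_VECTORS, threshold=0.1)
n_clusters = len(set(word_to_cluster.values()))

doc_a = document_vector(["football", "match"], word_to_cluster, n_clusters)
doc_b = document_vector(["soccer"], word_to_cluster, n_clusters)
doc_c = document_vector(["bank", "loan", "money"], word_to_cluster, n_clusters)

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

With these toy vectors the six words fall into two clusters, so every document vector has two dimensions regardless of vocabulary size. The first two documents share no words, yet their similarity is 1.0 because all of their words land in the same cluster, while the third document has similarity 0.0 with the first.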

