

An intelligent extension of the training set for the Persian n-gram language model: an enrichment algorithm

    1. [1] University of Isfahan, Iran

  • Location: Onomázein: Revista de lingüística, filología y traducción de la Pontificia Universidad Católica de Chile, ISSN 0718-5758, e-ISSN 0717-1285, No. 61, 2023, pp. 191-211
  • Language: English
  • Abstract
    • In this article, we introduce an automatic mechanism that intelligently extends the training set to improve the n-gram language model of Persian. Given the free word order of Persian, our enrichment algorithm diversifies the n-gram combinations in the baseline training data through dependency reordering, adding permissible sentences and filtering out ungrammatical ones using a hybrid empirical (heuristic) and linguistic approach. Experiments performed on the baseline training set (taken from a standard Persian corpus) and the resulting enriched training set indicate a declining trend in average relative perplexity (between 34% and 73%) for informal/spoken vs. formal/written Persian test data.
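
To make the evaluation idea in the abstract concrete, the following is a minimal sketch of how adding a reordered sentence to a training set can lower the perplexity of an n-gram model on test data with a different word order. It is not the authors' enrichment algorithm: the bigram order, the add-one smoothing, the toy tokens (`subj`, `obj`, `verb`), the hand-written reordered sentence, and the definition of relative reduction as `(baseline - enriched) / baseline` are all assumptions made for illustration.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary tokens


def train_bigram(sentences):
    """Collect context and bigram counts from tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>", UNK}
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])               # every token that has a successor
        bigrams.update(zip(padded[:-1], padded[1:]))
        vocab.update(tokens)
    return contexts, bigrams, vocab


def perplexity(model, sentences):
    """Perplexity under an add-one (Laplace) smoothed bigram model."""
    contexts, bigrams, vocab = model
    v = len(vocab)
    log_prob, n = 0.0, 0
    for tokens in sentences:
        known = [t if t in vocab else UNK for t in tokens]
        padded = ["<s>"] + known + ["</s>"]
        for prev, cur in zip(padded[:-1], padded[1:]):
            p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)


def relative_reduction(pp_baseline, pp_enriched):
    """Assumed definition: fraction by which perplexity drops after enrichment."""
    return (pp_baseline - pp_enriched) / pp_baseline


# Toy data (assumption): one canonical order, plus a hand-written reordering
# standing in for a permissible sentence produced by an enrichment step.
baseline_train = [["subj", "obj", "verb"]]
enriched_train = baseline_train + [["obj", "subj", "verb"]]
test_set = [["obj", "subj", "verb"]]

pp_base = perplexity(train_bigram(baseline_train), test_set)
pp_enriched = perplexity(train_bigram(enriched_train), test_set)
print(pp_base, pp_enriched, relative_reduction(pp_base, pp_enriched))
```

In this toy setup the enriched model has seen the bigrams of the reordered test sentence, so its perplexity is lower than the baseline's. The percentages reported in the abstract come from the authors' own corpus and method and cannot be reproduced from this sketch.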

