
Dialnet


Abstract of Scrapy: Methodology in Extracting User-Generated Content to Compile a Corpus from the Internet

Aroa Orrequia Barea, Cristian Marín Honor

  • Sentiment Analysis and Opinion Mining are becoming increasingly important in Computational Linguistics as the Internet takes up a growing part of our lives. In this chapter we therefore discuss our research on the technical and methodological aspects of extracting user-generated content from webpages. The data will be used to compile a corpus of customer reviews of car rentals in Andalucía. The process has two main phases: the semi-automatic download of the information, and the cleaning of the text. For the first phase we use Scrapy, a framework written in Python that allows users to extract information and store it in different formats. In the second phase, a text editor is used to clean and format the reviews by means of regular expressions. The resulting texts are suitable for corpus tools designed to study different aspects of Sentiment Analysis.
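
The cleaning phase described above can be illustrated with a minimal sketch. The chapter applies regular expressions in a text editor; the example below instead uses Python's standard `re` module, purely to show the idea in the same language as Scrapy. The patterns, the function name `clean_review`, and the sample review are all hypothetical, since the abstract does not list the actual rules or data.

```python
import html
import re

# Hypothetical cleaning rules; the abstract does not specify the real patterns.
TAG_RE = re.compile(r"<[^>]+>")     # leftover HTML tags
WHITESPACE_RE = re.compile(r"\s+")  # runs of spaces, tabs, and newlines


def clean_review(raw: str) -> str:
    """Return a plain-text review with markup removed and whitespace normalised."""
    text = TAG_RE.sub(" ", raw)          # drop tags, keeping a space so words do not fuse
    text = html.unescape(text)           # decode entities such as &amp; or &nbsp;
    text = WHITESPACE_RE.sub(" ", text)  # collapse any whitespace run into one space
    return text.strip()


if __name__ == "__main__":
    # Invented sample text, not taken from the corpus.
    sample = "<p>Great   service &amp; friendly staff.<br/>Would rent again!</p>"
    print(clean_review(sample))
```

A scripted version of this kind makes the cleaning step repeatable across the whole set of downloaded reviews, whereas editor-based search and replace, as used in the chapter, is easier to inspect interactively.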

