Efficient Blocking Method for a Large Scale Citation Matching

  • Authors: Mateusz Fedoryszak, Lukasz Bolikowski
  • Source: D-Lib Magazine, ISSN-e 1082-9873, Vol. 20, No. 11-12, 2014
  • Language: English
  • Abstract
    • Blocking is most commonly the first step of record deduplication. During this phase, roughly similar entities are grouped into blocks, within which more exact clustering is then performed. We present a blocking method for citation matching based on hash functions. A blocking workflow implemented in Apache Hadoop is outlined. Several hash functions are proposed and compared, with particular attention to the feasibility of using them on big data. The possibility of combining various hash functions is investigated. Finally, some technical details of the full citation matching workflow implementation are described.
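
To make the idea of hash-based blocking concrete, here is a minimal single-machine sketch in Python. It is not the authors' method or their Hadoop workflow. The hash function `year_token_keys`, the helpers `build_blocks`, `candidate_pairs` and `combine`, and the choice to combine hash functions by taking the union of their keys are all assumptions made for illustration. The sketch only shows the general pattern: each citation and each document is mapped to a set of hash keys, and only records sharing a key are placed in the same block for later, more exact comparison.

```python
import re
from collections import defaultdict


def year_token_keys(text):
    """Hypothetical hash function (an assumption, not taken from the paper):
    pair every 4-digit year with every word of 4+ letters in the text."""
    lowered = text.lower()
    years = re.findall(r"\b(?:19|20)\d{2}\b", lowered)
    words = re.findall(r"[a-z]{4,}", lowered)
    return {f"{year}#{word}" for year in years for word in words}


def combine(*hash_fns):
    """Combine hash functions by taking the union of their key sets.
    This is one possible reading of 'combining', assumed for illustration."""
    return lambda text: set().union(*(fn(text) for fn in hash_fns))


def build_blocks(citations, documents, hash_fn):
    """Map each hash key to (citation ids, document ids) that produced it."""
    blocks = defaultdict(lambda: (set(), set()))
    for cid, text in citations.items():
        for key in hash_fn(text):
            blocks[key][0].add(cid)
    for did, text in documents.items():
        for key in hash_fn(text):
            blocks[key][1].add(did)
    return blocks


def candidate_pairs(blocks):
    """Collect (citation id, document id) pairs that share at least one block."""
    pairs = set()
    for cids, dids in blocks.values():
        for cid in cids:
            for did in dids:
                pairs.add((cid, did))
    return pairs
```

Only the pairs returned by `candidate_pairs` would need to be passed to a more expensive similarity measure, which is what makes blocking attractive at large scale. In a MapReduce setting the hash key would naturally serve as the shuffle key, but that mapping is likewise an assumption here rather than a description of the published workflow.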

