
Dialnet


Metadata-driven data integration

  • Authors: Sergi Nadal
  • Thesis supervisors: Albert Abelló Gamazo (supervisor), Oscar Romero Moral (co-supervisor), Stijn Vansummeren (co-supervisor)
  • Defense: At the Universitat Politècnica de Catalunya (UPC) (Spain) in 2019
  • Language: Spanish
  • Thesis committee: Robert Wrembel (chair), Eduardo Mena Nieto (secretary), George H.L. Fletcher (member)
  • Doctoral program: Erasmus Mundus Doctoral Program in Information Technologies for Business Intelligence at the Universidad Politécnica de Catalunya; Aalborg Universitet (Denmark); Politechnika Poznanska (Poland); Technische Universität Dresden (Germany); and Université Libre de Bruxelles (Belgium)
  • Abstract
    • Data has an undeniable impact on society. Storing and processing large amounts of available data is currently one of the key success factors for an organization. Nonetheless, we are witnessing a shift toward huge and heterogeneous amounts of data. Indeed, 90% of the data in the world has been generated in the last two years. Thus, in order to carry out such data exploitation tasks, organizations must first perform data integration, combining data from multiple sources to yield a unified view over them. Yet, the integration of massive and heterogeneous amounts of data requires revisiting the traditional integration assumptions to cope with the new requirements posed by such data-intensive settings.

      This PhD thesis aims to provide a novel framework for data integration in the context of data-intensive ecosystems, which entails dealing with vast amounts of heterogeneous data, from multiple sources and in their original format. To this end, we advocate an integration process consisting of sequential activities governed by a semantic layer, implemented via a shared repository of metadata. From a stewardship perspective, these activities are the deployment of a data integration architecture, followed by the population of such shared metadata. From a data consumption perspective, the activities are virtual and materialized data integration, the former an exploratory task and the latter a consolidation one. Following the proposed framework, we focus on providing contributions to each of the four activities.

      We begin by proposing a software reference architecture for semantic-aware data-intensive systems. Such an architecture serves as a blueprint to deploy a stack of systems, its core being the metadata repository. Next, we propose a graph-based metadata model as a formalism for metadata management. We focus on supporting schema and data source evolution, a predominant factor in the heterogeneous sources at hand. For virtual integration, we propose query rewriting algorithms that rely on the previously proposed metadata model. We additionally consider semantic heterogeneities in the data sources, which the proposed algorithms are capable of automatically resolving. Finally, the thesis focuses on the materialized integration activity and, to this end, proposes a method to select intermediate results to materialize in data-intensive flows. Overall, the results of this thesis serve as a contribution to the field of data integration in contemporary data-intensive ecosystems.

