Ayuda
Ir al contenido

Dialnet


Query expansion by relying on the structure of knowledge bases

  • Autores: Joan Guisado Gámez
  • Directores de la Tesis: Josep Larriba Peu (dir. tes.)
  • Lectura: En la Universitat Politècnica de Catalunya (UPC) ( España ) en 2017
  • Idioma: español
  • Tribunal Calificador de la Tesis: Víctor Muntes-Mulero (presid.), Ricard Gavaldà Mestre (secret.), George H.L. Fletcher (voc.)
  • Materias:
  • Enlaces
    • Tesis en acceso abierto en: TDX
  • Resumen
    • Query expansion techniques aim at improving the results achieved by a user's query by means of introducing new expansion terms, called expansion features. Expansion features introduce new concepts that are semantically related with the concepts in the user's query and that allow retrieving documents that otherwise would be not. Thus, the challenge is to select those expansion features that are capable of improving the results the most. A bad choice of expansion features may be counterproductive. In this thesis, we use an external source of information, a Knowledge Base (KB), as source expansion features.

      A knowledge base consists of a set of entries, each of which represent a concept and has, at least, a name, which can be used as expansion feature. The techniques framed in this family have become more popular due to the increase of available data, as, for example, Wikipedia. Particularly, we focus on exploiting those KB whose entries are linked to each other, conforming a graph of entries. To the best of our knowledge, most of the techniques framed on the KB family rely on some kind of text analysis, such as explicit semantic analysis, or are based on other existing query expansion techniques such as pseudo relevance feedback. However, the underlying net-work structure of KBs has been barely exploited.

      In this thesis, we show that the structure can be used to identify reliable expansion feature for the query expansion process. Thus, we design a novel expansion technique, Structural Query Expansion (SQE). For SQE to benefit from the particular structures of KBs, we propose a methodology to identify the structural characteristics that, given a query, allow identifying those nodes in the KB that are good candidates to be used as source of expansion features, called from now on expansion nodes. The methodology consists in building a ground truth that connects each query from a query set with those nodes of the KB that when used to extract the expansion features allow achieving the best results in terms of precision, we call the set of those nodes, expansion query graph. Then, we compare the expansion query graph of each query to find shared characteristics. SQE materializes the revealed characteristics into a set of structural motifs. In the particular case of Wikipedia, we have found two motifs called triangular and square. In the former, the query node and the expansion node are doubly linked and the expansion node belongs to, at least, the same categories as the query node. In the latter, the query node and the expansion node also are doubly linked and their categories are connected somehow. These motifs are used to, given a query and its query nodes, identify all the expansion nodes which are used as source of expansion features. Notice that we have designed this technique to be orthogonal to others because is fully decoupled from the search process and does not depend on the particular collection of documents.

      We have tested our techniques with three different datasets to avoid any kind of overfitting. The results are shown to be consistent among the three of them. Also, the results which are validated with statistical significance tests, show that SQE is capable to achieve up to 150% improvement in the precision. Finally, we show the performance of our technique which runs in sub-second times (358.23ms at maximum) which makes it feasible for a real query expansion system. This is especially relevant because, to the best of our knowledge, the performance is an aspect that is being ignored in most of the works and, thus, it is difficult to know whether they can be include in real systems or not.


Fundación Dialnet

Dialnet Plus

  • Más información sobre Dialnet Plus

Opciones de compartir

Opciones de entorno