Dialnet


Advanced methods to audit online web services

  • Author: Pelayo Vallina Rodríguez
  • Thesis supervisors: Antonio Fernández Anta (supervisor), Rubén Cuevas Rumín (co-supervisor)
  • Defense: Universidad Carlos III de Madrid (Spain), 2023
  • Language: Spanish
  • Thesis committee: Nikolaos Laoutaris (chair), Hamed Haddadi (secretary), Leyla Bilge (member)
  • Doctoral program: Doctoral Program in Telematic Engineering at Universidad Carlos III de Madrid
  • Abstract
    • Online web services have grown dramatically in size and diversity in recent years, becoming essential components of our daily lives and allowing us to carry out everyday tasks such as working, staying informed, or keeping in contact with relatives and friends.

      However, the growth and evolution of online web services would not have been possible without a profitable economic model to sustain them. Although a sizable share of these services are free of charge, they represent a lucrative business that generates billions of dollars and has given rise to some of the biggest companies in the world by market capitalization, such as Alphabet Inc. or Meta Platforms, Inc. (previously known as Facebook, Inc.). Being both free and lucrative is possible thanks to an advertising-based monetization model, in which services deliver ads to users in exchange for access (e.g., Facebook or YouTube).

      Although online advertising dates back to the mid-1990s, its popularity among brands and advertising agencies has grown over the last decade, mainly due to its capacity to reach precise audiences at a low cost.

      Converting online web services into advertising walls is a double-edged sword for users. The ability of online advertising to segment audiences requires a massive collection of personal data, including users' web browsing histories or even more invasive data such as age, gender, or location, in order to infer their online profiles. This data collection is possible because online advertising companies have built a complex tracking ecosystem in which multiple stakeholders collect, process, and exchange information. The many cases of privacy abuse committed by this industry have motivated new regulatory efforts to protect consumers' privacy in recent years. Notable examples are the General Data Protection Regulation (GDPR) in the European Union and the California Consumer Privacy Act (CCPA) in California, USA. Further, these privacy regulations typically contain specific provisions and strict requirements for websites that provide sensitive material to end users, including sexual, religious, and health services.

      The introduction of new regulatory frameworks, alongside the growth of online web services, demands a continuous evolution of the techniques used to study and audit online web services. The online advertising ecosystem deserves particular attention, as it is the primary economic support of a high percentage of web services. Moreover, the activities and abuses of this ecosystem drove the adoption of the current privacy regulations that control the use and collection of personal data.

      This dissertation falls within the field of Internet measurement, tackling the need for new measurement techniques and methodological approaches to audit and study online web services. These efforts aim to increase the limited knowledge about web subsystems that offer sensitive material, including their compliance with current privacy regulations. This dissertation also addresses the need to study and measure how big ad tech companies create and use the online profiles of their users to distribute tailored ads. Furthermore, the work presented here highlights the need for a more in-depth understanding of fundamental tools for Internet measurement research, including their limitations and suitability for academic research. Specifically, this dissertation presents three main contributions:

      The first is a novel methodology to audit the privacy, transparency, and regulatory compliance of sensitive web services. We validate our method by assessing pornographic websites against the GDPR in the European Union. We focus on this type of website for two main reasons: 1) the GDPR establishes specific provisions and strict requirements for sensitive websites, including pornographic ones; and 2) big ad tech companies set strict constraints on porn-related publishers, which has opened market opportunities for other actors that specialize in advertising and tracking technologies for adult sites, creating an ecosystem semi-decoupled from the rest of the web.

      We perform a holistic analysis of over 6,843 pornographic websites, finding a prevalent absence of regulatory compliance and widespread use of tracking techniques, including advanced ones such as fingerprinting. These results stress the importance of studying subsets of the World Wide Web that regulators, policymakers, and the research community have not scrutinized in depth.

      Second, we empirically and comprehensively analyze 13 domain classification services to study their labeling strategies and performance. These services have multiple applications, from business uses such as online advertising to academic research that conducts category-dependent measurements or identifies the purpose of a website or online service. We study each service's methodology, scalability limitations, label constellations, and suitability for academic research, where findings in some cases depend on the results these services provide. We find that the limitations and shortcomings of each domain classification service heavily affect its suitability and applicability, both for practical solutions and for academic studies.

      In the third and last contribution, we implement a novel methodology with real users to study the performance and quality of the profiling and ad targeting algorithms of the two most important stakeholders in the online advertising business, Google and Meta (previously Facebook). We find that half of the categories associated with the profiles are incorrectly assigned. We also observe sensitive categories in the profiles of Facebook users, which poses a privacy risk and potential regulatory noncompliance.

      In summary, this dissertation brings new methodologies and results to increase our limited knowledge about the web.

