Dialnet


Summary of Design, analysis, and implementation of advanced methodologies to measure the socio-economic impact of personal data in large online services

Jose González

  • In the Internet era, online services and social networks have changed the marketing ecosystem. Only a few years ago, advertising outlets were mostly limited to television, radio, and roadside billboards. In recent years, however, a new advertising ecosystem with greater reach and revenue has appeared: online advertising.

    Online advertising has a substantial advantage over traditional advertising markets: the ability to reach users based on their interests. In traditional outlets, little was known about the users themselves. Instead, the marketing strategy consisted of choosing where to place an ad or which time slot suited a particular product. On the web, advertising becomes far more individualized thanks to the collection and processing of large amounts of individual data. Users are therefore more easily attracted to the advertised products, since these are more likely to match their preferences. This change in the advertising paradigm also allows advertisers to create narrow segments focused on specific commercial purposes. The large amounts of data collected from the browsing behavior and patterns of users worldwide are what let advertisers reach the best audiences for their campaigns faster.

    Not surprisingly, online advertising is an industry that generates billions of dollars in revenue every year. The most recent Internet Advertising Revenue Report from the Interactive Advertising Bureau (IAB) reveals that online advertising generated $139.8B in the US alone in 2020. This represents an increase of 12.2% over 2019 ($124.6B), despite the COVID-19 pandemic. This year-over-year growth highlights the power of online ads. It also reflects the collection of large amounts of data about user behavior on the web and, more importantly, the commercialization of such data, often with little or no knowledge on the part of the user.

    The businesses that dominate the online advertising sector in terms of revenue are Big Tech corporations such as Google, Facebook (FB), and Microsoft. These large companies are not the only ones that profit from online advertising, however; many other enterprises get their piece of the cake. In addition, businesses such as newspapers have reinvented themselves by creating websites associated with their brand, and they now rely mainly on online ads to finance their activity because of the drop in print sales.

    The revenues derived from online advertising suggest that everyone taking part in this ecosystem is satisfied with online ads. Advertisers can create campaigns narrowed to specific audiences. Online businesses and websites monetize their activities. Finally, the Big Tech companies earn vast revenues each year by acting as intermediaries in this process, providing advertising services or commercializing data.

    As stated before, online advertising offers much more personalized products thanks to the personal information that travels between advertising exchanges and companies. Although the online advertising ecosystem is much more complex, a simple breakdown of the process is the following. When users receive an ad in their preferred social network feed or web page, that ad has previously won a bidding process in which several advertisers bid to show their ad to this particular user. The user belongs to several audiences, each defined by a subset of the attributes that describe them on the Internet. For example, they live in a particular location, like a particular kind of music, own a specific mobile device, and attend a certain school. This user data is traded, exploited, and commercialized to create significant revenue for this ecosystem.

    In other words, the online advertising business is built upon the intangible value of personal data. Audiences and profiles are created from personal information in order to show ads while the user is online. Still, users get little information about how their data is being used, for what purposes, or what its inherent value is. From this point of view, the market is thoroughly opaque: the user (their data) is the final product being traded, and yet users are unaware of this process. In short, today's online advertising business is based on commercially exploiting and processing users' private data.

    Moreover, the decisive spark in terms of privacy came with the Facebook and Cambridge Analytica scandal. A third-party app used the social network to extract information from 87 million users without their permission. This situation highlighted the need to improve the way personal data is used, and users' concern for their data has increased over the years. Several initiatives, apps, and laws have emerged to raise awareness among users, give them the right to control their data, and protect them against privacy risks. One of them is the General Data Protection Regulation (GDPR), which has applied in European Union (EU) member states since May 2018. This regulation is used as a reference in this thesis since it is the one affecting the highest number of countries and, therefore, a vast number of users.

    The storage and exploitation of personal information for profit opens a new paradigm for individuals. The growth of Internet usage generates enormous amounts of data linked to users, and this data needs to be protected. The primary objective of this thesis is to shed light on, and bring transparency to, the use of personal information. It addresses crucial questions with a focus on online advertising, which, as said before, represents the most important source of revenue for most online services.

    This thesis provides a completely different perspective by incorporating a novel methodology in the area of ICT. One of its main objectives is to increase awareness and foster transparency so that users can know the economic value and social impact of their data. This empowers users to make informed decisions about how to use online services depending on the use these services make of their personal information. In particular, this thesis mainly focuses on Facebook, one of the predominant players in this business in terms of revenue, which generated more than $84B in online advertising revenue and had 2.3B monthly active users in 2020.

    This thesis contributes a data valuation tool that provides Internet users with data values aligned with actual market prices, personalized feedback that gives each user an estimate of how much money their personal information is generating, and real-time information on the value generated over time.

    The Data Valuation Tool for Facebook Users (FDVT) is a disruptive approach that estimates the economic worth of users' personal data in real time, customized to each user's profile. Since even skilled Internet users are unaware of the value derived from their data, the FDVT aims to provide this estimate, focusing on the online advertising revenue of one of the most popular services: Facebook. Facebook obtains most of its revenue from customized advertising. Note that the methodology presented in this thesis can be extrapolated to other services. The FDVT is a Google Chrome and Mozilla Firefox extension that offers the research community a novel approach in the field of online transparency and privacy. It provides a real-time, personalized economic estimate and, as this thesis covers, it empowers users against the risks derived from the use of personal data for advertising purposes.
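
    The abstract does not spell out how the estimate is computed. The sketch below assumes, purely for illustration, that the value a user generates can be approximated from the ads they see and click, priced at a market cost per thousand impressions (CPM) and cost per click (CPC). The names `AdPrices`, `estimate_revenue`, and `RunningEstimate` are hypothetical and are not taken from the FDVT.

```python
from dataclasses import dataclass


@dataclass
class AdPrices:
    """Hypothetical market prices for one audience (for example, one country)."""
    cpm_usd: float  # assumed cost per 1,000 ad impressions
    cpc_usd: float  # assumed cost per ad click


def estimate_revenue(impressions: int, clicks: int, prices: AdPrices) -> float:
    """Estimate the revenue generated by the ads a user has seen and clicked."""
    from_impressions = impressions * prices.cpm_usd / 1000.0
    from_clicks = clicks * prices.cpc_usd
    return from_impressions + from_clicks


class RunningEstimate:
    """Accumulates the estimate as ad events arrive (an assumed real-time view)."""

    def __init__(self, prices: AdPrices) -> None:
        self.prices = prices
        self.total_usd = 0.0

    def on_impression(self) -> float:
        # One impression is worth one thousandth of the CPM.
        self.total_usd += self.prices.cpm_usd / 1000.0
        return self.total_usd

    def on_click(self) -> float:
        # One click is worth the full CPC.
        self.total_usd += self.prices.cpc_usd
        return self.total_usd
```

    With made-up prices such as `AdPrices(cpm_usd=5.0, cpc_usd=0.5)`, calling `estimate_revenue(2000, 4, prices)` adds 10 dollars from impressions and 2 dollars from clicks. Personalization would come from choosing prices that match the user's profile, which this sketch leaves to the caller.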

    As presented before, top-rated online services build their business model upon the commercial exploitation of personal information. The emergence of these services has raised a very intense debate around questions such as the ethical and legal boundaries of the management of personal information. In recent years, successive privacy breaches and data leaks have put privacy in the spotlight. Privacy is now becoming a critical concern for regulators and civil society. One example is the creation of new laws that aim to protect users against malicious uses of their personal information. The aforementioned GDPR in the EU shows how these increasing concerns have been turned into legislation.

    The GDPR is the reference for this thesis because it affects a large number of nations, individuals, and businesses. The GDPR aims to protect users against the misuse of personal data that is commercialized for advertising purposes. In this context, it defines some categories of data as sensitive and prohibits their use with limited exceptions, one of them being that users give explicit consent for this kind of data to be used. More specifically, the GDPR defines as sensitive data “data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation”.

    Therefore, because of the legal, ethical, and privacy concerns raised by processing sensitive personal data, it is critical to understand whether online services are economically exploiting such sensitive information. If this is the case, it is also critical to estimate the number of users (or citizens) who may be harmed by the exploitation of their sensitive personal data. This thesis provides the research community with a quantification of the number of users of FB, the largest social network in terms of users, who are affected by the exploitation of their personal information for advertising purposes.

    Facebook is one of the most relevant businesses that gather information about, and profit from, users' behavior on their platform. On Facebook, users are labeled inside the platform with ad preferences (or interests) in order to target them with personalized advertising. These ad preferences relate to ideas or things that users may like, and they are later offered to advertisers as a tool to reach a more suitable audience. As a result, this information builds a unique profile around each user, including the things they like and their habits.

    Some of these ad preferences imply political beliefs, sexual orientation, personal health, and other potentially sensitive characteristics. The arrival of the GDPR establishes a formal definition of sensitive data and motivates a new field of research showing that the use of sensitive attributes for advertising is far from negligible.

    The FB Ads Manager makes it possible to obtain the number of users within the audience of a particular interest or set of interests and, therefore, the number of real FB users labeled with those interests in their profiles. This thesis analyzes the impact of this problem in the EU and over 197 countries worldwide using real interests assigned to real FB users. It also studies whether the enactment of the GDPR had any impact and helped to stop FB from using sensitive data for advertising. It then discusses the implications and risks derived from the commercialization of this kind of data. Finally, it presents a technical solution that aims to create awareness and transparency and to empower users by letting them see and remove the ad preferences that may be linked to sensitive information.

    Furthermore, a key property of personal data is that it can be linked back to the user, even when aggregated and anonymized. Unlike Personally Identifiable Information (PII), which allows anybody with access to it to identify and contact an individual immediately (for example, official IDs, passport numbers, email addresses, or phone numbers), a single item of non-PII cannot identify an individual on its own. For this reason, in the context of privacy, the research community is working to determine how many items of (in theory) non-PII information are necessary to reveal the identity of a unique user in a given dataset.

    This thesis contributes to the research community by analyzing the Facebook dataset, one of the largest datasets in existence. Facebook's user base comprises more than 2.8B monthly active users, so modeling the possibility of reaching a single user within it is of particular interest. As stated before, it is important to remember that the foundation of Facebook's economic strategy is advertising. As a result, everyone on FB has a list of ad preferences. Ad preferences (or interests) are non-PII information, since none of them can identify the user on its own.

    This thesis uses FB to reveal the number of non-PII items that unequivocally identify a user. For this, the analysis relies on real interests assigned to FB users. A model is built to derive, in a systematic way, the number of interests needed and the probability of uniquely reaching one user on FB. After that, an experiment is presented to prove the feasibility of building an advertising campaign that uses non-PII information to target a single user exclusively. This action is referred to as nanotargeting. The purpose of this work is to provide the first evidence that non-PII data can be systematically used for nanotargeting. Finally, the risks associated with nanotargeting in this context are discussed, followed by easily implementable solutions to prevent it.
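
    The idea that a handful of non-PII items can single out one person can be illustrated on a toy dataset. The sketch below is not the model built in the thesis, which relies on real FB interests and audience sizes. It is a brute-force search over a small, made-up dictionary of profiles, and the function names are hypothetical.

```python
from itertools import combinations


def audience_size(profiles, interests):
    """Count the users whose profile contains every interest in `interests`."""
    wanted = set(interests)
    return sum(1 for user_interests in profiles.values() if wanted <= user_interests)


def smallest_unique_combination(profiles, target):
    """Return the smallest set of the target's interests that matches only them.

    Tries combinations of increasing size and returns the first one whose
    audience contains exactly one user. Returns None if no combination does.
    """
    own = sorted(profiles[target])
    for k in range(1, len(own) + 1):
        for combo in combinations(own, k):
            if audience_size(profiles, combo) == 1:
                return combo
    return None


# Made-up profiles: each user maps to a set of interests (non-PII items).
toy_profiles = {
    "alice": {"jazz", "chess", "hiking"},
    "bob": {"jazz", "chess", "cooking"},
    "carol": {"jazz", "hiking", "cooking"},
}
```

    In `toy_profiles`, no single interest isolates `alice`, but the pair `chess` and `hiking` does. The brute-force search grows combinatorially with the number of interests, which is one reason a systematic model is needed at the scale of a real platform.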

    The last contribution of this thesis stems from the unexpected COVID-19 outbreak in 2020 and the lockdown and "new normal" lifestyle that followed. It is presented as a side contribution that helps to better understand the online advertising ecosystem. The technology developed in the earlier parts of this thesis has been partly reused to address COVID-19 in two specific studies.

    The COVID-19 outbreak provides a chance to investigate the Internet's resistance to an unprecedented event that severely affects its financial backbone, online advertising. This thesis studies the online advertising ecosystem from a completely novel angle, analyzing the relationship between online advertising supply and the resilience of the open Internet. To this end, it first applies the Price Elasticity of Supply (PES) to datasets from the online advertising ecosystem. PES is an economic metric that measures how responsive the quantity supplied is to changes in price. Finally, the thesis provides insights into the changes in the distribution of advertising categories on the web.
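
    In its standard textbook form, PES is the percentage change in quantity supplied divided by the percentage change in price. The sketch below implements only that definition. How the thesis maps advertising datasets onto quantities and prices is not stated in the abstract, so the function names and any input values are illustrative assumptions.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, as a fraction (0.1 means +10%)."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old


def price_elasticity_of_supply(q_old: float, q_new: float,
                               p_old: float, p_new: float) -> float:
    """PES = (% change in quantity supplied) / (% change in price)."""
    price_change = pct_change(p_old, p_new)
    if price_change == 0:
        raise ValueError("PES is undefined when the price does not change")
    return pct_change(q_old, q_new) / price_change
```

    For instance, if the quantity supplied falls by 10% while the price falls by 20%, the function returns roughly 0.5. A value below 1 is conventionally read as inelastic supply, and a value above 1 as elastic supply.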

    Moreover, one of the challenges in stopping the spread of the COVID-19 pandemic is identifying users' exposure to infected contacts. Governments and businesses have joined forces and put all their efforts into identifying potentially infected contacts and alerting citizens who have been exposed to the virus. Contact tracing is regarded as one of the most critical approaches to stopping the transmission of COVID-19.

    Research found that manual tracing was insufficient and advocated the adoption of digital contact tracing systems capable of using large-scale location data. With this purpose in mind, governments started to develop new apps based on Bluetooth technology to track users' mobility and alert them when they have been exposed to an infected individual. The critical factor for the effectiveness of such a system is therefore that a large number of individuals use it. This thesis analyzes the usage figures and adoption rates of the new contact tracing apps. This rough analysis shows that, in the vast majority of countries, adoption was not high enough to fight the pandemic efficiently.
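
    One simple way to see why adoption matters so much is the following back-of-the-envelope sketch. It rests on two assumptions that are not stated in the abstract: a Bluetooth app can only record a contact when both people involved run it, and adoption is independent and uniform across the population. Under those assumptions, the share of contacts the system can observe is the square of the adoption rate.

```python
def detectable_contact_share(adoption_rate: float) -> float:
    """Share of pairwise contacts in which both people run the app.

    Assumes independent, uniform adoption, so the probability that both
    parties to a contact have the app is adoption_rate squared.
    """
    if not 0.0 <= adoption_rate <= 1.0:
        raise ValueError("adoption_rate must be between 0 and 1")
    return adoption_rate ** 2
```

    Under these assumptions, an adoption rate of 20% would cover only about 4% of contacts, and even 50% adoption would cover about a quarter of them.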

    This thesis proposes a new protocol for contact tracing in exceptional situations like this one. The solution offers an alternative approach that removes the difficulty of achieving extensive adoption of a new mobile app. It relies on information already held in the databases of apps and browsers with substantial adoption in many countries (for instance, location data stored by Big Tech companies like Google or Facebook). A comparison between the adoption rates of Bluetooth apps and those of FB or Android in several countries supports the view that the large datasets gathered by Big Tech apps, devices, and browsers would be a much better resource to fight the spread of the pandemic. For instance, their penetration is higher than 50% (active users) in most EU countries. This implies that those companies hold a massive amount of geolocation information that can be used for contact tracing purposes. For this reason, this thesis encourages the use of the presented protocol, which relies on personal information, to help contain the spread of the virus by tracing contacts who may have been exposed to people with COVID-19 in a privacy-preserving manner.

