“How might we use data about how people move to support the response to COVID-19?” is a pressing question. For instance, researchers could use movement patterns to build and validate epidemiological models. And health service staff might use an individual’s movement data to help trace contacts. These activities are valuable. But when data collected under individual consent is reused for the public good, it strains the model of individual consent under which that data was originally collected.

Many factors influence whether data can and should be used to help solve a problem. Only a fraction of a population can ever be represented in data, and all data contains inaccuracies. These problems can make a promising idea unviable. Prioritising data analysis over other efforts may also carry opportunity costs, making a viable idea undesirable. Researchers should explore these factors first with models and simulated data, to avoid handling sensitive data unnecessarily.

If the use of location data seems both viable and desirable, we’re then faced with the challenge of using data in a responsible manner.

Pattern of transmissions of COVID-19 around the world. Hadfield et al., Nextstrain: real-time tracking of pathogen evolution, Bioinformatics (2018)

Some technology companies frequently record a user’s precise location within their products. That’s possible because people consent to the collection and use of that data when they first start using a product, with few constraints around what the company can use that data for.

In practice, organisations generally collect data to improve a product’s experience or the revenue it generates. Google, for example, uses the location data it collects to tell you when roads and restaurants are busy, and to improve advertising revenue. Occasionally, however, data is repurposed for other efforts: Google also uses location data for fundamental research in unrelated fields such as urban form.

Given the uniqueness and sensitivity of commercially collected historical data, there’s an important question: who gets to use the data, and for what purpose? In our current model, individuals consent to let the company collecting the data decide on their behalf. That company decides how the data is used, typically with weak guarantees and an opaque decision-making process.

Individuals often passively let companies decide how their data is used, accepting the defaults offered within products without considering the implications. Google’s Material Design framework for Android states that “Permission requests should be simple, transparent, and understandable”, as part of the “best practices of user interface design”. This is an admirable aim. But in practice that framework doesn’t acknowledge how hard it is to make the flow of data understandable to people. Nor does it acknowledge the incentives driving the data collection itself. Software systems are complex and hidden, and they span many organisations. Despite this, we ask individuals to decide how their data is used when they can only dimly see the very surface of that system.

When giving consent, individuals only have a limited view of how organisations use data about them

This tenuous thread of consent makes companies cautious about using data for any purpose beyond increasing revenue, while they pursue revenue aggressively. It generally rules out using data for societal benefit, such as public health. Without a strong basis of consent, each use of data carries risk for a company. Unless that risk is balanced by something fundamentally important to the company (profit, for example), that use is not seen as viable. If society wants to enable the use of such data for societal benefit, our current approach to consent needs to change.

In this post, we’ve talked about how our current model of individual consent causes problems when we try to reuse data for the public good. In our next post, we’ll look at how the properties of data itself create problems for our current consent models. The final post will talk about collective consent.