This is an extract from our submission to the open call for evidence for the National Data Strategy. It outlines some of the things we’ve been thinking about at IF recently.

People are increasingly concerned about how organisations manage data about them. 91% of people say it’s important to be able to choose how much data they share, but half (51%) can’t currently find that information.

There’s been an increase in public debate about how organisations collect, store and use data, and a flurry of publicly stated data principles. But many organisations struggle to put these principles into practice.

From data principles to practical steps

In a shifting regulatory landscape where customers care more about data privacy, we need to change how products and services are designed and built. IF curates the Data Permissions Catalogue, which teams use to inform conversations about data and privacy and to make services that put people in control of data.

The Data Permissions Catalogue (IF: CC-BY)

The catalogue has been viewed more than 10,000 times, and even got a mention in John Maeda’s Design in Tech Report 2019 for developing best practices around data collection.

Terms and conditions are compliant, but untrustworthy

Current consent mechanisms, which include Terms and Conditions and cookie banners, are not fit for purpose. Beyond being (rightly) criticised for their poor user experience, banners are often ignored. And academic research shows that most people do not read Terms and Conditions. These findings are backed up by prototyping and testing IF did with Comuzi, where young people described Terms and Conditions as inherently unfair.

Instead, consent should be distributed throughout a service. Product teams can help people understand why, how and when data collection happens.

In our work we try to show the value of data sharing upfront, before we ask people to make a decision (Sarah Gold: CC-BY)

This is an emerging area. More work is needed to develop new interaction patterns, and to define how much friction to introduce into user interfaces. This means testing prototypes with users who have different incentives, accessibility needs and degrees of digital literacy. Prototypes should not be exclusively screen-based: depending on the context, posters or letters can sometimes work better.

A prototype letter IF developed in a project on GDPR and data portability with the Open Data Institute (IF: CC-BY)

Ultimately, this will also require measuring the medium- and longer-term effects on use of, and engagement with, digital services that are explicitly designed to help people understand how data about them is collected and used.

Recent product developments by Instagram suggest tech giants are recognising that designing for ease of use alone is inadequate. This feature encourages people to reflect before posting hostile comments:

By explicitly designing for understanding, product teams can help users make more informed decisions about sharing data. Using design patterns in this way helps educate people about their rights through the software they’re using.

Going beyond the interface to include the whole software stack

From companies’ financial records to biometric data about individuals, increasing amounts of crucial and sensitive information are being stored in cloud platforms. As a practical use case, IF built s3-monitor to record a transparency log of data held in Amazon Web Services. As an organisation, we believe companies should be investing in ways to technically prove how data is being used.
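
To make that idea concrete, here is a minimal sketch of a transparency log for cloud storage. It is not the s3-monitor implementation itself: the bucket name and log file are placeholders, and it assumes AWS credentials are already configured locally for boto3.

```python
import hashlib
import json
import time

import boto3  # assumes AWS credentials are configured locally

# Placeholder bucket name: replace with the bucket you want to monitor.
BUCKET = "example-bucket"

def snapshot_bucket(bucket_name):
    """Record what data a bucket holds right now, as an append-only log entry."""
    s3 = boto3.client("s3")
    # First page of results only (up to 1,000 objects); enough for a sketch.
    response = s3.list_objects_v2(Bucket=bucket_name)
    objects = [
        {"key": obj["Key"], "etag": obj["ETag"], "size": obj["Size"]}
        for obj in response.get("Contents", [])
    ]
    entry = {"bucket": bucket_name, "checked_at": time.time(), "objects": objects}
    # Hash the snapshot so later tampering with the log file is detectable.
    entry["digest"] = hashlib.sha256(
        json.dumps(objects, sort_keys=True).encode()
    ).hexdigest()
    with open("transparency-log.jsonl", "a") as log:
        log.write(json.dumps(entry) + "\n")
    return entry

if __name__ == "__main__":
    print(snapshot_bucket(BUCKET))
```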

Privacy-preserving technologies and methods for machine learning offer opportunities to innovate in ways that better protect personal data. More work is needed to make them mainstream, but at IF we’re already working with things like TensorFlow Federated (TFF) and Google Coral.

Specialists need better tools to monitor and audit data flows. IF has been working on data verification using Trillian, software from Google that creates transparent, verifiable logs on top of a cryptographic data structure called a Merkle tree. Like a blockchain, Trillian makes it impossible for anyone to change or delete data in the log without leaving a trace. This could provide near real-time visibility of data access and confidence in the integrity of data access logs, facilitating continuous assurance of an organisation’s data practices. It could also make it possible to involve third parties in monitoring data access, allowing for greater accountability.

Trillian stores representations of individual items of data, called hashes. Hashing software turns each item of data into a unique hash, and it is very difficult to turn a hash back into the original data. This means that representations of data can be shared openly without revealing the original contents. (David Marques/IF: CC-BY)
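
As an illustration of why this makes tampering detectable, the sketch below builds a much simpler hash chain than Trillian’s Merkle tree: each entry’s hash commits to everything before it, so rewriting any earlier record changes every later hash. The records and verification function here are invented for illustration.

```python
import hashlib

def hash_entry(previous_hash, record):
    """Combine the previous hash with the new record, so each entry commits to all history."""
    return hashlib.sha256((previous_hash + record).encode()).hexdigest()

# Build a small append-only log of data-access events (illustrative records).
log = []
head = "0" * 64  # starting value for the chain
for record in ["alice read file A", "bob exported file B", "alice deleted file C"]:
    head = hash_entry(head, record)
    log.append({"record": record, "hash": head})

def verify(entries):
    """Recompute the chain from the start and compare against the stored hashes."""
    expected = "0" * 64
    for entry in entries:
        expected = hash_entry(expected, entry["record"])
        if expected != entry["hash"]:
            return False
    return True

print(verify(log))                     # True: the log is intact
log[1]["record"] = "bob read file B"   # try to rewrite history
print(verify(log))                     # False: the change is detectable
```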

Machine learning makes explanations even more important

Making automated decisions easier to understand is an important part of building services that people can trust and hold to account. IF worked with the London School of Economics and Flock, an on-the-go drone insurance provider, to test new kinds of design patterns that help people understand how machine learning systems make decisions.

This prototype shows a user how different wind speeds and weather conditions could have contributed to a cheaper quote.

Using counterfactual explanations to show how different wind speeds and weather conditions could have contributed to a cheaper quote. (IF: CC-BY)
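
Here is a minimal sketch of how a counterfactual explanation like this can be generated: given a pricing function, search for the smallest change to one input that would have produced a cheaper quote. The pricing rule, thresholds and prices below are invented for illustration and are not Flock’s actual model.

```python
def quote(wind_speed_mph, rain_probability):
    """A toy drone-insurance pricing rule (invented for illustration)."""
    base = 5.00
    wind_surcharge = 2.50 if wind_speed_mph > 20 else 0.0
    rain_surcharge = 1.50 if rain_probability > 0.4 else 0.0
    return base + wind_surcharge + rain_surcharge

def counterfactual_wind(wind_speed_mph, rain_probability):
    """Find the nearest lower wind speed at which the quote would have been cheaper."""
    current = quote(wind_speed_mph, rain_probability)
    for candidate in range(int(wind_speed_mph), -1, -1):
        if quote(candidate, rain_probability) < current:
            return candidate, quote(candidate, rain_probability)
    return None

# e.g. "If wind speed had been 20 mph rather than 26 mph,
#       your quote would have been £2.50 cheaper."
print(quote(26, 0.2))                # 7.5
print(counterfactual_wind(26, 0.2))  # (20, 5.0)
```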

Bringing together expertise from technology companies, design and the humanities is a vital part of creating a better, more ethical information society. Working with Dr. Alison Powell and a team of researchers at LSE, designers and technologists from IF put together an exhibition about Understanding Automated Decisions in 2018, to develop a shared, interdisciplinary set of definitions about terms that are often misunderstood.

The Understanding Automated Decisions exhibition at LSE (IF: CC-BY)

Building on this work, IF worked with LSE and Google AI to do some early prototyping on explaining machines that learn, with a particular focus on federated machine learning (a privacy-preserving technology).

In federated learning, there is a local and a global model. In this interaction, two people compare local models to see what their different models have learnt (IF: CC-BY)
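
The step that turns those local models into a global model is usually weighted federated averaging: only the model weights are shared, and the underlying data never leaves each device. The sketch below illustrates that combining step with made-up local weights; it is a conceptual example in plain NumPy, not the TensorFlow Federated API.

```python
import numpy as np

# Hypothetical local model weights, e.g. after each person trains on their own data.
local_models = {
    "alice": {"weights": np.array([0.2, 1.1, -0.3]), "num_examples": 120},
    "bob":   {"weights": np.array([0.4, 0.9,  0.1]), "num_examples": 30},
}

def federated_average(models):
    """Combine local models into a global model, weighting by how much data each saw.
    Only the weights are shared; the training data stays on each device."""
    total = sum(m["num_examples"] for m in models.values())
    return sum(m["weights"] * (m["num_examples"] / total) for m in models.values())

# Comparing local models (as in the interaction above) is comparing these arrays.
print(local_models["alice"]["weights"] - local_models["bob"]["weights"])
print(federated_average(local_models))  # the global model
```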

If the UK is looking to take a global leadership role in designing and building technology that people can trust with data, more funding is needed for projects that focus on data rights and usable, understandable privacy. Projects need to strike a balance between desk research and prototyping and piloting with product teams, and they should be published in the open.

Trustworthiness without accountability is unsustainable

The power imbalance between individuals and organisations means that too much focus on “trust” could end up covering a multitude of sins.

Organisations must also be accountable for how they collect, use and store data. This means building software in ways that allow specialist individuals or civil society groups to hold them to account for how they use data.

Sarah’s written more about what we mean by accountability at IF. Keep a lookout for that post coming soon!