In the post “Alexa, tell me my secrets” I wrote about machine learning models that unintentionally memorise and reveal confidential and personal data.

In this post, we’re going to show how to stop people’s secrets from being memorised and shared by machine learning models. With the help of Nicholas Carlini from Google Brain and Daniel Kowatsch from the Technical University of Munich, we demonstrate how a credit card number can be revealed from a neural network, and how to prevent this with Google’s new TensorFlow Privacy module. It only takes a single line of code to change the neural network from exposing secrets to protecting them.

The experiments were inspired by this paper by Nicholas and his co-authors. They’re available as a Jupyter notebook in our GitHub repository, which includes the code, data and detailed explanations.

Revealing secrets from shared models

Federated learning means people don’t need to share data with AI providers to benefit from using their services. Gboard, Google’s keyboard app, is an example: federated learning lets it train a generative language model collaboratively across people’s devices, which then makes more useful keyboard suggestions for everyone. Even though federated learning goes a long way to protect privacy, there are still risks that need to be considered.

Let’s assume Alice types “Mary had a little lamb” into Gboard. The local language model on her device would learn and improve itself from what she has typed. Bob, another smartphone user, types in: “He followed her to school one day”. His model would learn and update itself from that sentence instead. Both models would learn a little bit about what the English language looks like. An AI provider would then collect their models and combine what they have learned into a better global model for Alice and Bob to use.

Gboard’s suggestions are powered by federated learning. (Source)

The AI provider can’t directly see what either of them has typed in. But a secret that’s been memorised by one of the models might find its way into the global model. How might this happen? Imagine Alice types “my credit card number is 1827 4740 8231 7324” into Gboard. This sentence would end up in the training data for updating Alice’s language model. If her model memorises the credit card number, the number might end up in the global model that’s distributed back to Alice and Bob. If Bob then happens to type in “my credit card number is”, there’s a very good chance Gboard would happily suggest Alice’s credit card number.

In reality, federated learning combines models from thousands of individual smartphone users. The aggregation step, where these individual models are combined into the global model, would probably average out any memorised secrets. However, to be confident that Alice’s credit card number won’t be revealed to Bob, we have to make sure the number is not memorised by Alice’s model in the first place. You can do this using differential privacy.
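
To make the aggregation step concrete, here is a minimal sketch of federated averaging: combining per-device model weights into a single global model. The unweighted mean and the variable names are illustrative assumptions, not Gboard’s actual implementation, which weights clients by how much data they contributed and uses secure aggregation:

```python
import numpy as np

def federated_average(client_weights):
    """Average each client's model weights layer by layer to produce
    one set of global weights. This just shows the core idea."""
    return [np.mean(np.stack(layer_versions), axis=0)
            for layer_versions in zip(*client_weights)]

# Each client's weights would come from something like
# keras_model.get_weights() on that person's device.
alice_weights = [np.array([[0.1, 0.2]]), np.array([0.3])]
bob_weights = [np.array([[0.5, 0.0]]), np.array([0.1])]

global_weights = federated_average([alice_weights, bob_weights])
# global_weights == [array([[0.3, 0.1]]), array([0.2])]
```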

How to test if a model has memorised a secret

To prove that Alice’s credit card number was memorised, we have to show that the language model reveals it after training.

After the model had been trained, we measured its reaction when Alice’s credit card number was typed in again. This reaction is measured by something called perplexity: how surprised the model is by what it sees. If the model has memorised the credit card number, it won’t be surprised to see it again. In other words, because it already knows the number, the perplexity will be low.

If the model hadn’t memorised Alice’s credit card number, we would expect its reaction to be very similar to its reaction to any other credit card number.
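
As a rough illustration of how that reaction can be computed, here is a sketch of log-perplexity for a character-level language model. The model interface and the char_to_id vocabulary lookup are assumptions made for this example; the notebook mentioned above has the actual implementation:

```python
import numpy as np

def log_perplexity(model, text, char_to_id):
    """Sum of the negative log-probabilities the model assigns to each
    successive character of `text`. Low values mean the model is not
    surprised by the text. Assumes `model` maps a prefix of character ids
    to a probability distribution over the next character, and that
    `char_to_id` is the vocabulary lookup used during training."""
    ids = [char_to_id[c] for c in text]
    total = 0.0
    for i in range(1, len(ids)):
        next_char_probs = model.predict(np.array([ids[:i]]), verbose=0)[0]
        total += -np.log(next_char_probs[ids[i]])
    return total
```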

How we revealed a credit card number from a language model

We trained Alice’s model using a standard dataset for training language models. Before training, we inserted the string “my credit card number is” followed by Alice’s secret (the actual credit card number) into the dataset. This is the equivalent of Alice typing her credit card number into Gboard.

We created a set of 1,000 random credit card numbers, which were not included in the training data. Alice’s model had never seen these numbers before.
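
A sketch of that setup is below. The secret string, the “canary”, is appended to the training text once, and 1,000 random numbers in the same format are generated as a comparison set. `load_training_text` is a hypothetical helper standing in for reading the dataset:

```python
import random

random.seed(42)

def random_card_number():
    """A fake 16-digit number in the same 'dddd dddd dddd dddd' format
    as Alice's secret."""
    return " ".join(
        "".join(random.choice("0123456789") for _ in range(4))
        for _ in range(4))

# The secret (the "canary") is inserted into the training text once,
# as if Alice had typed it into Gboard. load_training_text() is a
# hypothetical helper standing in for reading the dataset.
secret = "my credit card number is 1827 4740 8231 7324"
training_text = load_training_text() + "\n" + secret

# 1,000 random credit card numbers the model never sees during training,
# used as the comparison set for the perplexity test.
candidates = ["my credit card number is " + random_card_number()
              for _ in range(1000)]
```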

For the random credit card numbers, the perplexity of the model followed a “normal distribution”. This means most of the values are similar, and therefore close to the average. Only a few values are very different from the average. This is a common pattern for random things like people’s height or babies’ birth weights.

If you plot a “normal distribution” on a graph, it looks like this.

If the model hadn’t memorised the secret, we would expect the perplexity for the secret to fall close to the average. When plotted on the graph, it would sit around the top of the bell curve.

We tested the perplexity for our inserted secret (Alice’s credit card number) and for each of the 1,000 randomly created credit card numbers. We plotted the results on this graph:

After just one training epoch, the perplexity for our secret was very low and far away from the average around the top of the bell curve. This shows that the model is memorising Alice’s credit card number.

The perplexity for the secret moves further down the bell curve after each training epoch. This means Alice’s credit card number climbs higher up Gboard’s list of predictions. After the second training epoch, the model would already suggest the secret 100% of the time after someone types in “my credit card number is”.
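
One way to turn the graph into a number is to rank the secret’s perplexity against the 1,000 random candidates, which is essentially the “exposure” metric from Carlini et al.’s paper. The sketch below reuses the hypothetical log_perplexity helper and candidates list from the earlier sketches, so the exposure value is only an approximation over the sampled candidates:

```python
import numpy as np

# Reusing the hypothetical log_perplexity helper and candidates list.
secret_ppl = log_perplexity(model, secret, char_to_id)
candidate_ppls = [log_perplexity(model, c, char_to_id) for c in candidates]

# Rank 1 means the model prefers Alice's real number over every random
# candidate, i.e. it has memorised it. Carlini et al. turn this rank
# into an "exposure" score.
rank = 1 + sum(p < secret_ppl for p in candidate_ppls)
exposure = np.log2(len(candidate_ppls) + 1) - np.log2(rank)
print(f"rank {rank} of {len(candidate_ppls) + 1}, exposure {exposure:.2f}")
```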

How we protected the credit card number from being revealed

What can we do to prevent a local model from memorising secrets, like Alice’s credit card number? Carlini et al. have written about why differential privacy helps.

Differential privacy guarantees that the trained model makes roughly the same suggestions whether or not Alice typed in the secret. The probability that the model would suggest the secret is roughly the same as for any of the other random credit card numbers. In other words, vanishingly small. That sounds pretty much like what we need.
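
Formally (this definition isn’t spelled out in the original experiments, but it is the standard one), a training algorithm M is (ε, δ)-differentially private if, for any two training datasets D and D′ that differ in one person’s data, for example with or without Alice’s secret, and any set S of possible trained models:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
```

With small ε and δ, whatever the model ends up suggesting is almost equally likely whether or not Alice’s secret was part of the training data.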

Google recently published TensorFlow Privacy, which comes with a differentially private optimiser. The optimiser is responsible for updating the neural network during training so that it gets better over time. We reran the same experiment and simply replaced the regular optimiser we used before with TensorFlow Privacy’s differentially private optimiser.
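
Here is a minimal sketch of what that swap looks like with the Keras-style optimiser from the tensorflow_privacy package. The hyperparameter values are illustrative rather than tuned, and class names have moved around between versions of the library, so treat this as a sketch rather than a drop-in snippet:

```python
import tensorflow as tf
from tensorflow_privacy.privacy.optimizers.dp_optimizer_keras import (
    DPKerasSGDOptimizer)

# Before: a regular optimiser.
# optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# After: the differentially private optimiser. It clips each example's
# gradient and adds calibrated Gaussian noise before updating the model,
# which is what stops any single secret from being memorised.
optimizer = DPKerasSGDOptimizer(
    l2_norm_clip=1.0,       # illustrative hyperparameters, not tuned
    noise_multiplier=1.1,
    num_microbatches=32,    # must divide the batch size
    learning_rate=0.1)

# The loss has to be computed per example (no reduction) so the optimiser
# can clip gradients example by example.
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.losses.Reduction.NONE)

# `model` is the language model from the earlier experiment (not shown).
model.compile(optimizer=optimizer, loss=loss)
```

The swap itself is the single `optimizer = ...` line; the change to the loss reduction is just what the Keras DP optimiser needs in order to see per-example gradients.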

We plotted the results again. Now you can see that the perplexity for the secret stays around the top of the bell curve. In other words, near the average for the random credit card numbers.

This means we successfully prevented the neural network from memorising the secret.

Maximising privacy while maintaining utility

As a usable security and privacy researcher, I usually have to solve an optimisation problem: how to maximise usability while minimising the loss of security or privacy. For most problems you need to make a trade-off.

It’s no different in this case. Since the differentially private optimiser adds small amounts of noise to what it’s learning, the accuracy of the model decreases. This might harm the utility of Gboard’s suggestions. However, Carlini et al. have shown that the accuracy loss is reasonably small. New and improved techniques for differentially private learning will only make that better.

It’s a very good example of how we at IF think services should work: using data and AI in practical and ethical ways. It’s amazing to see that this is possible. It’s even more exciting that it works without putting any burden on people. People can enjoy the full utility of the model, and data scientists merely need to change a single line of code to protect people’s privacy.