Just Google it
Ask any developer what sites they use for work and it’s a pretty safe bet that Stack Overflow will feature.
Stack Overflow is a question and answer site which uses community moderation to bubble up the “best” answer and weed out duplicates.
Typically, if I’m puzzled by something like a particular error with a software framework, I’ll Google the error, land on Stack Overflow and find a working solution to my exact problem within seconds.
Sometimes those answers come as snippets of code. That makes it incredibly easy to copy that code straight into my software and think no more of it.
Image: A screenshot of a typical answer on Stack Overflow
That’s a great user experience for developers with deadlines - what could possibly go wrong?
Let’s say I land on Stack Overflow when I’ve got a problem, and it offers me a solution. But how do I know it’s a good solution? What if it solves my immediate problem, but introduces a subtle security flaw?
How big a problem is this? Is copy-and-paste development a vector for security flaws?
Identifying insecure code
Fischer’s team used the Stack Overflow API to download all questions and answers related to Android. From that, they extracted parts which look like code snippets.
Certain Android APIs and libraries - such as those relating to cryptography and authentication - are more security-sensitive than others.
In order to focus on security-related code, they only included code snippets which used one of a list of sensitive APIs and third party security libraries.
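To make that filtering step concrete, here's a minimal sketch - my illustration, not the paper's implementation - of matching snippet text against a list of security-sensitive API names. The identifiers below are real Java/Android security APIs, but this particular list is illustrative:

```java
import java.util.List;

public class SnippetFilter {
    // Illustrative subset of security-sensitive identifiers; the paper's
    // actual list covers Android cryptography, TLS and authentication
    // APIs plus several third-party security libraries.
    static final List<String> SENSITIVE_APIS = List.of(
            "javax.crypto.Cipher",
            "javax.net.ssl.X509TrustManager",
            "javax.net.ssl.HostnameVerifier",
            "java.security.MessageDigest"
    );

    // A snippet counts as security-related if it mentions any sensitive API.
    static boolean isSecurityRelated(String snippet) {
        return SENSITIVE_APIS.stream().anyMatch(snippet::contains);
    }

    public static void main(String[] args) {
        String snippet = "Cipher c = javax.crypto.Cipher.getInstance(\"AES\");";
        System.out.println(isSecurityRelated(snippet)); // prints true
    }
}
```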
Next they needed to assess whether each of the 4,019 security-related code snippets was secure or insecure.
It would have been possible - and tedious - to assess all the snippets manually, but the team wanted a more scalable approach. Automating the process would allow them to assess any number of snippets, and potentially assess new snippets in real time.
First, they agreed on a definition of what made a snippet insecure, and two reviewers manually assessed 1,360 code snippets, labelling them as secure or insecure.
An example of an insecure snippet is one where the TLS certificate chain check is disabled, breaking the trust model of HTTPS and opening up an app to a trivial man-in-the-middle attack. It would “solve” problems relating to certificate chains while undermining the app's security.
Image: A snippet of code that breaks HTTPS
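For illustration, the widely copied pattern in question looks like this - reconstructed from the well-known Stack Overflow anti-pattern rather than taken verbatim from the paper. It's a custom X509TrustManager whose check methods are empty, so every certificate, including a forged one, is accepted:

```java
import java.security.cert.X509Certificate;
import javax.net.ssl.X509TrustManager;

// DO NOT USE: accepts every certificate, so any attacker on the network
// path can impersonate the server (a trivial man-in-the-middle attack).
public class TrustAllCerts implements X509TrustManager {
    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType) {
        // empty: no client certificate validation
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType) {
        // empty: no server certificate validation - this is the flaw
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        return new X509Certificate[0];
    }
}
```

Installing a trust manager like this via SSLContext.init() silences certificate errors for every HTTPS connection in the app - which is exactly why it “solves” so many Stack Overflow questions while breaking the security those errors were reporting.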
With this training set, they were able to train a Support Vector Machine - a type of machine-learning classifier - to identify whether a given snippet was secure or insecure.
There’s a lot more detail in the paper on how they did the feature extraction - that’s the method for converting a code snippet into input features for the model. As with all machine learning work, the performance of a model is never 100%. They evaluated the model repeatedly at different training set sizes and achieved respectable precision and accuracy scores of over 80%.
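The classifier itself can be sketched in miniature. Below is a toy linear SVM trained by stochastic subgradient descent on the hinge loss - my simplification with made-up 2D feature vectors, not the paper's feature extraction or model:

```java
public class ToySvm {
    double[] w = new double[2];   // weight vector for 2 features
    final double eta = 0.1;       // learning rate
    final double lambda = 0.001;  // regularisation strength

    // One stochastic subgradient step on the regularised hinge loss
    // max(0, 1 - y * <w, x>) + (lambda / 2) * ||w||^2.
    void step(double[] x, int y) {
        double margin = y * (w[0] * x[0] + w[1] * x[1]);
        for (int j = 0; j < w.length; j++) {
            double grad = lambda * w[j] - (margin < 1 ? y * x[j] : 0);
            w[j] -= eta * grad;
        }
    }

    int predict(double[] x) {
        return (w[0] * x[0] + w[1] * x[1]) >= 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        // Made-up, linearly separable "feature vectors": +1 = insecure.
        double[][] xs = {{1, 1}, {2, 1}, {1, 2}, {-1, -1}, {-2, -1}, {-1, -2}};
        int[] ys = {1, 1, 1, -1, -1, -1};

        ToySvm svm = new ToySvm();
        for (int epoch = 0; epoch < 1000; epoch++)
            for (int i = 0; i < xs.length; i++)
                svm.step(xs[i], ys[i]);

        for (int i = 0; i < xs.length; i++)
            System.out.println(svm.predict(xs[i]) == ys[i]); // prints true six times
    }
}
```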
That’s an impressive piece of work, and they’ve generously released their training data for others to use.
Finding insecure snippets in the Play Store
Equipped with a large set of insecure code snippets, how do you find out whether that code has actually made it into real-world apps?
The team deployed a partial-compilation technique to compile the code snippets into Android bytecode. With snippets of bytecode, it’s possible to search directly inside an Android app - an APK file - to see whether it contains the insecure code.
That’s exactly what happened next: the team downloaded 1.3 million Android APKs from the Google Play Store and ran a search for those insecure snippets.
There are of course limitations - you can’t be sure the code is actually active in the app, but it’s an ingenious way of tracing code from Stack Overflow into real apps.
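As a rough sketch of the matching step - my simplification, ignoring the normalisation the paper applies to the partially compiled snippets - searching for a compiled snippet inside an app boils down to finding one byte pattern inside another, for example inside the classes.dex pulled out of the APK's zip container:

```java
public class BytecodeSearch {
    // Naive byte-pattern search: returns true if `pattern` occurs
    // contiguously anywhere in `data`. In practice `data` would be the
    // bytes of classes.dex, read from the APK with java.util.zip.ZipFile.
    static boolean contains(byte[] data, byte[] pattern) {
        if (pattern.length == 0 || pattern.length > data.length) {
            return false;
        }
        outer:
        for (int i = 0; i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] app = {0x0a, 0x1b, 0x2c, 0x3d, 0x4e};
        byte[] snippet = {0x2c, 0x3d};
        System.out.println(contains(app, snippet)); // prints true
    }
}
```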
There’s a lot of insecure code
Of the 1.3 million apps studied, 200,672 (~15%) contained one of the security-related code snippets. This validates that code from Stack Overflow does get copy-pasted into real apps.
Incredibly, 198,347 (~15%) apps contained at least one insecure snippet.
The most “popular” snippet - which breaks TLS and exposes the app to a man-in-the-middle attack - was found in a staggering 180,388 (~14%) apps.
Community feedback doesn’t effectively warn users
Finally, they evaluated whether community moderation of Stack Overflow counteracts answers with insecure code snippets. They looked at the effect of upvotes, downvotes and security warnings in comments.
Image: A warning comment on Stack Overflow
Insecure answers did tend to be downvoted versus those with secure snippets, so that mechanism appears to work.
However, insecure answers with a warning comment tended to receive more upvotes than those without a warning. That would lead to a confusing situation where answers with security warnings score relatively highly - so that’s not helpful.
Strangely, answers with security warnings are copied into applications more often than those without. That includes the most popular snippet, which was found in over 180,000 apps despite having warning comments.
From that analysis it appears that community moderation isn’t producing secure code.
It’s not just Stack Overflow
Stack Overflow is just one place where people share code, but there are many other places where developers place trust in code created by others.
I recall the chaos in early 2016 when a developer removed 250 modules from Node Package Manager (NPM), breaking thousands of other projects whose code relied on those modules.
That incident highlighted what enormous trust developers around the world will put in one person. How many modules like that contain flaws? Would anyone notice if one was introduced?
Teams are often protective of who gets to commit to their code, yet many pull in hundreds of libraries by unknown developers all the time.
I’m a big fan of open source and sharing code as it allows great work to spread fast. However, to keep our users safe, we need to take responsibility for the code we’re giving them. In a few days, Phil’s going to write something about what we think that might look like.