Over the last few weeks we’ve been thinking about improving terms and conditions. We’ve been learning by making and talking to experts.
We started out with a tool to parse and display terms
The first thing we did was build a prototype to test whether highlighting certain words and phrases, for example ‘personal information’, makes terms and conditions easier for people to read.
This was a simple starting point that could lead to more sophisticated natural language processing and automatic analysis of terms.
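As a rough illustration of the highlighting idea, here is a minimal Python sketch. It is not the prototype's code: the phrase list, the `highlight` function name and the choice of the `<mark>` element are all assumptions made for the example.

```python
import html
import re

# Hypothetical watch list; only 'personal information' comes from the post itself.
PHRASES = ["personal information", "third parties"]


def highlight(text, phrases=PHRASES):
    """Escape text for HTML and wrap each listed phrase in a <mark> element."""
    escaped = html.escape(text)
    if not phrases:
        return escaped
    # Longest phrases first, so a longer phrase wins over a shorter one it contains.
    ordered = sorted(phrases, key=len, reverse=True)
    # Phrases are escaped the same way as the text so they still match after escaping.
    pattern = re.compile(
        "|".join(re.escape(html.escape(p)) for p in ordered),
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", escaped)


print(highlight("We share your Personal Information & data."))
```

Matching is case-insensitive and purely literal, which is about as simple as this can get. Anything smarter, such as spotting paraphrases of a phrase, would need the kind of language processing mentioned above.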
Simple formatting was the most impactful part of the prototype
The biggest surprise from testing the prototype was that its most popular features were the clear headlines, large font sizes and ample spacing.
This basic level of usability is frequently missing, because terms and conditions documents often reflect their heritage as paper documents. Simply presenting documents with semantic HTML (correct use of different levels of headings, for example) can improve their readability.
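To make "semantic HTML" concrete, here is a small Python sketch that turns one section of terms into markup with a real heading element at the right level. The `render_section` function and its input shape are hypothetical, chosen for illustration rather than taken from the prototype.

```python
import html


def render_section(level, title, paragraphs):
    """Render one section of terms as semantic HTML.

    level      -- heading depth, clamped to the 1-6 range HTML supports
    title      -- heading text
    paragraphs -- list of plain-text paragraphs in the section
    """
    level = min(max(level, 1), 6)
    parts = [f"<h{level}>{html.escape(title)}</h{level}>"]
    parts += [f"<p>{html.escape(p)}</p>" for p in paragraphs]
    return "<section>\n" + "\n".join(parts) + "\n</section>"


print(render_section(2, "Your data", ["We store it.", "You can ask us to delete it."]))
```

Using `<h2>`, `<h3>` and so on, rather than bold text or larger fonts, keeps the document's structure visible to screen readers and to any later parsing tool.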
A lack of standards makes parsing terms difficult
Through building and testing the prototype we learned a lot about how terms and conditions are published and the challenges of this type of tool.
The first hurdle for parsing tools is a lack of consistency in how terms are published. We encountered two-column PDFs, text files and low-quality HTML web pages.
The lack of standards makes it difficult to extract the structure of the content reliably, and that’s the first stage of doing meaningful parsing.
There are different ways to tackle this problem:
- Spend engineering effort on parsing different types of format, automatically recognising headings, definitions and so on
- Influence the way terms are published in the first place
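To give a sense of what the first option involves, here is a minimal Python sketch of heading recognition for plain-text terms. It assumes headings are numbered ("1. Definitions", "1.1 Personal information"), which many real documents are not, and the `find_headings` name and the 80-character cut-off are our own choices for the example.

```python
import re

# A dotted section number, an optional trailing dot, then the heading text.
NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def find_headings(text):
    """Return (level, title) pairs for lines that look like numbered headings."""
    headings = []
    for line in text.splitlines():
        line = line.strip()
        match = NUMBERED.match(line)
        # Heuristic: headings are short and do not end with a full stop,
        # which separates them from numbered clauses written as sentences.
        if match and len(line) <= 80 and not line.endswith("."):
            level = match.group(1).count(".") + 1
            headings.append((level, match.group(2)))
    return headings


sample = """1. Definitions
1.1 Personal information
We collect data about you.
2 Your rights"""
print(find_headings(sample))
```

Even this small heuristic would need separate handling for each input format, for example a text extraction step for PDFs, which is where the engineering effort adds up.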
At this point it feels like there’s more potential in looking at the content of the terms than in automatically importing different formats.
We’ll definitely revisit parsing, but next we’ll talk through ways we might improve the publishing of terms and conditions.