How to improve the quality of data received from your online survey responses
You’ve heard the maxim about data quality: Garbage in, garbage out. It doesn’t just apply to data analytics or machine learning. It’s true for your online surveys too. You probably don’t want to know how many of your current online survey respondents are bots. We’re going to tell you anyway.
If you spend a lot of resources gathering data from online surveys, you need that data to be of consistently high quality. You can find yourself making very expensive mistakes if you start allocating resources based on low-quality survey data.
Here’s how to assess the quality of your survey data and improve the overall quality of the responses you’re getting to your online surveys.
Data quality is a measure of how usable the data is for its intended purpose.
There are numerous factors that contribute to high data quality. High-quality data is, among other things, reliable, replicable, and thorough. Low-quality data may be incomplete or unrepresentative of the cohort that volunteered it.
The quality of your data governance and data cleansing practices will partly determine your company’s overall data quality. It all fundamentally starts with your data sources (in this case your surveys and the participants taking them): Are they trustworthy? Are they producing the right information for you to use?
A number of studies, such as Chandler et al.'s (2015) work on MTurk, have raised questions about the quality of data that comes from online surveys. That matters because bad data leads to bad decision-making that can have a significant impact on your bottom line. According to IBM (2016), data quality issues cost the U.S. economy $3.1 trillion a year. Meanwhile, Gartner research from 2021 puts the average financial impact of poor data quality on organizations at $12.9 million per year.
Bad data can lead to bad decisions by increasing the potential to come to false positive or false negative conclusions.
False positives lead you to see relationships where none exist. For instance, let’s say a company that offers time-keeping software is looking to figure out which new feature their users want next. Their users really want an in-app document signing feature, and are willing to pay for it. But poor survey design could very easily lead the company to believe that their users actually want a new reports dashboard instead, leading them to miss a huge opportunity!
False negatives lead you to miss significant relationships where they do exist. For instance, poor-quality data may lead you to believe that a particular high-value cohort isn’t interested in your product. When, in fact, higher-quality data would have revealed they are interested but require a price-point adjustment or a feature addition.
It’s all a question of where that data comes from. According to Harmon Research and Grey Matter Research (2020), up to 46% of all opt-in panel respondents are bogus or low-quality. That’s right; if you use panel companies, nearly half of your respondents could be either no good or not real.
A big contributor to this high total is not just that low-quality participants are everywhere. It’s that market researchers don’t prepare well enough to deal with them. According to the same research by Harmon and Grey Matter, 95% of researchers eliminate speeders, and 90% use CAPTCHA to filter out survey bots — and that’s pretty much where the defence against low-quality participants and responses stops.
Here are five steps you can use to improve the quality of data you receive from online surveys.
The quality of your research platform will directly inform the quality of the data you receive from it. Whenever you’re checking out a research platform, evaluate it for the following criteria.
Underlying pool quality varies greatly between the major platforms, and we're proud that Prolific's pool scores highest on these measures.
If you want high data quality in your online survey responses, make sure your participants are real people providing you with real data.
You’ll have screening requirements that regulate who can participate in your survey. Ask participants to answer your screening questions again at the start of your survey. This way, you can confirm that their prescreening responses are accurate and up to date.
You should never reveal who you want to respond to your survey (i.e., your required demographics) in your survey description. Do so, and you risk biasing your own survey results by telling low-quality participants who they need to pretend to be to gain access.
You also ideally want respondents who have higher rates of unbiased responses, greater expressivity (a wider range of vocabulary and willingness to give longer answers), and who are more likely to finish the surveys they start.
You won’t get worthwhile answers to all your questions from all available samples. Much older people are unlikely to give you high-quality data about customer service chatbots, as they statistically use them less. Inner-city folks renting their apartments aren’t going to be able to tell you much about preferred mortgage plans.
So create a clear picture of your research aim, and build out a sense of what sample is likely to give you the data to fulfill those aims. You need to ask the question: generalizability vs. nicheness — which one is more important to you? Generalizable samples give you the ability to generalize to the population at large, while niche samples are more useful if you want data about a very specific subset of the population.
Linked intimately to our point about validating your participants, you need to make it as difficult as possible for bad actors to infiltrate your survey. This is primarily a technical and structural concern for when you’re building your surveys.
Include CAPTCHA as your first step when it comes to maintaining data integrity. Include free-text responses, as these will reveal bad actors (and bots) through low-effort responses. Include open-ended and duplicate questions. If you suspect bot activity, check your data for random answering patterns or careless responding.
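Checks like these can be partly automated after the data comes in. Below is a minimal sketch of how you might flag speeders and "straight-liners" (respondents who pick the same answer for every scale question); the field names and thresholds are illustrative assumptions, not a prescription.

```python
# Sketch: flagging careless responding in survey data.
# Field names ("duration_seconds", "likert") and thresholds are
# illustrative assumptions; tune them to your own survey.

def straightlining_score(likert_answers):
    """Fraction of a respondent's scale answers that equal their
    most common answer (1.0 = identical answer to every item)."""
    if not likert_answers:
        return 0.0
    most_common = max(set(likert_answers), key=likert_answers.count)
    return likert_answers.count(most_common) / len(likert_answers)

def flag_careless(respondent, min_seconds=120, max_straightlining=0.9):
    """Flag a respondent if they sped through the survey or gave
    near-identical answers to every scale item."""
    too_fast = respondent["duration_seconds"] < min_seconds
    straightlined = straightlining_score(respondent["likert"]) >= max_straightlining
    return too_fast or straightlined

# Hypothetical respondents:
careful = {"duration_seconds": 540, "likert": [4, 2, 5, 3, 4, 1, 2]}
bot_like = {"duration_seconds": 45, "likert": [3, 3, 3, 3, 3, 3, 3]}
```

Flagged respondents shouldn't be discarded automatically; review them first, since a fast but attentive respondent can trip a naive duration threshold.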
Finally, implement effective attention checks to spot malfeasance in your database. An attention check can be as simple as including a question that states, “Please select 'red' from the options listed below.”
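Filtering on attention checks like the one above is straightforward to script. Here's a minimal sketch; the field names (`attention_red`, `attention_agree`) and the two-check setup are hypothetical, invented for illustration.

```python
# Sketch: dropping respondents who fail attention checks.
# The check names and expected answers below are illustrative assumptions.

ATTENTION_CHECKS = {
    "attention_red": "red",      # "Please select 'red' from the options listed below."
    "attention_agree": "agree",  # a second instructed-response item later in the survey
}

def passes_attention_checks(response):
    """True only if every attention-check answer matches its expected value
    (case-insensitive, ignoring surrounding whitespace)."""
    return all(
        response.get(item, "").strip().lower() == expected
        for item, expected in ATTENTION_CHECKS.items()
    )

def filter_attentive(responses):
    """Keep only responses that pass every attention check."""
    return [r for r in responses if passes_attention_checks(r)]

responses = [
    {"id": 1, "attention_red": "Red", "attention_agree": "agree"},
    {"id": 2, "attention_red": "blue", "attention_agree": "agree"},
]
attentive = filter_attentive(responses)
```

Using more than one check, placed at different points in the survey, makes it harder for a bot or inattentive respondent to pass by luck alone.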
There’s a human aspect to all this: If you want real people to give real answers to your survey questions, you need to treat them like real people.
One of the best ways to do this is by paying them a fair rate for their time as well as being transparent about how the data will be used and how their privacy will be protected. Tell them about the benefits they can expect for taking the time to participate.
If you’re trying to improve the experience of taking your survey, try signing up as a participant in your own research. If the experience sucks, you know you’ve got work to do.
Data quality is a function of the participants and of your study design. Take care of your design and choose the right participants, and the right kind of data will come flooding in.
Never forget the key lessons underpinning it all. Pick a strong pool. Validate your participants. Choose the right sample. Work to minimize the ability for bad actors to get into the mix. And treat your participants well.