MTurk vs. Qualtrics vs. Prolific: Whose survey participants are best?
Over the past decade, there’s been a surge in the amount of research taking place online. From 2014 to 2019 alone, online research as a share of market research spend increased from 28% to 44% worldwide. And the availability of online research panels and platforms has grown as a result. Unfortunately, the quality of responses gathered from each varies wildly. And that’s concerning.
Because whether you’re working in academia or at Alphabet, data quality is of fundamental importance to researchers. Especially when you’re paying for it. So, we compared the quality of survey responses between our own platform, Prolific, Amazon’s Mechanical Turk (MTurk), and Qualtrics.
Details of the preliminary survey and the two studies done were recently published as a whitepaper online. But, to help answer the question of who provides the best online survey responses, we’ve summarized our findings below.
Before comparing MTurk vs. Qualtrics vs. Prolific, we needed to decide which factors of quality mattered most to researchers who pay to source data online.
To do so, we created a quality-specific survey. In it, we detailed 11 distinct factors related to online research response quality. Participants were asked to rate each of the 11 factors based on their opinions and experience, from 1 (Not Important) to 5 (Highly Important).
We then published the survey through the Society for Judgment and Decision Making’s distribution list and our own social channels. In just a few hours, we had 129 responses. Respondents’ expertise ranged from academia and psychology to business and marketing, and they averaged five years of research-related experience in their roles.
From this survey, we learned that the four factors researchers value most are participant attention, comprehension, honesty, and reliability.
High-quality survey participants read questions before answering them, and measures of attention determine if this is, in fact, happening. Frequently, measures of this factor involve the use of attention-check questions (ACQs).
High-quality survey participants understand what they’re being asked to do. But, when measuring comprehension as a factor of data quality, participants also need to show they can summarize what they’re being told and communicate it back to the survey requesters.
In modern research, compensating survey participants is important when the goal is to source high-quality data. But researchers need to ensure compensation doesn’t influence how participants answer survey questions. Especially since certain questions, like those relating to demographic information, are often used to determine who’s eligible to take the survey itself.
Finally, to trust the quality of responses relating to more subjective data, like personal traits, tendencies, and preferences, researchers need participants who respond consistently over the course of a survey. Due to this subjectivity, measuring respondent reliability requires more than a couple of ACQs. This is why a combination of methods, like Cronbach’s alpha, NFS scoring, and the DOSPERT scale, is often used to measure internal reliability.
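For readers unfamiliar with Cronbach’s alpha, here is a minimal sketch of the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items in a scale. The function name `cronbach_alpha` and the sample scores are hypothetical illustrations, not data or code from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows (one score per item)."""
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: 5 respondents answering a 3-item scale.
scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values closer to 1 suggest that respondents answered the items of a scale consistently. What counts as acceptable depends on the scale and the field.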
We then worked to compare the quality of MTurk vs. Qualtrics vs. Prolific. In doing so, we determined responses sourced through the Prolific platform were highest in quality overall. Prolific respondents also performed best in three of the four quality factors identified in our researcher survey.
Overall, Prolific respondents:
Breaking results down for each of the four factors of quality tested provided additional insights:
Respondent Quality: Attention
To test attention, we included two distinct ACQs in the survey. The results showed statistically significant differences between the three audience samples.
Prolific respondents had the highest rate of passing ACQs, with 68.7% of respondents successfully passing both. MTurk respondents came in second, with 46.6% passing both ACQs, followed by 22.5% for the Qualtrics respondents.
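One common way to check whether pass rates like these differ across groups is a chi-square test of independence. The sketch below is only an illustration: the source does not say which test was used, and the counts assume a hypothetical 1,000 respondents per platform (chosen so the reported percentages become whole numbers), not the study’s actual sample sizes.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_two_df(statistic):
    """Upper-tail p-value of a chi-square statistic; valid only for df == 2."""
    return math.exp(-statistic / 2)


# HYPOTHETICAL counts: [passed both ACQs, did not], assuming n = 1,000 per platform.
acq_table = [
    [687, 313],  # Prolific (68.7%)
    [466, 534],  # MTurk (46.6%)
    [225, 775],  # Qualtrics (22.5%)
]
stat, df = chi_square_independence(acq_table)
p = p_value_two_df(stat)
print(f"chi-square = {stat:.1f}, df = {df}, p < 0.001: {p < 0.001}")
```

With three platforms and two outcomes, the test has two degrees of freedom, which is why the closed-form p-value exp(-x/2) applies here. For other table shapes, a statistics library would be the safer choice.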
Respondent Quality: Comprehension
The survey also included two separate questions to test comprehension. For the first, 82% of Prolific respondents answered correctly. Roughly half of MTurk and Qualtrics audiences did the same.
For the second question, 98% of the Prolific respondents answered correctly, compared to Qualtrics’ 87% and MTurk’s 69%. Taken together, overall comprehension was 81% for Prolific, 52% for Qualtrics, and 42% for MTurk.
What’s more, we wondered if there was a correlation between respondent attention and comprehension. And, according to our data, there is. Respondents who passed both ACQs did better on the comprehension questions that followed compared to those who didn’t. And of those who didn’t, it was the MTurk respondents whose comprehension suffered the most in this way.
Respondent Quality: Honesty
To test honesty, the quality survey also featured five matrix problems, the fifth of which was unsolvable. Participants were asked to try and solve each problem and told the total number solved would affect a compensation bonus. To collect the bonus, each respondent self-reported how many problems they’d solved.
As a whole, MTurk respondents claimed to have solved the most problems on average (54.5%). Prolific respondents claimed an average of 36.9% of all problems solved. And Qualtrics averaged 32.6%. While the overall results are interesting, a focus on that fifth, unsolvable problem informs our understanding of honesty.
Even though it kept them from receiving the full bonus, 84% of Prolific respondents reported they couldn’t answer the fifth question. The Qualtrics respondents followed at 78%. And only about half of the MTurk respondents (55%) were honest about their efforts on the fifth problem.
The data collected also shows that success with ACQs (high respondent attention) correlates with respondent honesty. Honesty was higher for those who passed both ACQs than for those who failed one ACQ or both.
Respondent Quality: Reliability
Initially, our data showed that internal reliability for respondents was high across all three platforms. But applying the DOSPERT scale revealed respondent discrepancies relating to attention.
Among MTurk respondents, reliability lined up with whether they passed or failed our two ACQs. However, for Prolific and Qualtrics, reliability was actually higher for those who failed their ACQs than for those who passed. We found this result warranted further study. As such, it became the focus of the second study on respondent quality we conducted, which we also detail fully in the final whitepaper.
Our initial study clearly showed that high attention, comprehension, honesty, and reliability were fundamental aspects of quality to experienced researchers. But at Prolific, data validity and the ability to replicate results are essential. So, the survey we designed that produced the results above had to contain questions that any reasonable survey participant (from any online research platform) could answer. Doing so ensured our MTurk vs. Qualtrics vs. Prolific respondent comparison would be fair.
Once our survey questions were complete, we requested 500 participants from each of MTurk, Qualtrics, and Prolific. Participants of the quality survey all reported being 18 years old or older and living in the United States as of September 2020.
Both the MTurk and Prolific platforms allow researchers to choose how much to pay potential survey participants. For this respondent quality survey, we paid 1.5 GBP per minute on Prolific and 1.5 USD per minute on MTurk. Unfortunately, Qualtrics does not specify how it compensates participants through its platform.
In total, we received 2,857 responses across the three platforms. Of these, 2,510 (87.9%) completed our survey. And for full transparency, the following table breaks down each sample’s total size, its number of completed responses, and key demographic information.
(Note: This study was pre-registered on the Open Science Framework, and all materials and data are available here.)
The results above serve as an endorsement of the high quality of Prolific survey respondents. What’s more, because the survey itself was so accessible, the failure rates shown in the MTurk and Qualtrics results should serve as a note of caution to researchers who use those platforms.
That said, while survey respondent quality is a major aspect of online research platforms and panels, it’s not the only factor that determines the quality of the data they provide.
This is why it’s critical for researchers to understand how platforms themselves affect the quality of the data they provide.