Evaluating Prolific in repeated assessment and substance use research: An interview with Professor Kasey Stanton
As online recruitment becomes the norm for academic studies, researchers are looking to validate crowdsourcing methods for their field of study. Validation is critical to ensure that research using online samples yields reliable insights that advance understanding within a given field.
A recent paper in the journal Experimental and Clinical Psychopharmacology sought to demonstrate the utility of online recruitment, and Prolific specifically, for repeated assessments and psychometric substance use research. I discussed this work with the lead author, Kasey Stanton, PhD, an assistant professor of clinical psychology at the University of Wyoming. You can read the full paper here.
I’m an assistant professor at the University of Wyoming. I focus on the intersection between clinical and personality psychology.
My primary interests are in improving how we assess and conceptualize clinical topics, personality traits, and mood, and how they relate to different mental health symptoms and disorders. One way I do this is by trying to understand how researchers can ask patients questions in better ways, so we can get higher-quality information in research studies and when making diagnoses in clinical practice. I'm hoping to understand whether we can use personality factors to help clinicians and patients collaboratively reach the most appropriate diagnoses.
Alongside this, substance use has been a theme in research studies I've been involved with and in mental health diagnosis research in general. I'm really interested in how substance use co-occurs with other mental health issues, particularly as this is very important to know when planning treatment.
Yes, absolutely. For a long time, in-person recruitment and testing were the only way to collect data, especially when testing niche populations. When I was in graduate school, online sampling was becoming more common. Amazon Mechanical Turk was the most common platform by far. But more recently I’ve noticed a large shift to Prolific and other platforms.
Although researcher opinions differ on online data collection, I think it's here to stay.
It gives us access to participants we haven't been able to study before. This is a big benefit for us as researchers. When you're working in more rural areas in clinical practice, it can be really hard to facilitate in-person testing. Clinicians may be few and far between, and patients would need to travel long distances to attend an appointment. Online recruitment solves that problem very effectively.
In the field of substance use research, there have been a large number of papers published that have used online recruitment methods. But most of those have been cross-sectional. We wanted to evaluate online recruitment methods for different types of longitudinal study designs with a specific emphasis on examining retention and representativeness of the online sample.
To test this, we examined if data collected from Prolific would conform to general theoretical expectations around consistency of mood and substance use across time. We also evaluated how likely participants would be to stick with the study across time.
A side aim of the project was to show how you can use Prolific for measurement and assessment research on specific samples, such as people with mental health issues. This is important because you often need really large samples to ensure that your questionnaires and study items are applicable to people from a lot of different backgrounds. Online data collection lets researchers do this very efficiently.
We looked into multiple options for recruitment sources for this paper. We asked several colleagues for recommendations and Prolific was a name that came up consistently.
When we investigated the platform, we found its screening capabilities particularly appealing for this research. We could also track participants by ID. This made it really easy to conduct a longitudinal study, especially compared to a platform like MTurk, where this is very difficult to achieve.
Study 1 used a daily diary protocol. We had participants rate their mood, substance use, sleep, and social engagement every day for five days.
At the first assessment, participants gave baseline information about their personalities and their general mental health functioning, as well as how much they tend to use alcohol and cannabis. On each subsequent day, we tracked alcohol use, cannabis use, mood, and social interactions. This enabled us to look at how some of those baseline variables might relate to, say, daily mood on the days that followed.
In aggregate, about 82% of all daily surveys were completed. This is comparable to, or even slightly better than, what you might find if you conducted this study using a different format or platform.
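An aggregate completion rate like the 82% figure is typically computed by pooling across participants and days: submitted surveys divided by scheduled surveys. The sketch below assumes a simple participant-by-day table of submission flags; the function name and data are hypothetical, and the paper may have treated partial submissions or dropouts differently.

```python
def aggregate_completion_rate(completed: list[list[bool]]) -> float:
    """Share of all scheduled daily surveys that were submitted.

    Each inner list is one participant's diary days (True = submitted).
    Pooling over every participant-day is an illustrative assumption.
    """
    scheduled = sum(len(days) for days in completed)
    submitted = sum(sum(days) for days in completed)  # True counts as 1
    return submitted / scheduled if scheduled else 0.0


# Hypothetical five-day diaries for three participants (made-up data).
diary = [
    [True, True, True, True, True],
    [True, True, False, True, True],
    [True, False, False, True, False],
]
print(f"{aggregate_completion_rate(diary):.0%}")  # 11 of 15 surveys -> 73%
```

Pooling weights every participant-day equally; averaging per-participant rates instead would give each participant equal weight, which matters when diary lengths differ.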
However, there is one caveat: five days is definitely on the shorter side for a diary study, which typically runs for longer. With a longer duration, it's possible that retention would have dropped, but this remains to be seen.
Also, this study was very large, comprising 321 participants. It would take a lot of time and effort to run something of this size in person, so that efficiency also needs to be weighed against retention.
Study 1 was descriptive in the sense that we just wanted to see if participants' mood levels, substance use, and patterns of social interactions would conform to what we’d expect theoretically.
For example, most people drink more alcohol on the weekends. They also tend to report higher levels of positive mood during this time. That’s exactly what we found in the Prolific sample. Drinking was more common on weekend days as were positive mood levels. This gives us confidence that we’re getting data quality that’s as high as what we might see with in-person recruitment.
That might seem a fairly basic finding. But a lot of researchers want to see that you do, in fact, find those types of patterns when you use online recruitment. This directly validates online methods for this type of research. And it shows researchers the level of retention they can expect when running their own longitudinal online studies.
In Study 2, we recruited a separate sample and administered a personality inventory at two time points: the first study day and a follow-up assessment two weeks later. So this used a less frequent sampling approach, spread over a slightly longer timeframe.
The aim of this second study was to demonstrate retention over a longer time period, as well as the reliability of participants' responses. If we asked somebody about their personality at the first assessment and again two weeks later, we should theoretically get fairly consistent scores if the data are reliable. Your personality shouldn't change drastically over two weeks.
We also compared the Prolific sample to an undergraduate sample recruited in person. Undergraduate samples are one of the most common forms of convenience samples for researchers. So, we wanted to see how representative the Prolific sample was compared to a standard university population.
Regarding the consistency of responses across time, the Prolific sample was stable across the two time points. Participants' self-reported personality scores didn't change significantly between sessions. So researchers can be reassured that they're picking up reliable information about people over time.
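Stability across two sessions is often summarized as a test-retest correlation, which captures whether people keep their relative standing (rank-order stability). Whether average scores shifted is a separate mean-level question, usually checked with a paired test. The interview does not state which statistics the paper used, so the sketch below shows only a Pearson test-retest correlation on hypothetical scale scores.

```python
from math import sqrt
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return cov / (sx * sy)


# Hypothetical extraversion scale scores for the same five participants,
# at the first session and again two weeks later (made-up data).
time1 = [3.2, 4.1, 2.5, 3.8, 2.9]
time2 = [3.4, 4.0, 2.7, 3.6, 3.0]
print(round(pearson_r(time1, time2), 2))
```

A high correlation here means participants who scored high at the first session still scored high two weeks later, which is the kind of consistency the interview describes.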
Also, we were able to look at consistency over time not just for the whole personality inventory, but also at the level of individual items in the questionnaire. Even people's responses to individual items were relatively consistent over time. That's really reassuring to see.
This suggests that one type of data quality check that researchers might look at is that aspect of item-level consistency over time. If a participant says “I'm really extroverted” at the start of the survey, but then later on says “I'm not at all extroverted”, that should be cause for concern. It might indicate either careless responding or somebody rushing through a survey. That wasn’t the case at all with this study. The responses were consistent to a degree that you would expect, which is encouraging.
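The item-level check described above can be sketched as flagging items whose answer swings by a large amount between sessions, for example from strongly agreeing to strongly disagreeing with "I'm really extroverted". The 1-5 response scale, the item names, and the three-point threshold are all illustrative assumptions, not a published cutoff or the paper's procedure.

```python
def flag_inconsistent_items(
    t1: dict[str, int], t2: dict[str, int], max_shift: int = 3
) -> list[str]:
    """Return item IDs whose response changed by at least max_shift points.

    Items answered at only one session are skipped. The default threshold
    is a hypothetical choice for a 1-5 Likert scale.
    """
    return [
        item for item in t1
        if item in t2 and abs(t1[item] - t2[item]) >= max_shift
    ]


# Hypothetical 1-5 responses from one participant at two sessions.
session1 = {"outgoing": 5, "talkative": 4, "reserved": 2}
session2 = {"outgoing": 1, "talkative": 4, "reserved": 3}
print(flag_inconsistent_items(session1, session2))  # ['outgoing']
```

A single flagged item is weak evidence on its own; a participant with many flagged items is a stronger candidate for the careless or rushed responding the interview mentions.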
The undergraduates completed only a single assessment, but that gave us a baseline to compare the Prolific sample to. We had given each sample an extraversion measure, and we looked at how consistent participants' responses were within that measure during the session. We found that the Prolific sample was as consistent as, or even a little more consistent than, the undergraduate sample.
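The interview does not name the within-session consistency index used to compare the two samples. Cronbach's alpha is one common internal-consistency statistic, so the sketch below uses it as an assumption: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), computed separately for each sample. The response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha; rows are participants, columns are scale items."""
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least 2 participants and 2 items")
    k = len(responses[0])
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 1-5 responses to a three-item extraversion measure.
online_sample = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [3, 3, 3]]
student_sample = [[4, 2, 5], [2, 4, 3], [5, 3, 4], [3, 3, 2]]
print(round(cronbach_alpha(online_sample), 2))
print(round(cronbach_alpha(student_sample), 2))
```

Higher alpha means the items move together more closely within a session; comparing the two values is one simple way to ask which sample answered more consistently.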
This aligns with my experiences. Having collected a lot of data from undergraduate samples, I typically have to remove about 10-15% of cases because those responses are invalid. However, on Prolific we only had to remove a very small percentage of cases. This suggests that Prolific participants may have been more engaged with the study on average.
First, in terms of applicability to substance use and mental health research, there is the finding that patterns of mood, social engagement, and substance use in the Prolific sample conformed to theoretical expectations. That point will be applicable to the widest range of researchers and research areas.
Second, the retention aspect. Across both of our studies we saw high retention, comparable to, if not better than, other studies using this approach. On average we saw approximately 90% retention across time in the second study. This was really incredible, especially over a two-week period.
And finally, the screening filters successfully recruited participants with mental health issues. This is a huge benefit to researchers in this field who need larger samples for their work.
As I mentioned before, the studies we ran here were fairly short. Many diary studies last for multiple weeks with daily submissions. I'd like to run a study that takes a similar approach but over that extended duration. This would really test whether online samples can provide high data quality and high retention across that timescale.
My collaborators on this paper and I have also talked about administering multiple surveys within the same day to see if we can get a lot of interesting information at the daily level. For example, we could look at fluctuations in mood within a single day. I could see that being really appealing to a lot of researchers.
We'd also like to conduct interviews with participants alongside the questionnaires. One critique I often hear of online data collection in clinical research is that researchers can't interact directly (face-to-face) with participants. I think it would be really interesting to compare response consistency in interviews versus questionnaires. Asking people questions in a 'live' format might yield slightly different information than having them write their responses in a questionnaire.
You can find out more about this research in the full paper.