This post has been written by Dr Jacqueline Thomson, Senior Research Associate at the School of Psychological Science, University of Bristol.
Researchers from the University of Bristol, in partnership with Jisc, have completed a pilot project investigating how prediction markets might be used to help institutions make decisions about which outputs to submit to future REF exercises.
Assessing research quality is a difficult and time-consuming task, so our research team investigated whether prediction markets, an established methodology from economics, could help with this task when preparing REF submissions (for a full explanation of our project, see this previous blog post). Prediction markets aggregate information from a group of people – sometimes described as the “wisdom of crowds” – by asking them to trade ‘bets’ on the outcomes of future events. Our prediction markets asked participants to predict the REF ratings of individual research outputs.
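To give a flavour of how such a market aggregates opinions, here is a minimal sketch using Hanson's logarithmic market scoring rule (LMSR), a common automated market maker. This is purely illustrative: the study's actual trading platform is described in the preprint, and nothing here assumes it used this particular mechanism.

```python
import math

def lmsr_cost(quantities, b=10.0):
    # LMSR cost function: C(q) = b * log(sum_i exp(q_i / b)).
    # A trade costs C(q_after) - C(q_before).
    return b * math.log(sum(math.exp(q / b) for q in quantities))

def lmsr_prices(quantities, b=10.0):
    # Instantaneous price of each outcome: exp(q_i/b) / sum_j exp(q_j/b).
    # Prices sum to 1 and can be read as the crowd's probabilities.
    z = sum(math.exp(q / b) for q in quantities)
    return [math.exp(q / b) / z for q in quantities]

# Four outcomes: the output is rated 1*, 2*, 3* or 4* in the (mock) REF.
q = [0.0, 0.0, 0.0, 0.0]      # shares sold so far for each outcome
before = lmsr_cost(q)
q[3] += 5                     # a trader buys 5 shares of "4*"
trade_cost = lmsr_cost(q) - before
print(trade_cost)             # what the trade cost this trader
print(lmsr_prices(q))         # updated crowd probabilities, now favouring 4*
```

Each purchase nudges the price of its outcome upwards, so the final prices summarise what the group of traders collectively believes.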
We ran six prediction markets, spanning three universities and four academic fields (psychology, biology, chemistry, and life and environmental sciences), predicting REF outcomes for a total of 170 research outputs. Participants in each market were academics from the relevant department and university, and each market had between 4 and 31 participants. For those curious about how the markets worked in practice, you can see our instructions to participants here, and consult the methods section of our preprint. Since we did not have access to the final REF scores of individual research outputs, we compared the market predictions to the scores from mock REF panels convened by those departments.
In aggregate, we found that prediction markets predicted mock REF scores with over 70% accuracy, and the two sets of ratings were significantly positively correlated (r = 0.48, p < .001). We found no statistical evidence that the markets systematically over- or under-rated research outputs, although markets with more participants and more trades tended to produce more accurate results – that is, results closer to the mock REF ratings.
For one market, we also compared the prediction market results to machine learning models that incorporated various metrics, such as citations, social media attention, authors and affiliations. The ratings from the machine learning model, the prediction market, and the mock REF were all highly (but not completely) correlated with one another, suggesting they are all tapping into an underlying construct of perceived research quality, but each using or privileging different sources of information.
Future use of prediction markets for the REF
What does our study mean for future uses of prediction markets for REF preparation? To answer this question, we also ran a follow-up workshop discussing the implications of our study with representatives both from the departments where we ran prediction markets, and from other institutions (see the report here).
One major topic of discussion was the acceptability of prediction markets as a research assessment tool. Are they a fair method of deciding REF submissions? Would academics at participating departments have any objections to using this method? Some institutional representatives were concerned that the non-systematic approach of prediction markets could perpetuate existing biases in research assessment, or might put off academics who found the process difficult to learn. On the other hand, our feedback survey showed that participants found the process engaging, and some especially liked the ‘gamified’ aspect of trading bets.
Given these concerns, workshop attendees agreed that one future use of prediction markets for the REF could be to introduce a broader pool of academics to the REF, beyond just a small mock REF panel of senior researchers. It could be an especially good way to engage early career researchers, who are often left out of the REF process.
Another major topic in our workshop was whether the prediction markets were more or less efficient and/or cost effective than current methods of REF preparation (typically, small panel assessments). Our research found that prediction markets were a good, although not perfect, predictor of mock REF scores (of course, these mock scores themselves are not perfect predictors of actual REF scores). On the other hand, feedback surveys revealed that the typical participant spent on average 1 hour 50 minutes on the prediction market (each market averaged 13 participants and 28 research outputs, working out to roughly 50 minutes of staff time per paper). Some institutional representatives suggested this was not an efficient use of time compared to existing methods, especially as some departments have to assess hundreds of papers for the REF.
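The per-paper staff-time figure follows from simple arithmetic on the survey averages, which can be checked directly:

```python
# Averages reported in the feedback surveys (figures from the post):
participants_per_market = 13     # average participants per market
minutes_per_participant = 110    # 1 hour 50 minutes each
outputs_per_market = 28          # average research outputs assessed per market

total_minutes = participants_per_market * minutes_per_participant
minutes_per_output = total_minutes / outputs_per_market

print(total_minutes)             # 1430 minutes of staff time per market
print(round(minutes_per_output)) # about 51 minutes per paper, i.e. ~50
```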
Given the findings from the workshop and our study, we suggest that prediction markets would probably best be used to augment, rather than replace, current methods of assessment for deciding REF submissions. For example, a three-step approach could be used to select outputs for submission. First, use machine learning to sift the whole pool of eligible outputs, as it is cheap and easy to implement, keeping those that score very high and discarding those that score very low. Then, for outputs in the middle range, run a prediction market to gather more information. Finally, compare the machine learning and prediction market ratings, and send outputs where the two ratings differ considerably to a panel for close reading, to arbitrate. This could in principle reduce the total amount of academic time spent on the process.
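The three-step triage above can be sketched in a few lines. This is a minimal illustration only: the 0–4 scores, threshold values, and function name are all hypothetical choices for the example, not parameters from the study.

```python
def triage(ml_score, market_score=None,
           keep_above=3.5, discard_below=1.5,
           disagreement=1.0, submit_cutoff=2.5):
    """Route one research output through the three-step selection process.

    ml_score / market_score are hypothetical 0-4 quality ratings;
    all thresholds are arbitrary illustrations, not values from the study.
    """
    # Step 1: cheap machine-learning sift over the whole pool of outputs.
    if ml_score >= keep_above:
        return "keep"
    if ml_score <= discard_below:
        return "discard"
    # Step 2: middle-band outputs go to a prediction market.
    if market_score is None:
        return "run prediction market"
    # Step 3: large ML/market disagreement goes to a panel for close reading.
    if abs(ml_score - market_score) >= disagreement:
        return "panel review"
    # Ratings agree: decide from the market rating without panel time.
    return "keep" if market_score >= submit_cutoff else "discard"
```

For instance, `triage(2.0, 3.5)` routes an output to panel review because the two ratings disagree by more than the (hypothetical) tolerance, while clear-cut cases never consume market or panel time at all.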
We investigated whether prediction markets could help with REF preparation by predicting REF scores. We found that their ratings correlated well, though not perfectly, with mock REF scores and machine learning models. They may be most useful for gathering extra information on borderline outputs, and for introducing early career researchers to the REF and research assessment.