Metrics: a quality measure? By Simon Hazelwood-Smith

How do you measure research quality? In the UK, this question is part of a task given to the Research Excellence Framework (REF), a 5-year assessment by expert review of the relative quality of all academic research in the UK.

The REF is a very large undertaking: almost 200,000 separate research outputs and just over 50,000 staff were submitted for assessment, at a cost of around £60 million for the whole process. However, with the results being used to allocate around £2 billion of funding, the reasoning behind being so thorough becomes very clear.

The methodology of the REF is firmly rooted in the concept of peer review. Representative expert panels judge entries in each of the 36 different academic fields. Each panel is responsible for grading submissions on research output, research impact, and research environment.

However, this does not mean that the methods of the REF are set in stone. The rise in the number and visibility of metrics, alongside the addition of ‘research impact’ as a new area in which entries are judged, has led to widespread suggestions that metrics may have a role to play in future iterations of research assessment.

With this in mind, an independent investigation has been commissioned by the Higher Education Funding Council for England (HEFCE) to assess whether metrics could be included in future REF exercises. The hope is that metrics could increase efficiency and objectivity as well as lower costs (if you believe the hype).

Broadly speaking, metrics are attempts to capture quantitative information about the world. In science, metrics focus on measuring the impact and quality of research, as well as of the institution in which it is undertaken.

In academic research, current metrics tend to rely heavily on the assumption that the number of citations a publication receives is correlated with the quality and impact of that research. This idea is not new; impact factors have been in use since the 1960s and have become widely accepted as a proxy for the relative importance of scientific journals.

This use as a proxy is not uncontested. Indeed, the suggestion that metrics should be used at all in research assessment, let alone the idea that they may improve objectivity, provokes heated debate in the research community.

There are several reasons to suggest that citations are not as informative as we might like. As pointed out by Prof David Colquhoun of UCL, amongst others, citations may be high because a paper is incorrect, can take a long time to accumulate for some influential work, are field specific, and are often lower for ‘difficult’ (i.e. maths-heavy) work than for other types of publication (e.g. reviews).

Different metrics attempt to correct for inherent biases by augmenting scores with additional information. For example, the ‘H-index’ tries to give a measure of consistency by taking into account the number of papers that a given scientist has published with a certain citation count. However, this is likely to introduce other kinds of bias against particular groups: for example, early career researchers who have not yet had time to publish many papers will score poorly.
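To make this concrete, here is a minimal sketch of the standard H-index calculation: a researcher has an index of h if h of their papers have each been cited at least h times. The function name and the citation counts below are invented purely for illustration and do not describe any real researcher.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts, made up for this example.
established = [50, 30, 22, 15, 9, 7, 6, 3, 1]
early_career = [120, 40]

print(h_index(established))   # 6
print(h_index(early_career))  # 2
```

The second, hypothetical researcher has two very highly cited papers but can never score above 2, because the index is capped by the number of papers published. This is exactly the early career bias described above.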

Particularly relevant, given the introduction of impact outside academia as a measure of quality, is the concept of alt-metrics. Alt-metrics attempt to use alternative measures, such as Facebook shares, blog posts or tweets, to gauge the external reach of research. Unfortunately, it is not difficult to think of several ways in which these metrics may be biased, not least because not all research is sexy. For example, any work claiming to find the missing link in human evolution will be far more likely to be shared than fundamental work on determining protein structures. This does not necessarily tell us much about the relative quality of each piece of research.

The practice of ‘gaming’ in order to improve metrics ratings is a popular argument against their use. In essence, this follows the principle that when specific means of assessment are introduced into a system, those within that system (scientists) may alter their behaviour in order to better their rating (metric score), rather than to better the fundamental substance of their outputs (their research). To my mind this is an argument against using any one specific measurement of quality; however, an array of measures should be more resistant to this kind of issue.

The outlook is not all bleak for metrics though (even alt-metrics!), but we must be careful, honest, and intelligent when using them. The first key point, made by Stephen Curry at SPRU’s conference on metrics, is that we should refer to them as indicators. If metrics are treated as indicators that show some particular feature of the research, rather than as an absolute surrogate for quality, then we are far less likely to unconsciously introduce bias into a system.

Metrics can also tell us some fundamental features of the sort of research that is being done. For example, metrics measuring key words in research papers can highlight particular themes that are popular. Such measures may tell you that the majority of studies into rice are about genetics, rather than other areas such as yield. This may be especially useful for governments and regulatory bodies in ensuring that the direction science is taking matches the goals that they have.
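As a rough sketch of what such a key word indicator might look like, the snippet below tallies how many papers mention each theme. The key word lists and the `theme_share` helper are hypothetical, invented for this example, and a real analysis would draw on a bibliographic database rather than a hand-written list.

```python
from collections import Counter

# Hypothetical key word lists for five rice papers, made up for this example.
papers = [
    ["rice", "genetics", "genome"],
    ["rice", "genetics", "drought"],
    ["rice", "yield", "fertiliser"],
    ["rice", "genetics"],
    ["rice", "genetics", "yield"],
]


def theme_share(papers, theme):
    """Fraction of papers that list the given theme among their key words."""
    return sum(theme in keywords for keywords in papers) / len(papers)


# Count each key word at most once per paper.
theme_counts = Counter(kw for keywords in papers for kw in set(keywords))

print(theme_counts["genetics"], theme_counts["yield"])  # 4 2
print(theme_share(papers, "genetics"))  # 0.8
print(theme_share(papers, "yield"))     # 0.4
```

Note that this says only what is being studied, not how good any of it is, which is the sense in which such a count is an indicator rather than a measure of quality.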

In my opinion, it is vitally important that anyone using indicators knows exactly how each one is calculated, and also which groups it is likely to be routinely biased against. Though it is a well-worn cliché, more research really does need to be done into how the use of metrics and alt-metrics may alter choices. For example, would introducing metrics into a system such as the REF produce markedly different quality ratings than expert review?

Ultimately, the question of how to allocate scientific funding is likely always to be contentious. Scientists are, understandably, often opposed to outside actors potentially reducing their autonomy and choices in the research they do. Scientists are also very wary of any changes that are perceived to compromise the integrity of science, as can happen when actors engage in gaming because of metrics. Nevertheless, I am of the opinion that, with a careful hand and particular attention to caveats, metrics may well have a role to play in increasing the objectivity of research assessment.


