For Open Access Week, Jisc’s Paul Stokes (Product Manager, Open Research Services) argues that without investment in infrastructure, Open Access can’t be open for all.
So here we are.
Open Access Week.
People across the world are putting on their FAIR hats, pulling up their Plan S socks, pinning on their Open Access Week badges* and are celebrating Open Data for All.
Everything is open.
Everything is free.
Paywalls are consigned to the bin of history…
And it’s all a great big waste of time and money!
Not at all. Hear me out. This is something I’m quite passionate about.
Open access is useless…
…without the infrastructure, systems, policies and money in place to make it more than an empty gesture.
It’s all very well making digital “stuff” (and by “stuff” I mean anything you’re going to share: papers, articles, data, research outputs in general, etc.) freely available. But if that “stuff” is in an obscure, un-preservable format, with no metadata, no licence attached, not connected to any form of scholarly comms infrastructure, and stored on a 30-year-old server that’s served by a 56K modem, then what’s the point? The only result is a slight feeling of smugness: We’re supporting Open Access! Policy tick!
You might as well just flush it.
If you’re going to release your data** into the open universe, it’s essential that you do so in a responsible manner. That means, first and foremost, ensuring it’s in a format that’s both usable and preservable. There’s no point in providing a PDF of a scanned picture of a table of data. Absolutely ineffectual. Much better to provide a text file of Comma Separated Values (CSV) that will stand the (usability) test of time.
And don’t forget the metadata. Quite apart from the obvious stuff like what the “stuff” is, who created/generated it and how, there’s so much more that’s needed to make it usable. Headings for tabular data, for instance. Descriptions, abstracts, versions, dates, links, licences (how are you permitted to use it?), persistent identifiers, provenance, checksums (the trustworthiness of data is going to be the next big thing – trust me 😊).
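To make that a little more concrete, here is a minimal sketch of what "metadata plus a checksum" could look like in practice. It assumes a plain JSON sidecar file sitting next to the data, which is my own illustration and not any particular repository's schema. The field names and the function names (`sha256_of_file`, `build_metadata`, `write_sidecar`) are hypothetical; only the standard library `hashlib` and `json` calls are real.

```python
import hashlib
import json


def sha256_of_file(path, block_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size blocks so large files never sit fully in memory.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_metadata(path, title, creator, licence, version):
    """Assemble a minimal metadata record (hypothetical field names)."""
    return {
        "title": title,
        "creator": creator,
        "licence": licence,
        "version": version,
        "file": path,
        "checksum": {"algorithm": "sha256", "value": sha256_of_file(path)},
    }


def write_sidecar(path, record):
    """Write the record next to the data file as <name>.metadata.json."""
    sidecar = path + ".metadata.json"
    with open(sidecar, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2)
    return sidecar
```

Something like `write_sidecar("results.csv", build_metadata("results.csv", "Survey results", "A. Researcher", "CC-BY-4.0", "1.0"))` would then leave a small, human-readable record beside the data. A real deposit would also want a persistent identifier and provenance, which this sketch leaves out.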
Assuming you’ve covered all that, your “stuff” still isn’t yet truly usable. If people can’t find it, and once found, can’t download and use it, then it’s still not completely open. Connection to a discovery infrastructure and effective bandwidth through the whole of the connection from provider to recipient are essential. As are considerations of size. You, the supplier of large data files, might be connected to a gigabit internet pipe at your end. But if a potential user is working on the end of a single-figure megabit local infrastructure, then they won’t be able to access the data no matter how “open” it might be. There are of course ways around this (smaller file formats, chunking data, providing compute local to the data and so on), but they need to be thought about as part of the process of opening up your data, not after the fact.
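As one illustration of the "chunking data" option mentioned above, the sketch below splits a large CSV into smaller files that can each be downloaded on their own, repeating the header row so every chunk is usable by itself. This is only one possible approach under simple assumptions (a single header row, UTF-8 text); the function name `split_csv` and the `.partNNN.csv` naming are made up for the example.

```python
import csv


def _write_chunk(source_path, index, header, rows):
    """Write one chunk file with the header repeated, and return its path."""
    chunk_path = f"{source_path}.part{index:03d}.csv"
    with open(chunk_path, "w", newline="", encoding="utf-8") as chunk:
        writer = csv.writer(chunk)
        writer.writerow(header)
        writer.writerows(rows)
    return chunk_path


def split_csv(source_path, rows_per_chunk):
    """Split a CSV into numbered chunk files of at most rows_per_chunk rows."""
    chunk_paths = []
    with open(source_path, newline="", encoding="utf-8") as source:
        reader = csv.reader(source)
        header = next(reader, None)
        if header is None:
            return chunk_paths  # Empty file: nothing to split.
        rows = []
        for row in reader:
            rows.append(row)
            if len(rows) == rows_per_chunk:
                chunk_paths.append(
                    _write_chunk(source_path, len(chunk_paths) + 1, header, rows)
                )
                rows = []
        if rows:
            # Flush the final, shorter chunk.
            chunk_paths.append(
                _write_chunk(source_path, len(chunk_paths) + 1, header, rows)
            )
    return chunk_paths
```

Someone on a slow connection can then fetch only the parts they need, or resume after a dropped download without starting again from zero.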
And then there’s the cost.
Data “stuff” isn’t free.
The ultimate desire might be to make it free at the point of use, but someone, somewhere along the line is going to have to pay. Open infrastructure isn’t free infrastructure. Servers cost money to run. The generators of the “stuff” paid to create it in the first place, and in many cases they continue to pay to provide it to users. Where is that money going to come from? Simple project funding which only covers costs during the active lifetime of a project simply isn’t sustainable. Eventually the long tail cost of keeping “stuff” available will vastly outweigh the original cost of creating it in the first place.
How could this long tail cost be accounted for? Perhaps a national (or international) centrally funded, interconnected, interoperable repository and preservation infrastructure is the way forward. That way we ALL pay. But individually, the cost is infinitesimal. (A debate for another day…)
Now, I’m a preservation person. It’s my job to bang on about Digital Preservation to anyone who’ll listen (and thank you for listening by the way). So obviously I’m going to big up the preservation aspects of the problem. But the truth is, it’s all important. Open access is but the tip of the iceberg. The quick, easy and cheap tip. We need all of the rest in place before we can truly say our “stuff” is open to all.
Before I finish… An exercise for the reader. What do you think is the order in which the missing components could/should be addressed? What is the most important (and why)? How should we go about addressing these problems, and (another one of my hobby horses) who’s going to pay for it?***
*I made up the FAIR hat and Plan S socks, but you can get your badge design here
**Who owns open data? The creator? The funder? No one? Everyone? Answers on a postcard please.
***Go big with the questions or go home I always say!