FAIR: Findable, Accessible, Interoperable, Reusable.
We all think FAIR is a “good thing”, don’t we? (Given the likely audience for this blog, that is [almost] a given.) Who could be against something that, once stated, is so blindingly obvious? If it’s not findable and usable, then why are we spending time, money and resources keeping it?
And it’s a great acronym as well. FAIR. It “does what it says on the tin” (a UK-centric reference; apologies to international readers). I always think that if you’ve got a good acronym, a relevant rhyme or a great bit of alliteration, you’re halfway there when it comes to hearts and minds…
BUT
(you knew there was a BUT didn’t you… Here comes the heresy…)
Just because something has a simple, snappy acronym doesn’t make it right!
There are (in my personal opinion) significant fundamental flaws in the whole concept of FAIR: flaws that, unless addressed, will ultimately doom it to failure. Were the missing concepts sacrificed for the sake of the aforementioned simple, snappy acronym? (Hmmmm. Quite possibly. Never let an incomplete concept get in the way of a good acronym.)
Sustainability
There is a cost involved in making data FAIR. Quite apart from the cost associated with the creation of the data in the first place, money is required to deposit data, plug it into discovery systems, keep it usable and so on. And it’s not a simple one-off cost.
The more data you add, the more money is needed to keep the discs spinning, keep the systems up to date, and pay for ongoing curation of the data. It is inevitable that costs will continue to rise and that we will need to do more with fewer resources. I know that the cost of storage is coming down, but cheaper storage leads to greater use and, ultimately, greater spend (the Jevons paradox). It’s also worth noting that the rate of reduction in storage costs is starting to level off…
In purely fiscal terms, to make a business case for keeping it, we need a perceived economic value for the data to balance the cost of keeping it. If there isn’t value to be realised in that data then it will never be sustainable to keep it for the long term.
Reliability
And is the data reliable? If no consideration is given to the provenance of the data, its trustworthiness, or the veracity of the offered “truth”, and if there is no indication of those qualities, then nuggets of truth and trust will be increasingly swamped by the deluge of dross.
How about…?
Findable, Accessible, Interoperable, Reusable, Economically viable, Reliable.
Nope. Not quite there. “Close, but no cigar[1].” There’s more to consider!
Carbon cost
When I talk about the economics of data and curation I’m not just talking about monetary cost (which, as we know, is not insignificant in and of itself). There are also the power costs to consider. The carbon cost.
Do you know how much power is consumed by data centres worldwide? A brief search on the web shows up many, many different discussions and many, many different figures. But one thing they all have in common is that they’re huge!
“…3% of the world’s electricity…”[2],
“…40% more than the total UK’s electricity…”[3],
“…Amazon’s carbon footprint amounts to that of a small country…”[4],
“…Ireland is one of the EU’s worst carbon emission offenders…[due to data centre consumption]”.[5]
416 terawatt-hours
416 terawatt-hours (TWh) per year is one figure that’s been widely quoted in recent years. An online calculator in the .gov domain (It’s “gov”. It must be legit. Right?)[6] says that 416 TWh is the equivalent of just under 295 million metric tonnes (294,128,640 tonnes) of carbon dioxide.
I don’t expect you to fully comprehend that figure; the numbers are mind-bogglingly large [that much energy is enough to boil 10,400,000,000,000,000 kettles (1.04 × 10^16 kettles)]. Suffice to say, it’s bad and going to get worse.
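For anyone who wants to check the sums, here is a rough sketch of the arithmetic in Python. It only back-solves the figures quoted above: the emission factor is whatever the calculator must have used to get from 416 TWh to 294,128,640 tonnes, not an official number looked up separately. The 0.1 kWh per boil is an assumption (roughly what it takes to heat a litre of water from room temperature to boiling), and it is not from any of the sources cited here.

```python
# Figures quoted in the post.
ENERGY_TWH = 416
QUOTED_TONNES_CO2 = 294_128_640
QUOTED_KETTLES = 1.04e16

KWH_PER_TWH = 1e9  # 1 TWh = 1,000,000,000 kWh

energy_kwh = ENERGY_TWH * KWH_PER_TWH

# Emission factor implied by the two quoted figures (tonnes of CO2 per kWh).
implied_factor = QUOTED_TONNES_CO2 / energy_kwh

# Energy per boil implied by the quoted kettle count (watt-hours).
implied_wh_per_kettle = energy_kwh * 1000 / QUOTED_KETTLES

# ASSUMPTION, not from the post: about 0.1 kWh to boil one litre of water.
ASSUMED_KWH_PER_BOIL = 0.1
kettles_at_assumed = energy_kwh / ASSUMED_KWH_PER_BOIL

print(f"Implied emission factor: {implied_factor:.4e} tonnes CO2 per kWh")
print(f"Implied energy per kettle: {implied_wh_per_kettle:.2f} Wh")
print(f"Kettles at {ASSUMED_KWH_PER_BOIL} kWh per boil: {kettles_at_assumed:.3e}")
```

Under that 0.1 kWh assumption the kettle count comes out several orders of magnitude lower than the figure quoted above, which presumably rests on a different energy-per-boil assumption. Either way, it is a lot of kettles.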
Researchers like ‘Impact’, don’t they? It’s a good thing, right?
Weeell…..
Not if that impact is in the form of tonnes of CO2, or reputation damage because you didn’t consider the environment when you were squirreling away those petabytes of data.
A new acronym
So, taking all that into account, I’d like to propose a new extension to the FAIR concept.
Findable, Accessible, Interoperable, Reusable, Environmentally friendly, Sustainable, Trustworthy
How are we going to ensure that we produce the FAIREST data? To be blunt, I don’t know. So, over to you. You tell me. What can we do about it? Answers on a (recycled) postcard please.
(I know that I haven’t linked to the primary sources. But, ironically, I wrote this on a plane and wi-fi is a bit dodgy at 30,000 feet. You can follow the links as well as I can. An exercise for the reader.)
[1] Look it up – it’s an entertaining anecdote.
[2] https://www.vxchnge.com/blog/cloud-data-center-power-consumption
[3] https://www.independent.co.uk/environment/global-warming-data-centres-to-consume-three-times-as-much-energy-in-next-decade-experts-warn-a6830086.html
[4] https://globalnews.ca/news/5927485/amazon-climate-change/
[5] https://www.irishtimes.com/business/technology/why-ireland-s-data-centre-boom-is-complicating-climate-efforts-1.4131768
[6] https://www.epa.gov/energy/greenhouse-gas-equivalencies-calculator