
(Digital) Preservation first

If I was going there, I wouldn’t start from here…

A little background (peppered with superlatives and speculation for good measure) …

Ever since proto-humans first put marks on a cave wall way back when, producers and creators have been obsessed with preserving their legacy. Sometimes they got it wrong (Oh Leonardo. Gesso, pitch, and mastic – what were you thinking?). But always there’s been a trend towards preservation formats. As new technologies were developed, they were embraced by future-obsessed makers (“Let’s use acrylics as they’re stable for longer. Let’s ditch the palimpsest and use acid-free paper instead”).

So why is it that the same is not true of Digital “stuff”? Yes, I know it’s not been around as long and I know that the definition of “creator” has widened to include many more roles and careers than ever before, but, to date, preservation always seems to be an addendum. Why?

Let’s dig into that a bit.

Digital “stuff” has been around for the best part of a lifetime and in that lifetime preservation—keeping it usable—has often been an afterthought (or even a “not thought about at all”). Is it because hardware and formats have changed so rapidly? Is it because being first to market meant that there was no time to consider longevity—cutting out the opposition was more important than getting it right? (“It’s good enough. Put it out there. We can always fix it later”).

Or perhaps it was a conscious decision. “Make the file formats so proprietary, so obscure, so unique that the users will have to stay inside our ecosystem for ever”. Too bad if that ecosystem just disappears one day.

Users have in the past been a significant part of the problem as well. Defending your files, your data, your intellectual property has, until very recently, been an obsession of many creators (especially in ecosystems where being the first to publish leads to career advancement). There has also been a significant pressure to get onto the next thing. As soon as you’ve published it’s someone else’s problem. Start the next project before someone takes your idea and runs with it.

And of course, many just didn’t even consider the need for preservation. This born-digital lark is new. We don’t need to think about what happens next. New stuff should be fine.

To be fair, early adopters also got the short end of the stick when it came to output choices. In the early days of digital, when specialist applications and programmes were few and far between, leading-edge creators had to take what they were given. More often than not, what they were given was what best suited the developer, not the data creator. (“Want to work with doodleflips? There’s only one program available and it stores information in an incomprehensible, proprietary, undocumented file format. Enjoy.”).

In reality, the preservation problem is a combination of all of those things. As a result, preservation is something that all too often happens after an output has been created. The preservation format’s not the same as the original output. It’s something that had to be reprocessed from the original.

And that’s inherently inefficient. It’s like taking a classic ship of the line built for wind power and converting it to steam. It sort of works, but it doesn’t end up being a particularly good warship. As my grandmother would say, “If I was going there, I wouldn’t start from here”. Reprocessed preservation outputs sort of work, but not always particularly well or with particularly accessible results (for instance, a picture of a spreadsheet, a TIFF, is still the format of choice in certain very niche sets of circumstances[1]).

Well, that was then. This is now. This is the age of open, of sharing, of replication, of collaboration. Formats are open (or at least they’re getting that way). Preservation is at least thought about, if not actively pursued.

So, time for a thought experiment.

Imagine what it would be like if a preservation format was the primary output from data-generating programs. What if the formats were all open (in every sense of the word: open licences, community managed and owned, open standards)? Imagine, also, if the infrastructure to connect outputs and integrate services just worked, seamlessly and invisibly. What if digital preservation was built into the very DNA of all data creation and manipulation services? Automatic? Accepted as the norm? What if metadata creation just happened without intervention? What if people knew about digital preservation, accepted it as the norm, expected it?

Far-fetched, you say? Maybe. But it wouldn’t be the first time a change of perspective, a change of philosophy, transformed a digital process.

Take “Mobile first” in web design, for example. In the early 2000s, as smartphones with web browsers came on the scene, few thought in any great detail about the practicalities of presenting 1024×768 pixels on a tiny 2″ × 3″ screen. Sites were slow and users scrolled endlessly left and right, up and down. A very poor user experience.

Then responsive web design emerged. Bits of content started shifting around on the page as if they had minds of their own. Sites degraded gracefully to what were supposed to be the core elements. But still most user experiences on the small screen were unsatisfactory. Distilling the core elements from an all-singing, all-dancing, fully featured site isn’t that easy.

It wasn’t until “Mobile first” appeared on the scene that the real breakthrough occurred. Sites were designed to be good on mobiles first. Core content was prioritised ruthlessly, resulting in sites that were more content focused and, as a result, more user focused. Only after the mobile version was created did the upscaling process take place to larger formats. This content/user focus resulted in faster, more appealing, and above all more usable websites at all scales.

The same focus shift could (and I would argue should) take place for digital preservation. Output as a preservation format and then work up from there.

Would it be difficult to achieve this digital preservation paradise? The short answer is probably (the slightly longer answer is I don’t really know). I think it’s worth investigating though.

I’m not saying that this would be the answer to all the digital preservation world’s problems…

But right now I’m struggling to think of any downsides.

[1] The Significant Properties of Spreadsheets (OPF AIG Final Report) | Zenodo (page 10)


By Paul Stokes

Paul has had a varied career in both the commercial sector and academia (and all points in-between). At present he leads on preservation for Jisc's Preservation service (and is currently referred to as a "Product Manager"). He is a director of the Digital Preservation Coalition (DPC) and a director of the Open Preservation Foundation (OPF). He's been passionate about repositories and preservation for many decades and currently also has a number of bees in his bonnet regarding costs, value, sustainability, and storage.
