Digital Preservation to the Core

At Jisc we recently set up a digital preservation dynamic purchasing system (a DP DPS). Part of the process involved the creation of “a core set of requirements…” to evaluate digital preservation systems against. In the process we came up against a number of profound questions (including “what is a digital preservation system?”, but that’s a question for another day [and possibly another blog]). Today I’m going to explore what our core requirements are and how we arrived at them.

So, “a core set of requirements…”. How are they different from a “normal” set of requirements?

Before delving too deeply into that, let’s first think about what a digital preservation system actually is.

Here’s a simplistic definition.

It’s a system that keeps your digital “stuff” usable
(for the medium to long term)

Now there are many ways of doing that and many tools that help you along the way. Some tools have more bells and whistles than others. Not all users will need all the functionality of the tools—some functions fall into the “nice to have” and “makes life easier in my specific case” zones as opposed to “stuff can’t be preserved without this bit of functionality”.

So, unlike when procuring a digital preservation system to meet a particular institution’s requirements, we needed to come up with a generic Minimum Viable Product (an MVP). The MVP lets potential buyers be confident that, whatever solution they wind up procuring, the suppliers on the DPS are capable of keeping their digital “stuff” safe and usable.

Using the MVP as the basis for the core requirements means that the solutions offered through the DPS are sufficiently open and varied to allow for an element of competition when buyers run mini competitions. Buyers may also opt to supplement these core requirements with additional requirements (the nice-to-have bells and whistles that suit their use case) for their mini competitions.

On to the process…

There has been a lot of work undertaken by others in this area, to say nothing of the published features of the commercial products. As we were quite keen not to reinvent the wheel, we decided to use this existing corpus of knowledge as our starting point, applying our MVP philosophy to winnow the multitude of requirements down and ensure that we ended up with a core set of “must have” requirements.

We started with requirements gathering, drawing on multiple sources including Jisc’s Research Data Shared Service (RDSS), Jisc’s Open Research Hub (ORH), the Digital Preservation Coalition’s Procurement Toolkit (which provided a particularly useful format for grouping functionality and requirements), and requirements from recently published public tendering exercises. We put all of these into our (enormous) pool of possible requirements.

Having gathered everything together, we then undertook a concatenation and disaggregation exercise: we removed duplicate requirements, disaggregated multi-component requirements and combined similar requirements, ending up with a slightly less enormous pool of requirements.

We used the resulting reduced pool of requirements as the starting point for the next phase: refining and consultation. We initiated a series of focus groups and one-to-one meetings with stakeholders representing potential users, buyers, vendors, and subject matter experts. One beneficial side effect of holding this consultation in the final stages of preparing the DPS was that our core requirements set incorporated the latest thinking, including ideas that had once been considered “pushing the envelope” but which the community now considers mainstream. It was interesting to see how much has changed in the (relatively) short time span between the oldest and newest requirements.

The focus groups were mainly made up of potential buyers of DP systems (and hence potential users of the DP DPS) and subject matter experts, and concentrated on examining, adding to, and prioritising the requirements.

The one-to-one discussions were mainly with potential vendors.

In both cases the starting point was “You can’t have everything”. Participants were asked to consider who their stakeholders were (and rank their importance), and what the problem was they wanted to solve with a preservation solution—“where is your pain”—as opposed to being seduced by what a shiny solution could do. They were also asked to prioritise—to consider what were the most important features/requirements of a solution—and to decide what could be discarded. The consultations were an iterative process, each loop through the requirements producing a slightly more refined set.

The final sifted set of requirements was then put through a MoSCoW exercise, dividing the remaining requirements into Must have, Should have, Could have, and Won’t have. The “Must haves” have become the core requirements for our DPS. The “Should haves” and “Could haves” weren’t discarded though. They have become part of the advice and guidance in the Buyers Guide for buyers to consider when creating the requirements for their mini competitions (see Appendix 2 in the Buyers Guide).
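For anyone who likes to see this sort of thing written down, here is a minimal Python sketch of the MoSCoW split. It is an illustration only, not the tooling we used. The `Priority`, `Requirement` and `split_pool` names are made up for the example, and every entry in the sample pool apart from the checksum requirement is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    MUST = "Must have"
    SHOULD = "Should have"
    COULD = "Could have"
    WONT = "Won't have"


@dataclass(frozen=True)
class Requirement:
    text: str
    priority: Priority


def split_pool(pool):
    """Split a prioritised pool into core requirements and buyer guidance."""
    # Must haves become the core requirements.
    core = [r for r in pool if r.priority is Priority.MUST]
    # Should haves and Could haves are kept as guidance rather than discarded.
    guidance = [r for r in pool if r.priority in (Priority.SHOULD, Priority.COULD)]
    return core, guidance


# Illustrative pool. Only the first entry reflects a real core requirement;
# the others (and their priorities) are invented for this example.
pool = [
    Requirement("Record checksums for every file ingested", Priority.MUST),
    Requirement("Bulk metadata editing (hypothetical)", Priority.SHOULD),
    Requirement("Custom dashboard themes (hypothetical)", Priority.COULD),
    Requirement("Built-in video editing (hypothetical)", Priority.WONT),
]

core, guidance = split_pool(pool)
print("Core:", [r.text for r in core])
print("Guidance:", [r.text for r in guidance])
```

The Won’t haves simply drop out of both lists, which mirrors what happened to them in the exercise.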

So, this is the resulting set of core requirements.

Top Level requirements

  1. The Supplier must confirm the service supports the preservation of digital objects with differing file formats and sizes.
  2. The Supplier must confirm the service can be supplied as a stand-alone product and does not require purchase of any other system.
  3. The Supplier must confirm that, where applicable, the service meets and complies with relevant accessibility standards. These include, but are not limited to, relevant accessibility legislation where applicable [i.e. Public Sector Bodies (Websites and Mobile Applications) (No. 2) Accessibility Regulations 2018. e.g. by being compliant with WCAG2.1 at AA standard]. If you don’t meet and comply with the accessibility standards named above, you must indicate the standards you do comply with and/or the steps you are undertaking to meet those standards.

Integrity and Authenticity

The system must precisely record and manage the integrity and authenticity of the digital content and metadata it holds. It must use an appropriate mechanism to verify that digital content has not accidentally or maliciously been changed over time and utilise metadata to record any actions enacted on the content.

  1. The system must record checksums for every file ingested.
  2. The system must be able to validate checksums against those supplied with content if they are supplied.
  3. The system must generate and store checksums (or employ a similar integrity checking mechanism) for content that has been supplied without checksums.
  4. The system must generate and store checksums (or employ a similar integrity checking mechanism) for content that is created within the system (for instance files within Archival Information Packages [AIPs]).
  5. The system must support periodic integrity checking, reporting any damaged or missing files.
  6. The system must be able to generate an audit log and record event metadata describing all actions enacted on digital content.
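To make the integrity requirements a little more concrete, here is a small Python sketch of what they can look like in practice. It assumes SHA-256 as the checksum algorithm, an in-memory dictionary as the manifest and a plain list as the audit log. All of these are simplifications chosen for the example, and the function names (`sha256_of`, `ingest`, `fixity_check`, `record_event`) are hypothetical rather than taken from any particular product.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a SHA-256 checksum, reading in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_event(audit_log, action, target, outcome):
    """Append an event record describing an action enacted on content."""
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "target": str(target),
        "outcome": outcome,
    })


def ingest(path, manifest, audit_log, supplied_checksum=None):
    """Record a checksum on ingest, validating a supplied one if present."""
    actual = sha256_of(path)
    if supplied_checksum is not None and supplied_checksum.lower() != actual:
        record_event(audit_log, "ingest", path, "failure: checksum mismatch")
        raise ValueError(f"Supplied checksum does not match content of {path}")
    manifest[str(path)] = actual
    record_event(audit_log, "ingest", path, "success")
    return actual


def fixity_check(manifest, audit_log):
    """Re-verify every recorded file and report any missing or damaged ones."""
    problems = {}
    for name, expected in manifest.items():
        if not Path(name).is_file():
            problems[name] = "missing"
        elif sha256_of(name) != expected:
            problems[name] = "damaged"
        record_event(audit_log, "fixity_check", name, problems.get(name, "ok"))
    return problems
```

Running `fixity_check` on a schedule gives you the periodic integrity checking, and the list of event records is a (very) bare-bones audit log. A real system would persist both somewhere safer than memory.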

Data Model

The system should have a comprehensive data model that enables the complex structure of digital objects to be captured on ingest and accurately represented over time as they are managed and preserved.

  1. The data model must be able to capture digital objects that are composed of multiple hierarchical components, such as: files, drafts, published versions, or copies subsequently created for preservation or access.
  2. Digital objects must be assigned unique identifiers.
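As a rough illustration of these two data model requirements, the Python sketch below represents a digital object as a tree of components, each with its own unique identifier. The `Node` class, the `kind` labels and the use of random UUIDs are all assumptions made for the example. Real systems may well use other identifier schemes and far richer models.

```python
import uuid
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """One component of a digital object (the object itself, a version, a file...)."""
    name: str
    kind: str
    # Assumption: a random UUID is a good enough unique identifier for this sketch.
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str, kind: str) -> "Node":
        """Create a child component, attach it, and return it."""
        child = Node(name, kind)
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Node"]:
        """Yield this component and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# A hypothetical report with a draft, a published version and a preservation copy.
report = Node("Annual report", "digital object")
draft = report.add("Draft", "version")
draft.add("report-draft.docx", "file")
published = report.add("Published version", "version")
published.add("report.pdf", "file")
published.add("report-pdfa.pdf", "preservation copy")

for node in report.walk():
    print(node.identifier, node.kind, node.name)
```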

Exit Strategy

The system must provide a clear exit strategy to other systems, without vendor lock-in. This must ensure that the inevitable migration to a future digital preservation system is possible. It should also minimise the effort and risk involved in such a migration.

  1. The system must have the ability to export or copy digital content and all associated metadata in a manageable format/structure for ingest into another system.

Authenticity

The system must enable authentic digital content and metadata to be ingested.

  1. The system must enable the ingest of digital content and associated metadata.
  2. The system must be able to retain the original bitstream.
  3. The system must be able to provide reports on the success or failure of ingest activities.

File Characteristics

The system must enable authentic digital content and metadata to be ingested.

  1. The system must identify known file formats and reference appropriate registries of further information such as PRONOM and/or Wikidata.

Replication and Storage Management

The system must support replication and storage management. The system should have the ability to store multiple copies of ingested digital content on different storage systems in different geographical locations.

  1. The system must have the ability to store multiple copies of ingested digital content (potentially on different storage systems and potentially in different geographical locations).
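In its simplest form, replication just means writing the same content to more than one place and checking that every copy arrived intact. Here is a minimal Python sketch, assuming that each “storage system” is simply a directory and that SHA-256 is used to verify the copies. The `replicate` function and its parameters are hypothetical and only there to show the idea.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a SHA-256 checksum, reading the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source, storage_roots):
    """Copy a file to every storage root and verify each copy against the source."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for root in storage_roots:
        root = Path(root)
        root.mkdir(parents=True, exist_ok=True)
        target = root / source.name
        shutil.copy2(source, target)  # copies the content and basic file metadata
        if sha256_of(target) != expected:
            raise IOError(f"Copy at {target} does not match the source")
        copies.append(target)
    return copies
```

In a real deployment the storage roots would be different storage systems, ideally in different geographical locations, rather than folders on one disk.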

Preservation Actions

The system should enable preservation actions that fulfil preservation plans designed to mitigate identified preservation risks.

  1. The system must enable the migration of files from one file format to another.
  2. The system must record preservation actions (and their outcomes) in the associated metadata.

Management of Digital Content and Metadata

The system must support the management of digital content and metadata over time.

  1. The system must provide controls to minimise the risk of accidental or malicious deletion or content change.
  2. The system must be able to ingest and manage all required metadata including metadata appropriate for specific content types (e.g. geospatial, audio visual).
  3. The system must enable the managed disposal of content.
  4. The system must restrict any actions to manage digital content based on user login and/or user roles.
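Requirements 3 and 4 tend to go hand in hand: disposal should be a managed, logged action that only certain roles can perform. The Python sketch below shows one way that might look. The role names, the permission table and the `authorise` and `dispose` functions are all invented for illustration and are not drawn from any particular system.

```python
from datetime import datetime, timezone

# Hypothetical role-to-action mapping, for illustration only.
PERMISSIONS = {
    "viewer": {"discover", "access"},
    "archivist": {"discover", "access", "ingest", "edit_metadata"},
    "administrator": {"discover", "access", "ingest", "edit_metadata", "dispose"},
}


class PermissionDenied(Exception):
    """Raised when a role is not allowed to perform an action."""


def authorise(role, action):
    """Raise PermissionDenied unless the role is allowed to perform the action."""
    if action not in PERMISSIONS.get(role, set()):
        raise PermissionDenied(f"Role '{role}' may not perform '{action}'")


def dispose(content_id, role, store, audit_log):
    """Remove content from the store, but only for authorised roles, and log it."""
    authorise(role, "dispose")
    removed = store.pop(content_id)
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "action": "dispose",
        "target": content_id,
        "role": role,
    })
    return removed
```

An unknown role gets no permissions at all, so the default is to refuse rather than allow.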

Discovery

The system must enable the controlled discovery of, and access to, digital content and metadata.

  1. The system must ensure that all digital content and any associated metadata is only discoverable by authorised users.
  2. The system must ensure that all digital content and any associated metadata is only accessible by authorised users.

Training and support

The system must provide appropriate training, support and help for users.

  1. The system supplier must provide training for system users.
  2. The system supplier must provide a support mechanism for system users.

Want to know more?

More information, including details of the vendors currently signed up to the DP DPS, can be found at https://www.jisc.ac.uk/digital-preservation-systems-dynamic-purchasing-system-dps or by emailing preservation-dps@jisc.ac.uk.

By Paul Stokes

Paul has had a varied career in both the commercial sector and academia (and all points in-between). At present he leads on preservation for Jisc (and is currently referred to as a "Subject Matter Expert (Digital Preservation)"). He is a director of the Digital Preservation Coalition (DPC) and a director of the Open Preservation Foundation (OPF). He's been passionate about repositories and preservation for many decades and currently also has a number of bees in his bonnet regarding costs, carbon, value, sustainability, and storage.
