While at the Leading Edge Libraries Conference on September 20-21, 2018, I had the honor of participating in a panel presentation with AJ Johnson, librarian and data management guru at Walt Disney Parks & Resorts. Here is a summary of my portion of the panel, describing how I updated the records management program at Lancaster Theological Seminary to include electronic records.
Fortunately, I started with a fairly comprehensive records management manual prepared by my predecessor. It included retention schedules for each department of the school, but it assumed that all records were in print. Electronic records were not being collected or tracked, and backups were not being archived. Each department had designated directories on a shared fileserver and was responsible for naming, organizing, and keeping its electronic files there. The fileserver was being backed up, but these electronic files were not being harvested and archived the way their print counterparts were.
To update the records management program with provisions for electronic records, we had three requirements:
- Fully updated policies and procedures that included archiving electronic records
- Secure storage and retrieval solution for confidential electronic files
- Minimal cost and budget impact
The first step in our process was to determine the scope of the electronic records being generated that needed to be archived. To do this, I met with the records liaison for each department at the school. Together we reviewed the department's retention schedule, identifying which records were generated in print, which were generated electronically, which needed to be kept confidential, and how long each was to be kept in the department before being transferred to the archive. I found that the electronic records being generated fell into three types: text documents, data and spreadsheet reports, and images. The updated retention schedules included a new column identifying each record's format: print, electronic, or both.
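To make the new format column concrete, here is a minimal Python sketch of one way a retention schedule row could be modeled so that the record series needing electronic archiving can be filtered out. The field names, the sample entries, and the rule of counting the retention period from the calendar year of creation are hypothetical illustrations, not the seminary's actual schedule.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class RecordFormat(Enum):
    """Values for the new format column in the retention schedule."""
    PRINT = "print"
    ELECTRONIC = "electronic"
    BOTH = "both"


@dataclass(frozen=True)
class ScheduleEntry:
    department: str
    record_series: str
    fmt: RecordFormat
    confidential: bool
    years_in_department: int  # how long the department keeps it before transfer

    def transfer_year(self, created: date) -> int:
        """Year a record moves to the archive (assumes counting from year of creation)."""
        return created.year + self.years_in_department

    def needs_electronic_archiving(self) -> bool:
        """True if any copy of this series is born-digital."""
        return self.fmt in (RecordFormat.ELECTRONIC, RecordFormat.BOTH)


# Hypothetical entries, for illustration only.
schedule = [
    ScheduleEntry("Registrar", "Course enrollment reports", RecordFormat.ELECTRONIC, True, 5),
    ScheduleEntry("Communications", "Event photographs", RecordFormat.BOTH, False, 3),
    ScheduleEntry("Business Office", "Signed contracts", RecordFormat.PRINT, True, 7),
]

electronic = [e.record_series for e in schedule if e.needs_electronic_archiving()]
print(electronic)
```

Filtering on the format column this way yields the series a liaison would need to deposit in the new platform; in this made-up schedule, the enrollment reports and the event photographs.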
Once I knew what kinds of electronic records we would be archiving, I moved on to the second step: researching records management platforms. My research focused primarily on open source solutions, both because of my past experience with open source platforms and because we needed a low-cost solution with minimal budgetary impact. We were already using Omeka for our digital archive and institutional repository, but found that it would not fit the needs of the records management program: Omeka does not support custom user groups or allow access to selected content to be limited by user group. Islandora, another open source archival platform, had similar limitations. Alfresco Community Edition is a platform very similar to SharePoint, but it had more features than we needed for an archival solution. DSpace, another institutional repository platform, can set up user groups with different levels of permission and supports embargoed access to certain files. However, I was unsure how easy it would be for our records liaisons to use, and whether we could successfully deploy DSpace and connect it to an Amazon S3 bucket. iRODS is an extremely robust platform that could easily connect to an Amazon S3 bucket, but it is much more complex than a small school needs for relatively simple archival storage and retrieval.
I ended up choosing ResourceSpace, an open source digital asset management platform developed by Montala Ltd. It has a visually pleasing interface and is highly flexible and customizable when it comes to defining user groups and permissions for specific collections and files. Once configured for our departments, it would give access to confidential files only to staff members in approved user groups. The metadata fields are also fully customizable, so I could tailor them to our needs. A bonus feature I didn’t expect was a built-in tool that extracts and indexes the full text of documents, which makes the platform’s search engine much more useful.
Once I had selected ResourceSpace, the third step in our process was to set it up. I provisioned an Amazon EC2 server with an Ubuntu LAMP stack. I had initially tried installing the Bitnami ResourceSpace stack on an EC2 server, but it caused problems when it came time to update ResourceSpace. To avoid having to keep enlarging the server’s storage volume, I used an application called S3FS to mount an Amazon S3 bucket as a directory on the server. This allows ResourceSpace to read and write to the S3 bucket as it would to any ordinary local directory. Web traffic to the server is encrypted with SSL/TLS certificates issued by Let’s Encrypt and obtained with the Certbot utility. Additional setup involved configuring outgoing email and defining resource types, metadata fields, user groups, permissions, and user interface settings.
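For readers unfamiliar with S3FS, the mount step can be sketched as a small Python wrapper. This assumes the s3fs-fuse package is installed and that credentials live in a password file of the form `ACCESS_KEY_ID:SECRET_ACCESS_KEY`, which s3fs expects to be private to its owner. The bucket name, mount point, and helper function names are hypothetical, and the actual server may well use different options (for example, an `/etc/fstab` entry so the bucket is remounted at boot).

```python
import os
import subprocess
from pathlib import Path


def write_s3fs_credentials(path: Path, access_key_id: str, secret_key: str) -> None:
    """Write an s3fs password file (ACCESS_KEY_ID:SECRET_ACCESS_KEY) private to its owner."""
    # Create the file with mode 0600 from the start so the secret is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(f"{access_key_id}:{secret_key}\n")
    os.chmod(path, 0o600)  # tighten permissions if the file already existed


def s3fs_mount_command(bucket: str, mountpoint: Path, passwd_file: Path) -> list[str]:
    """Build the s3fs command line that mounts `bucket` at `mountpoint`."""
    return [
        "s3fs", bucket, str(mountpoint),
        "-o", f"passwd_file={passwd_file}",
        # Let users other than the one who mounted it (e.g. the web server) use the files.
        "-o", "allow_other",
    ]


def mount(bucket: str, mountpoint: Path, passwd_file: Path) -> None:
    """Create the mount point if needed, then run s3fs (raises if s3fs fails)."""
    mountpoint.mkdir(parents=True, exist_ok=True)
    subprocess.run(s3fs_mount_command(bucket, mountpoint, passwd_file), check=True)
```

Calling something like `mount("example-records-bucket", Path("/var/www/resourcespace/filestore"), Path("/etc/passwd-s3fs"))` would typically need to run as root, since a non-root user can only use `allow_other` if `user_allow_other` is enabled in `/etc/fuse.conf`. ResourceSpace would then be pointed at that directory for its file storage.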
The fourth step in our process, which we are in the midst of now, is implementing ResourceSpace. I’ve revised the records management manual to include new responsibilities for records liaisons and procedures for archiving and transferring electronic and born-digital records. Staff in each department need to be trained to upload files and create metadata for them. As we conduct these trainings, I expect we will keep making tweaks to the system and the workflow until everything runs smoothly.
Going forward, I am aware that this solution falls short of the ideals of long-term digital preservation. While I have a process in place to make regular backups of the MySQL database and snapshots of the server’s volume, we are not yet following any practices to ensure that our electronic records do not degrade over time. I’ll be looking for cost-effective ways to do this in the future, and in the meantime relying on Amazon’s redundancy checks to stave off data corruption and degradation. We may also eventually digitize some of our print records to free up physical space in our archives, which would involve making archival-quality scans and processing them with OCR software.
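One common, low-cost practice for detecting silent degradation is periodic fixity checking: record a checksum for every file once, then recompute and compare on a schedule. The sketch below is a generic Python illustration of that technique, not something already in place at the seminary; the function names and the JSON manifest format are hypothetical. Pointed at an S3FS-mounted directory, each run would read every file back from S3, so it may be worth scheduling infrequently.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest: dict[str, str], path: Path) -> None:
    """Store the manifest as JSON, ideally outside the directory being checked."""
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def load_manifest(path: Path) -> dict[str, str]:
    return json.loads(path.read_text())


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root against a previously saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(k for k in manifest.keys() & current.keys()
                          if manifest[k] != current[k]),
        "new": sorted(set(current) - set(manifest)),
    }
```

A file listed under "changed" that nobody intentionally edited is the signal to restore it from a backup or snapshot.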