ArchivesSpace API Best Practices

Purpose of This Guide

Introduce YUL ArchivesSpace users to best practices and guidelines for using the ArchivesSpace API, and provide tips for safely performing operations against it.

Target audience

YUL ArchivesSpace users with YUL-focused API training.

Resources for Getting Started

  • GitHub repository for API training: https://github.com/yalemssa/api_environment_setup

  • Introduction to Metadata Power Tools for the Curious Beginner (by Maureen Callahan, SAA 2015): https://docs.google.com/presentation/d/1Pqs5_J6C9y6-Nw-QJ0rCrnkugUEianYdqCNNHhXYpJc/edit#slide=id.gc63ed3508_0_0

Be Aware of the Risks of Using the API

  • There is no undo button

  • There is no edit history

  • It is easy to overwrite and delete data

  • It is possible to make many mistakes very quickly without realizing it

What our systems and policies do to identify and prevent problems

  • Access to the MySQL database is read-only (though it is not possible to limit database access by repository)

  • API access is scoped to individual user permissions - users can only do things via the API that they can do in the staff interface

  • The JSONModel schema contains numerous constraints on data entry, which are enforced by the API. These can be extended or modified via a plugin (e.g. yale data rules).

  • The API performs a number of additional data validations before creating or updating records. These can be extended or modified via a plugin.

    • For example, an API update will fail if a user tries to add an enumeration value which is not already in the database. This is NOT true when importing EAD files via the Background Jobs interface; in that case, improperly formed enumeration values are added to the database.

  • The lock version function prevents users from updating the same record simultaneously

  • Three YUL ArchivesSpace instances (PROD, TEST, DEV), each with its own business rules

  • Periodic data auditing and reporting
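
The lock version check can be pictured with a toy in-memory model. This is not ArchivesSpace code: the RecordStore class and StaleRecordError exception are hypothetical names used only to illustrate the general idea of optimistic locking, where each record carries a lock_version, an update must present the version it read, and an update based on a stale version is rejected.

```python
import copy


class StaleRecordError(Exception):
    """Raised when an update is based on an outdated copy of a record."""


class RecordStore:
    """Toy in-memory store illustrating optimistic locking (hypothetical)."""

    def __init__(self):
        self._records = {}

    def create(self, uri, data):
        self._records[uri] = {**data, "uri": uri, "lock_version": 0}

    def get(self, uri):
        # Return a copy so callers cannot change the stored record directly.
        return copy.deepcopy(self._records[uri])

    def update(self, record):
        current = self._records[record["uri"]]
        if record["lock_version"] != current["lock_version"]:
            raise StaleRecordError(
                f"{record['uri']} was changed by someone else; fetch it again"
            )
        updated = copy.deepcopy(record)
        updated["lock_version"] = current["lock_version"] + 1
        self._records[record["uri"]] = updated


store = RecordStore()
store.create("/repositories/2/resources/1", {"title": "Old title"})

# Two users fetch the same record at lock_version 0.
first = store.get("/repositories/2/resources/1")
second = store.get("/repositories/2/resources/1")

first["title"] = "Title from user one"
store.update(first)  # succeeds; the stored lock_version becomes 1

second["title"] = "Title from user two"
try:
    store.update(second)  # still carries lock_version 0, so it is rejected
    conflict = False
except StaleRecordError:
    conflict = True
```

For scripts, the practical consequence is likely to be: fetch a record, modify it, and post it back promptly, keeping the lock_version value that came with the record you fetched.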

What users must do to prevent problems

  • Write effective code

    • Think like a machine: machines do exactly what you tell them to, and nothing more

  • Understand the ArchivesSpace data model

  • Understand where the data you want to update is situated within that model

  • Test all updates in DEV or TEST (TEST preferred) before running them in production. This is REQUIRED.

  • Build in checks to make sure that you are actually running against DEV or TEST: print out the URL in your code, ask the user to confirm, etc.

  • Review results, preferably with reports rather than just eyeballing

  • Peer review. Have a colleague look at your:

    • Code

    • Input data

    • Results

  • Document all actions taken against ArchivesSpace records. Add comments to code and write detailed README files.

  • Keep all:

    • JSON backups

    • Input data

    • Scripts

  • Organize scripts and data by project/task
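
Two of the practices above, confirming the target instance and keeping JSON backups, can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not an official YUL tool: the hostnames in SAFE_HOSTS are placeholders, and the function names (is_safe_target, confirm_target, backup_record) are hypothetical. The only thing assumed about a record is that it is a dictionary with a "uri" key.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlparse

# Placeholder hostnames (assumption): replace with the real DEV and TEST API hosts.
SAFE_HOSTS = {"aspace-dev.example.edu", "aspace-test.example.edu"}


def is_safe_target(api_url, safe_hosts=SAFE_HOSTS):
    """Return True only if api_url points at a known non-production host."""
    host = urlparse(api_url).hostname
    return host is not None and host.lower() in safe_hosts


def confirm_target(api_url, ask=input):
    """Print the target URL and require an explicit 'yes' before continuing."""
    print(f"This script will run against: {api_url}")
    if not is_safe_target(api_url):
        print("WARNING: this is not a recognized DEV or TEST instance.")
    answer = ask("Type 'yes' to continue: ")
    return answer.strip().lower() == "yes"


def backup_record(record, backup_dir):
    """Write a record's JSON to disk before modifying it; return the file path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Turn a URI such as /repositories/2/resources/1 into a safe file name.
    name = record["uri"].strip("/").replace("/", "_")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = backup_dir / f"{name}_{stamp}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```

A script might call confirm_target at startup and exit if it returns False, then call backup_record on each record's JSON immediately before posting an update. Writing the backups into a directory named for the project or task keeps them alongside the scripts and input data.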