chit_archives_scripts

Repo to store CHIT-related work on archival and manuscript metadata reconciliation

Requirements

  • utilities package for file processing and http utilities

Survey

Metadata schemas and controlled vocabularies for agents and subjects currently in use by YUL special collections and BRBL

Metadata schemas

ArchivesSpace metadata schemas for agents and subjects:

ArchivesSpace documentation

XML-based schemas:

Example of JSON agent and subject records from ArchivesSpace

agent_person

agent_corporate_entity

agent_family

subject

resource record with linked agents

Examples of collection-level EAD files with linked agents (exported from ArchivesSpace):

EAD 2002

EAD 3

Controlled vocabularies

Agents

Subjects

Overview of previous reconciliation work performed on archival metadata

Deliverables:

Work Plan

Preliminary task list:

  • Reconcile agents and subjects with existing LCNAF and AAT URIs against Wikidata, SNAC, ULAN, VIAF (others?)

  • Reconcile agents and subjects without existing LCNAF and AAT URIs against Wikidata, pulling LCNAF and other URIs from Wikidata where possible. This includes agents created since the last reconciliation, and potentially another round on agents that were not found in the previous round (possibly using OpenRefine)

  • Edit and upload duplicate detection script to GH repo

  • Create reports for use in reconciliation (these will need links to finding aids side-by-side with agent records for manual review)

  • Create instructions for authority reconciliation in OpenRefine

  • Import authority records into ArchivesSpace following previous processes

  • Review of results

  • Data audit of agent and subject records in ArchivesSpace (partially Python, partially OpenRefine)

  • Cleanup of agent and subject records in ArchivesSpace

  • De-duplication - should this happen before or after reconciliation, or both?

  • Plan for storage of additional URIs - there is no way in ArchivesSpace to store more than one

  • Compare and remediate differences between ArchivesSpace and Voyager agent and subject records
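The de-duplication item in the list above could start from something like the following. This is a minimal sketch, not the duplicate detection script mentioned in the task list: it assumes agents are available as (URI, name heading) pairs, and the function names `normalize_name` and `find_duplicate_candidates` and the sample records are hypothetical. It only flags candidates for manual review by grouping headings that are identical after accents, punctuation, case, and extra whitespace are removed.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Reduce a name heading to a crude matching key (hypothetical helper)."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces, lowercase, and collapse whitespace.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in stripped)
    return " ".join(cleaned.lower().split())


def find_duplicate_candidates(agents):
    """Group (uri, name) pairs whose normalized names collide."""
    groups = defaultdict(list)
    for uri, name in agents:
        groups[normalize_name(name)].append(uri)
    # Keep only keys shared by more than one record.
    return {key: uris for key, uris in groups.items() if len(uris) > 1}


# Made-up sample records, for illustration only.
agents = [
    ("/agents/people/1", "Dvořák, Antonín"),
    ("/agents/people/2", "Dvorak, Antonin."),
    ("/agents/people/3", "Smith, John"),
]
candidates = find_duplicate_candidates(agents)
print(candidates)
```

Digits are kept in the key, so headings that differ only in life dates do not collide. Anything fuzzier (initials versus full forenames, transposed name parts) would need a different technique and is out of scope for this sketch.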

Reports from ArchivesSpace

  • Agents - people, all fields

  • Agents - corporate entities, all fields

  • Agents - families, all fields

  • All agent links

  • All subjects, all fields

  • All subject links

  • All agents with URIs

  • All agents without URIs

  • All agents created since July 21, 2018 with URIs

  • All agents created since July 21, 2018 without URIs

  • All subjects created since July 21, 2018 with URIs

  • All subjects created since July 21, 2018 without URIs

  • Counts?
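The with/without URI reports and the July 21, 2018 cutoff listed above can be sketched as a single partitioning step. This is a rough illustration only: it assumes each report row is a dict with an `authority_id` column and a `create_time` column holding a UTC timestamp like `2018-07-21T14:03:22Z`. Those column names, the helper names, and the sample rows are assumptions, not a description of the actual report output.

```python
from datetime import datetime, timezone

# Cutoff taken from the report list above.
CUTOFF = datetime(2018, 7, 21, tzinfo=timezone.utc)


def parse_timestamp(value):
    """Parse a UTC timestamp such as 2018-07-21T14:03:22Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def split_by_uri(rows, since=None):
    """Return (with_uri, without_uri), optionally limited to rows
    created on or after `since`."""
    with_uri, without_uri = [], []
    for row in rows:
        if since is not None and parse_timestamp(row["create_time"]) < since:
            continue
        # Treat a missing or whitespace-only authority_id as "no URI".
        if (row.get("authority_id") or "").strip():
            with_uri.append(row)
        else:
            without_uri.append(row)
    return with_uri, without_uri


# Made-up rows, for illustration only.
rows = [
    {"uri": "/agents/people/1", "authority_id": "http://example.org/a",
     "create_time": "2017-03-01T10:00:00Z"},
    {"uri": "/agents/people/2", "authority_id": "",
     "create_time": "2018-07-21T00:00:00Z"},
    {"uri": "/agents/people/3", "authority_id": "http://example.org/b",
     "create_time": "2019-01-15T09:30:00Z"},
    {"uri": "/agents/people/4", "authority_id": "  ",
     "create_time": "2019-02-02T16:45:00Z"},
]

all_with, all_without = split_by_uri(rows)
new_with, new_without = split_by_uri(rows, since=CUTOFF)
print(len(all_with), len(all_without), len(new_with), len(new_without))
```

The lengths of the returned lists would also cover the "Counts?" item without a separate report.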

Data Sources

Resources to Consult