chit_archives_scripts
Repo to store CHIT-related work on archival and manuscript metadata reconciliation
Requirements
utilities package for file processing and HTTP utilities
Survey
Metadata schemas and controlled vocabularies for agents and subjects currently in use by YUL special collections and BRBL
Metadata schemas
ArchivesSpace metadata schemas for agents and subjects:
[subject]
[term]
ArchivesSpace documentation
XML-based schemas:
Example of JSON agent and subject records from ArchivesSpace
resource record with linked agents
Examples of collection-level EAD files with linked agents (exported from ArchivesSpace):
Controlled vocabularies
Agents
Personal names, corporate names: Library of Congress Name Authority File (LCNAF)
Family names: Library of Congress Subject Headings (LCSH)
Subjects
Overview of previous reconciliation work performed on archival metadata
Deliverables:
Work Plan
Spring 2020 student job description
Preliminary task list:
Reconcile agents and subjects with existing LCNAF and AAT URIs against Wikidata, SNAC, ULAN, VIAF, others?
Reconcile agents and subjects without existing LCNAF and AAT URIs against Wikidata (pulling LCNAF and other URIs from Wikidata where possible) - includes agents created since the last reconciliation and potentially another round on agents which were not found in the previous round (using OpenRefine, possibly)
Edit and upload duplicate detection script to GH repo
Create reports for use in reconciliation (i.e. will need to have links to finding aids side-by-side with agent records for manual review)
Create instructions for authority reconciliation in OpenRefine
Import of authority records into ArchivesSpace following previous processes
Review of results
Data audit of agent and subject records in ArchivesSpace (partially Python, partially OpenRefine)
Cleanup of agent and subject records in ArchivesSpace
De-duplication - should this happen before or after reconciliation, or both?
Planning for storage of additional URIs - no way in ArchivesSpace to store more than one
Compare and remediate differences between ArchivesSpace and Voyager agent and subject records
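The duplicate detection and de-duplication tasks above could start from something like the following. This is only a minimal sketch and not the script referred to in the task list: the function names, the `(uri, name)` input shape, and the sample records are all hypothetical. It assumes that candidate duplicates are headings which match once accents, case, and punctuation are ignored; every match would still need manual review.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Reduce a heading to a comparison key without accents, case, or punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    return " ".join(no_punct.split())


def find_duplicate_candidates(agents):
    """Group (uri, name) pairs by normalized name and keep groups with 2+ members."""
    groups = defaultdict(list)
    for uri, name in agents:
        groups[normalize_name(name)].append(uri)
    return {key: uris for key, uris in groups.items() if len(uris) > 1}


# Made-up sample data for illustration only; not real agent records.
sample_agents = [
    ("/agents/people/1", "Doé, Jane, 1900-1980"),
    ("/agents/people/2", "Doe, Jane, 1900-1980."),
    ("/agents/people/3", "Doe, John, 1902-1985"),
]
duplicates = find_duplicate_candidates(sample_agents)
```

Exact matching on a normalized key is deliberately conservative: it catches punctuation and diacritic variants but not differing date ranges or name inversions, which may be better handled by clustering in OpenRefine.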
Reports from ArchivesSpace
Agents - families, all fields
All agent links
All subjects, all fields
All subject links
All agents with URIs
All agents without URIs
All agents created since July 21, 2018 with URIs
All agents created since July 21, 2018 without URIs
All subjects created since July 21, 2018 with URIs
All subjects created since July 21, 2018 without URIs
Counts?
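The with/without URI reports and the counts listed above could be derived from exported JSON records along these lines. This is a hedged sketch, not an existing script in this repo: it assumes each record is a dict with `uri` and `create_time` keys, that subjects carry `authority_id` at the top level, and that agents carry it on the entries of their `names` list. The function names and sample records are hypothetical.

```python
from datetime import datetime, timezone

# Cutoff taken from the report list: records created since July 21, 2018.
CUTOFF = datetime(2018, 7, 21, tzinfo=timezone.utc)


def has_uri(record):
    """True if the record, or any of its name forms, has a non-empty authority_id."""
    if record.get("authority_id"):
        return True
    return any(name.get("authority_id") for name in record.get("names", []))


def created_since(record, cutoff=CUTOFF):
    """True if create_time (assumed format: 2019-01-15T10:00:00Z) is on or after cutoff."""
    created = datetime.strptime(record["create_time"], "%Y-%m-%dT%H:%M:%SZ")
    return created.replace(tzinfo=timezone.utc) >= cutoff


def build_reports(records):
    """Sort record URIs into the four with/without-URI reports."""
    reports = {"with_uri": [], "without_uri": [],
               "new_with_uri": [], "new_without_uri": []}
    for record in records:
        key = "with_uri" if has_uri(record) else "without_uri"
        reports[key].append(record["uri"])
        if created_since(record):
            reports["new_" + key].append(record["uri"])
    return reports


# Made-up sample records for illustration only.
sample_records = [
    {"uri": "/agents/people/1", "create_time": "2017-03-01T09:00:00Z",
     "names": [{"authority_id": "http://example.org/authority/1"}]},
    {"uri": "/agents/people/2", "create_time": "2019-01-15T10:00:00Z",
     "names": [{"sort_name": "Doe, Jane"}]},
    {"uri": "/subjects/3", "create_time": "2018-07-21T00:00:00Z",
     "authority_id": "http://example.org/authority/3"},
]
reports = build_reports(sample_records)
counts = {name: len(uris) for name, uris in reports.items()}
```

Running the same function separately over agent and subject exports would give one set of four lists, plus counts, per record type.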
Data Sources
ArchivesSpace