Kevin Roche John Dinkeloo Associates Data Re-use Process
Process
Save project index .XLSX file as a CSV file. Delete all rows which refer to boxes already in ArchivesSpace
Run process_spreadsheet.py script against CSV file
Copy/paste box numbers into Excel and run remove duplicates function
Add repo_uri and box_type fields to box number spreadsheet
Comment out relevant parts of project_index_new.py script and run to create containers (FIX THIS)
Split titles in OpenRefine to extract dates and clean up:
Keep original title
Extract dates (just split by comma)
Add index!
Run replace function to remove dates
Run find/replace in LibreOffice for punctuation, spaces, ‘no date’ etc.
Match box numbers and project numbers with URIs
Create new projects if needed, record URIs
Create archival objects with project_index_new.py script
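
The spreadsheet clean-up steps in this process (removing duplicate box numbers, splitting dates out of titles, adding an index) could also be done in Python rather than in Excel, OpenRefine and LibreOffice. The following is only a rough sketch, not what the existing scripts do. The function names, the output column names, and the rule for recognising a date segment (a four-digit year, or a "no date" marker) are all assumptions, and the sample titles in the usage note are made up.

```python
import re

# Assumption: a comma-separated segment is a date if it contains a
# four-digit year (1800-2099). Real titles may need other patterns.
YEAR = re.compile(r"\b(?:1[89]|20)\d{2}\b")
# Assumption: these are the "no date" spellings that should be dropped.
NO_DATE = re.compile(r"^(no date|n\.?\s?d\.?|undated)$", re.IGNORECASE)


def unique_box_numbers(box_numbers):
    """Remove blank and duplicate box numbers, keeping first-seen order."""
    cleaned = (b.strip() for b in box_numbers)
    return list(dict.fromkeys(b for b in cleaned if b))


def split_title(original_title):
    """Split a title on commas; return (title_without_dates, dates)."""
    title_parts = []
    date_parts = []
    for segment in original_title.split(","):
        segment = segment.strip()
        if not segment or NO_DATE.match(segment):
            continue  # skip empty segments and "no date" markers
        if YEAR.search(segment):
            date_parts.append(segment)
        else:
            title_parts.append(segment)
    # Collapse repeated spaces and trim stray punctuation at the ends.
    title = re.sub(r"\s+", " ", ", ".join(title_parts)).strip(" ,;")
    return title, ", ".join(date_parts)


def build_rows(titles):
    """Keep the original title and add an index alongside the split parts."""
    rows = []
    for index, original in enumerate(titles, start=1):
        title, dates = split_title(original)
        rows.append({
            "index": index,
            "original_title": original,
            "title": title,
            "dates": dates,
        })
    return rows
```

For example, split_title("Office Building, New Haven, 1975-1978") returns ("Office Building, New Haven", "1975-1978"). A title segment that happens to contain a year would be treated as a date under this rule, so the output still needs a manual check before anything is loaded into ArchivesSpace.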
Notes
Do all of this in ArchivesSpace TEST first, then repeat for production.
Future Work
Combine projects from same location with different job titles
Separate projects that have multiple job numbers
Re-order based on date, job number, or box number
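
As a starting point for the re-ordering item, here is a minimal sketch of sorting records by a chosen field. It assumes each record is a dict and that the field names (for example box_number, job_number, dates) are hypothetical placeholders for whatever the spreadsheet actually uses. Values made only of digits are compared as numbers so that box 10 sorts after box 9; everything else is compared as lower-cased text.

```python
def reorder(records, field):
    """Return records sorted by the given field (hypothetical field names)."""
    def key(record):
        value = str(record.get(field, "")).strip()
        if value.isdecimal():
            # Purely numeric values sort first, in numeric order.
            return (0, int(value), "")
        # Mixed or text values (e.g. "2A", "1963-1968") sort after, as text.
        return (1, 0, value.lower())
    return sorted(records, key=key)
```

Sorting by a date field this way only works when the value starts with a four-digit year; anything less regular would need its own parsing.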