Archives and AI: An Overview of Current Debates and Future Perspectives


The digital transformation is turning archives, both old and new, into data. As a consequence, automation in the form of artificial intelligence techniques is increasingly applied both to scale traditional recordkeeping activities and to experiment with novel ways to capture, organise, and access records. We survey recent developments at the intersection of artificial intelligence and archival thinking and practice. Our overview of this growing body of literature is organised through the lens of the Records Continuum model. We find four broad themes in the literature on archives and artificial intelligence: theoretical and professional considerations, the automation of recordkeeping processes, organising and accessing archives, and novel forms of digital archives. We conclude by highlighting emerging trends and directions for future work. These include applying recordkeeping principles to the very data and processes that power modern artificial intelligence, and integrating artificial intelligence into archival systems and practice in a more structural, yet critically aware, way.

1 INTRODUCTION

Long before the idea of big data was invented, archives already measured their collections in kilometres of files and folders. Large-scale digitisation efforts worldwide have by now transformed at least some of these collections into digital data. In addition, from the 1990s onwards, governments and other institutions with archival interests have increasingly worked digitally. This change did not immediately transform archival practice and workflows: the archival process remained defined largely by manual appraisal, selection, and review for as long as the size of the collections and records still allowed it. However, the window in which this remains possible is closing fast. More and more archival collections are being digitised, and born-digital records are being submitted to archives at an ever larger scale, making a manual archival process less and less feasible. At the same time, records still need to be evaluated to ensure quality and trust, which are the foundations of archives. Consequently, human archivists need the support of machine agents to work through archival big data. The role of archivists is thus transformed: they need to learn to use machine reasoning for appraisal and selection, and to assess the assessments made by machines. The archive becomes a big data organisation and, like all big data organisations, needs to place at least part of its trust in artificial intelligence, mostly in the form of machine learning, to deal with this transformation.

Across the world, archives are acquiring AI capabilities, both to organise their workflows around the big data they hold and to offer this data to outside organisations. Ten years ago, AI activities in archives were still largely experiments that showcased a potential, offering new ways of working with specific parts of archival holdings such as digitised newspaper collections. This has changed: we have recently observed a new trend in which AI is used throughout the recordkeeping processes that characterise archives. In this article, we survey recent research at the intersection of AI and archival processes through the lens of the Records Continuum model.

To our knowledge, no such survey of the relationship between AI and archives exists. Romein (2020) provides a detailed overview of the related field of digital history, while Fiorucci (2020) contributes a closely related survey of cultural heritage in general. Although neither survey considers the records continuum that defines archives, both observe that machine learning is widely adopted in the heritage sector through many scattered experiments targeting the specifics of individual collections, and remark that machine learning is now ubiquitous in memory institutions. Binmakhashen (2020) is an example of a survey of how machine learning can influence archival practices through the technical area of (automated) document processing, yet without taking a recordkeeping viewpoint. More closely related to our interests is Marciano (2018). The authors start from the same assumption: that archival practice will be transformed by new, advanced digital methods such as machine learning. They go through a range of case studies to describe a new interdisciplinary field at the intersection of archival science and computer science, and make detailed suggestions for changes to archival education.

Our contribution is organised as follows. We start by clarifying the scope of the survey and our methodology for assembling related works. We then discuss the literature, organising it into the four broad thematic categories we identified. Finally, we critically discuss these trends and highlight the future opportunities we see for this area of study.

2 SCOPE OF THE SURVEY

2.1 Methodology

3 A SURVEY OF ARCHIVES AND AI

3.1 Theoretical and Professional Considerations

3.2 Automating Recordkeeping Processes and Decisions

3.3 Appraisal

3.4 Handling Sensitive Information

3.5 Metadata

4 ORGANISING AND ACCESSING ARCHIVES


4.1 Automatic Content Extraction and Indexation

4.2 Distant Reading Archival Records

4.3 Search and Retrieval

5 NOVEL FORMS OF DIGITAL ARCHIVES

5.1 Emerging Trends

6 DISCUSSION

7 CONCLUSIONS

Overview

The article provides an overview of current literature on the application of artificial intelligence in archival practice, organised through the lens of the Records Continuum model. It highlights emerging trends and their implications for future research and practice in the field.

Key Points

1. Archives are undergoing a digital transformation influenced by AI and machine learning.
2. The use of AI is shifting traditional archival processes towards automation.
3. The Records Continuum model provides a lens for understanding AI's impact on recordkeeping.
4. Future opportunities include applying recordkeeping principles to the data and processes that power AI, and integrating AI more structurally into archival systems.
5. The document surveys 53 relevant references from recent literature on AI in archives.

Details

Authors: Giovanni Colavizza, Tobias Blanke, Charles Jeurgens, Julia Noordegraaf
Category: Technology and Engineering
