Archives and AI: An Overview of Current Debates and Future Perspectives
The digital transformation is turning archives, both old and new, into data. As a consequence, automation in the form of artificial intelligence techniques is increasingly applied both to scale traditional recordkeeping activities and to experiment with novel ways to capture, organise, and access records. We survey recent developments at the intersection of artificial intelligence and archival thinking and practice. Our overview of this growing body of literature is organised through the lens of the Records Continuum model. We find four broad themes in the literature on archives and artificial intelligence: theoretical and professional considerations, the automation of recordkeeping processes, organising and accessing archives, and novel forms of digital archives. We conclude by underlining emerging trends and directions for future work, which include the application of recordkeeping principles to the very data and processes that power modern artificial intelligence, and a more structural, yet critically aware, integration of artificial intelligence into archival systems and practice.
One INTRODUCTION
Long before the idea of big data had been invented, archives already measured their collections in kilometers of files and folders. Worldwide large-scale digitisation efforts have by now transformed at least some of these collections into digital data. In addition, from the 1990s onwards, governments and other institutions with archival interests have increasingly worked digitally. This change did not immediately lead to a transformation of archival practice and workflows. The archival process remained largely defined by manual appraisal, selection, and review for as long as the size of the collections and records allowed it. However, the window in which this is still possible is closing fast. More and more archival collections are being digitised, and new born-digital records are being submitted to archives at ever larger scale. This makes a manual archival process less and less feasible. At the same time, records still need to be evaluated to ensure quality and trust, which provide the foundations of archives. Consequently, human archivists need the support of machine agents to help them work through archival big data. The role of archivists is thus transformed, as they need to learn to make use of machine reasoning for appraisal and selection, and to evaluate the assessments made by machines. The archive becomes a big data organisation and, like all big data organisations, needs to at least partly put its trust in Artificial Intelligence (AI), mostly in the form of machine learning, to deal with this transformation.
Across the world, archives are acquiring AI capacities, both to organise their workflows around the big data they hold and to offer this data to outside organisations. Ten years ago, AI activities in archives were still largely experiments that showcased potential by offering new ways of working with specific parts of the archival holdings, such as digitised newspaper collections. This has changed. We have recently observed a new trend in which AI is used throughout the recordkeeping processes that characterise archives. In this article, we survey recent research at the intersection of AI and archival processes, using the lens of the Records Continuum model.
To our knowledge, no comparable survey of the relationship between AI and archives exists. Romein (2020) provides a detailed overview for the related field of digital history, while Fiorucci (2020) contributes a closely related survey for cultural heritage in general. While neither survey considers the records continuum that defines archives, both observe that machine learning is widely adopted in the heritage sector through many dispersed experiments that target the specifics of individual collections, and they remark that machine learning is now ubiquitous in memory institutions. Binmakhashen (2020) is an example of a paper that surveys how machine learning can influence archival practices via the technical area of (automated) document processing, yet without taking a recordkeeping viewpoint. More closely related to our interests is Marciano (2018). The authors start from the same assumption, namely that archival practice will be transformed by new advanced digital methodologies such as machine learning. They go through a range of case studies to describe a new interdiscipline at the intersection of archival and computer science, and make detailed suggestions for changes to archival education.
Our contribution is organised as follows. We start by clarifying the scope of the survey and the methodology we used to assemble related works. We then discuss the literature, organising it into the four broad thematic categories we identified. Finally, we critically discuss these trends and underline the future opportunities we see for this area of study.