Ancient AI
Translating Yesterday’s Ink with Tomorrow’s Tech
Gerardo Hernandez; Janel Michaela Joson; Keiko Raiola
University of Hawai’i at Mānoa
Information and Computer Sciences
POST Building, Rm 317
1680 East-West Road
Honolulu, HI 96822 USA
Office: 808.956.7420
Fax: 808.956.3548
Email: icsinfo@hawaii.edu
Contact
References
This project explored the integration of artificial intelligence (AI) into the transcription process of historical handwritten documents.
The approach utilized Transkribus, an established AI powered platform for document transcription, and a locally hosted database for searching through a span of documents effectively and efficiently.
A key component to our project was the development of a custom Transkribus model, trained in transcribing documents specific to the Sponsor’s field of research: handwritten documents from the 19th century Marshall Islands. �
Overall, this project demonstrates the potential of use of AI in interdisciplinary collaboration: merging computer science, machine learning and digital humanities to innovate within academic research and historical preservation.
Abstract
We utilized various different methods to accomplish transcribing handwritten documents. One of the requirements to train a custom model on the Transkribus platform is to have fully transcribed texts. Noticeable differences in accuracy occur when the model is trained to at least 10,000 words. These documents were largely transcribed manually.
Overall, we utilized pre-existing AI models in Transkribus to help generate the initial OCR. Then, the team used a combination of manual and AI-assisted tools to fully prepare the reference documents that would allow us to train our own custom model.
The data below shows the average accuracy when transcribing 10 pages of hand-written text of various methods used in our project.
.
Introduction
In order to create a web application with a custom AI model to help transcribe 19th century Marshall Island documents, the custom AI model needed to be created first. Utilizing the Transkribus AI platform [1], the custom AI model was built using their public AI model as the foundation. The model also needed training data before it can be created which were 100% transcribed documents provided by our sponsor and now the custom AI model was ready for use. The last step was to continuously refine the model by using more transcribed documents to improve its accuracy. To assist with the refining process, another custom AI was created utilizing OpenAI that help fix textual errors found in the Transkribus transcription output. This OpenAI model will only correct spelling errors found in the transcription and will keep the integrity of the grammar and sentence structure.
For the web application, Meteor with Bootstrap 5 React was utilized [2]. In order to use the custom AI model in the web application, integration of Transkribus API was needed. After implementation of the API, the intended features were implemented as well.
The materials/resources used include photo copies of archival documents, with focus on authors from the Marshall Islands provided by our sponsor, Transkribus public AI models and software for creating the custom AI model, OpenAI for text correcting transcriptions given by the custom AI model, and Meteor with Bootstrap 5 React for developing the web application.
Methods and Materials
Click here to insert your Discussion text. Type it in or copy and paste from your Word document or other source.
This text box will automatically resize to your text. To turn off that feature, right click inside this box and go to Format Shape, Text Box, Autofit, and select the “Do Not Autofit” radio button.
To change the font style of this text box: Click on the border once to highlight the entire text box, then select a different font or font size that suits you. This text is Calibri 32pt and is easily read up to 4 feet away on a 48x36 poster.
Zoom out to 100% to preview what this will look like on your printed poster.
Discussion
In conclusion, this project explored AI in the field of historical research and digital archiving. By developing a custom AI model utilizing Transkribus trained on Marshallese handwritten documents and integrating it into a web application, there is more potential for an accessible archival database for historians and researchers to find information.
Conclusion
Results
Chart 1. Accuracy Scores of various document transcription methods. Accuracy in %
Figure 1. Scanning archival documents.
Handwritten letters have long served as a powerful means of long-distance communication, offering deep insights into personal, cultural, and historical contexts. In particular, cursive scripts offer a valuable window into the past, illustrating the evolution of written language over time. Our sponsor has provided a large collection of handwritten correspondence exchanged between missionaries in the Marshall Islands during the 1800s. However, reading and interpreting each letter manually is a time-consuming and often challenging task.
To address this, our project aims to develop a streamlined, user-friendly web application that transcribes historical handwritten documents into a searchable text database. The application will support keyword searches and include tagging features to help organize documents by author, time period, and location.
This project highlights the importance of preserving and providing easy access to historical documents for research purposes. Deciphering older handwritten texts can be particularly challenging due to changes in cursive styles, writing conventions, and cultural context over time. Even with historical knowledge, the dense and stylized handwriting from the 1800s can make interpretation slow and labor-intensive.
By converting these handwritten letters into a legible, searchable digital format, we aim to make this valuable archive more accessible to researchers and historians. Ultimately, this tool will help streamline the research process, allowing for quicker navigation, comprehension, and analysis of these historically significant texts.
Figure 2. Transcribing documents with AI software.
Figure 3. Transcriptions output into text.
Figure 4. Transcriptions saved into web application.
Sponsor: Dr. Monica LaBriola
Assistant Professor of History