Image Processing and Machine Learning for the Digital Humanities

While text remains the most important research topic in the digital humanities (DH), over the past ten years images have gradually appeared on the radar of computational humanists. Recent developments in digital art history in particular have shown that the importance of images for DH research goes beyond ensuring their accessibility through databases and interfaces. In fact, images are where digital humanities and “artificial intelligence” meet. Most importantly, the automated classification of images on the one hand, and the automated production of images on the other, raise fundamental questions at the interface of computer science and the humanities: how is reality represented in machine learning systems, and how and why do human-held preconceptions, biases, and misjudgements enter such systems?

In the workshop, we will tackle these important questions from two directions.

The first week of the workshop will serve as an introduction to image processing for the digital humanities. Starting from scratch (what is a digital image?), we will gradually explore image processing strategies, i.e. theoretical approaches and practical implementations, that are useful for DH applications, including: scraping (building large-scale image datasets from Web sources), batch processing (“cleaning” image datasets and adapting them to the affordances of a machine), feature extraction (extracting semantic information from image datasets), clustering (visually sorting and reviewing image datasets), and classification (analyzing image datasets using pre-trained machine learning systems). Participants are encouraged to try some of these strategies on a provided practice dataset, but to eventually work on their own image corpora, or ideas for image corpora, from all areas of visual studies, including but not limited to cultural heritage, historical image data, museum and archival collections, and artworks.
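To give a flavor of what two of these strategies look like in practice, here is a minimal sketch of feature extraction followed by clustering. It is an illustration under stated assumptions, not the workshop's actual material: the images are tiny synthetic color patches rather than a real corpus, the "feature" is simply the mean color of each image, and the clustering is a bare-bones k-means written with the Python standard library only. All function names (make_synthetic_image, mean_color_feature, kmeans) are hypothetical and chosen for this example. Real projects would typically rely on dedicated image and machine learning libraries and on much richer features.

```python
import random


def make_synthetic_image(base_color, size=8, noise=20, rng=None):
    """Build a toy digital image: a size x size grid of (R, G, B) pixels (0-255).

    Stands in for a real photograph; each pixel is base_color plus random noise.
    """
    if rng is None:
        rng = random.Random(0)

    def jitter(channel):
        return max(0, min(255, channel + rng.randint(-noise, noise)))

    return [
        [tuple(jitter(c) for c in base_color) for _ in range(size)]
        for _ in range(size)
    ]


def mean_color_feature(image):
    """Feature extraction: reduce an image to its mean color, scaled to [0, 1]."""
    pixels = [pixel for row in image for pixel in row]
    return tuple(
        sum(pixel[ch] for pixel in pixels) / (255 * len(pixels)) for ch in range(3)
    )


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, k, iterations=10):
    """Bare-bones k-means; returns one cluster label per feature vector."""
    # Deterministic start: first point, then repeatedly the point farthest
    # from all centroids chosen so far.
    centroids = [features[0]]
    while len(centroids) < k:
        centroids.append(
            max(features, key=lambda f: min(squared_distance(f, c) for c in centroids))
        )
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: each image joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(f, centroids[j]))
            for f in features
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [f for f, label in zip(features, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


rng = random.Random(42)
reddish = [make_synthetic_image((200, 40, 40), rng=rng) for _ in range(3)]
bluish = [make_synthetic_image((40, 40, 200), rng=rng) for _ in range(3)]
dataset = reddish + bluish

features = [mean_color_feature(image) for image in dataset]
labels = kmeans(features, k=2)
print(labels)  # the three reddish images share one label, the three bluish the other
```

The point of the sketch is the division of labor: feature extraction turns each image into a short list of numbers, and clustering then groups images whose numbers are close together, without anyone having labeled the images beforehand.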

The second week of the workshop will be dedicated to a historical and philosophical critique of image processing strategies and machine learning applications, particularly as DH tools. We will discuss recent developments such as facial recognition, and read research investigating these developments from fields such as FAT-ML (fairness, accountability, and transparency of machine learning), digital art history, media studies, and science and technology studies.

Participants should be willing to pick up basic concepts of the Python programming language over the course of the workshop. Previous programming experience is beneficial.