Digital Annotation and Analysis of Literary Texts with CATMA 6

Aims of the Workshop

The workshop introduces students of literature to CATMA 6 (Computer Assisted Text Markup and Analysis; https://catma.de), an open source tool developed at and hosted by the University of Hamburg since 2008. CATMA is currently used by over 60 research projects and approximately 10,000 users worldwide. Its new sixth version forms part of the DFG-funded forTEXT project (https://fortext.net) and offers a unique combination of three main features:

CATMA supports collaborative annotation and analysis – a text or text corpus can be investigated individually, but also jointly by a group of students or researchers.
CATMA supports explorative, non-deterministic practices of text annotation – its underlying conceptual model is a discursive, debate-oriented approach to text annotation based on the research practices of hermeneutic disciplines.
CATMA integrates text annotation and text analysis in a web-based working environment, making it possible to combine the identification of textual phenomena with their investigation in a seamless, iterative fashion.

What sets CATMA apart from other digital annotation methods is its ‘undogmatic’ approach: the system neither prescribes defined annotation schemata or rules nor forces the user to apply rigid yes/no, right/wrong taxonomies to texts (although it supports more prescriptive schemata as well). In other words, CATMA’s logic invites users to explore the richness and multifacetedness of textual phenomena according to their needs: users can create, expand, and continuously modify their own individual tagsets, so if a text passage invites more than one interpretation, nothing in the system prevents assigning multiple, or even contradictory, annotations.

Despite its flexibility, CATMA does not produce idiosyncratic annotations: all markup data can be exported in TEI/XML format and reused in other contexts.

Since CATMA is a highly intuitive tool, it is particularly suitable for humanists with little technical knowledge: the graphical user interface lets users get started quickly, and CATMA’s build query function (a step-by-step, dialogue-based widget) helps users retrieve complex information from texts without having to learn a query language. Moreover, CATMA’s easy-to-use automated distant-reading functions are continuously being enhanced and extended.

In our workshop, we will introduce the core annotation and analysis functionalities of CATMA and show how they can be combined with automatically generated annotations. In week 1, participants will be taken step by step, in a hands-on approach, through the full cycle of a CATMA-based text investigation and can work on their own texts and projects:

From text upload to initial text investigations,
then to annotation and specification of annotation categories,
from there to combined text queries that consult the source text and its annotations in combination,
and finally to the visual output of query results.

Participants will be able to test and apply the tool hands-on: they will annotate their own texts, create their own Tagsets, and define their tags in an annotation guideline. We would also like to engage participants in a critique of CATMA’s design and components, as well as in a general discussion about the requirements for text analysis tools in their fields of interest.

In week 2, we will combine the work in CATMA with other methods and tools for digital text analysis, such as named entity recognition (NER) and (social) network analysis, in two steps. First, we will begin with the visual investigation and refinement of the annotations created in week 1. Second, we will focus on the application of CATMA to the individual projects of the workshop participants: what is the outcome of the CATMA-based annotation and analysis of texts, and of the creation of genuine Tagsets relevant to these projects? Each participant will give a short presentation on their project, followed by a group discussion.

Target Audience of the Workshop

The primary users of CATMA are literary scholars, as well as graduate and undergraduate students in Literary Studies. In addition, this workshop is likely to be of interest to

humanities scholars in all fields concerned with text analysis (with and without experience in digital text analysis),
software developers in the humanities interested in non-deterministic text analysis and automated annotation.

Participants need no prior knowledge of digital text annotation and can work with their own laptop computers and their own digital texts. CATMA runs on a laptop or PC (Windows, Unix or macOS) with a current web browser (Edge, Firefox, Chrome, Safari) and a mouse or touchpad. Touchscreen navigation is not yet supported.

Schedule

Day	Minute 1–45	Minute 45–90	Minute 90–135	Minute 135–180
1	CATMA Concept	CATMA Demo	Project Presentations 1	Project Presentations 1
2	Project Presentations 2	Project Presentations 2	Annotation	Annotation
3	Tagset Creation	Tagset Creation	Guidelines	Tag Definitions
4	Analyze	Analyze Text & Annotations	Visualize	Visualize Text and Annotations
5	Corpus	Analyze Corpus	Automation	Synthesis

Week 1 CATMA

Day 1

CATMA Concept
1. undogmatic; hermeneutic; for literary studies
2. distant, close and scalable reading
CATMA Demo
1. Introduction to CATMA’s architecture and an exemplary workflow
2. General functions
Project presentations of participants 1

Day 2

Project presentations of participants 2
Annotate your own text

Day 3

How to create a Tagset and presentation of existing Tagsets
Create your own Tagset
The use of annotation guidelines for collaborative annotation: inter-annotator agreement and inter-annotator disagreement
How could the tags in your project be defined?

Day 4

Demonstration of the Analyze module and its functions
Analyze your text and the annotations you created
Demonstration of the Visualize module and its functions
Visualize your text and the annotations you created

Day 5

Corpora and corpus functions in CATMA
Analyze and visualize your own corpus
Automation functions in CATMA
What is needed to automate the annotation in your project?

Week 2 CATMA plus

Day	Minute 1–45	Minute 45–90	Minute 90–135	Minute 135–180
1	3DH prototype	Analyze visually	3DH postulates	Diverse Visualizations
2	NER	Stanford NER	Import/export	NER & CATMA
2&3	Refine annotations	Refine annotations	Network Analysis	Network Analysis
4	Preparation of presentations in small method-related groups	Preparation of presentations in small method-related groups	Presentation of workshop results and individual feedback	Presentation of workshop results and individual feedback
5	Backlog	Synthesis	Feedback	Wishlist

Day 1

How to explore and refine annotations with the 3DH prototype
Analyze your annotations visually in 3DH
What a digital visualization environment should be able to do: the four 3DH postulates (two-way screen, parallax, qualitative, and discursive)
Try out diverse interactive visualizations in 3DH

Day 2 (3 x 1 ½ hours)

Named Entity Recognition as method
Use the Stanford Named Entity Recognizer with your own text
How to export data from Stanford NER and import them into CATMA
Evaluate and refine the automatically generated NER annotations in CATMA and train your NER model
How can the 3DH visualizations and NER annotations help to refine the annotations in your project?

Day 3 (1 ½ hours)

How (social) network analysis can be used to post-process CATMA-generated data
Create your own semantic networks in Gephi

Day 4

Preparation of method-related presentations of results in small groups
Presentations: (How) did the workshop help your research project?
Individual feedback

Day 5

Backlog for further questions and details
How do the different digital methods relate to one another, and why is it fruitful to combine them? What are the challenges for digital tools with regard to literary texts?
Feedback and evaluation of the workshop
Wishlist for further methods or tools to discuss (or to develop)


Culture & Technology

European Summer University in Digital Humanities
