Digital Annotation and Analysis of Literary Texts with CATMA 6

Aims of the Workshop

The workshop introduces students of literature to CATMA 6 (Computer Assisted Text Markup and Analysis; https://catma.de), an open-source tool developed at and hosted by the University of Hamburg since 2008. CATMA is currently used by over 60 research projects and approx. 10,000 users worldwide. Its new sixth version forms part of the DFG-funded forTEXT project (https://fortext.net) and offers a unique combination of three main features:

  1. CATMA supports collaborative annotation and analysis – a text or text corpus can be investigated individually, but also jointly by a group of students or researchers.
  2. CATMA supports explorative, non-deterministic practices of text annotation – a discursive, debate-oriented approach to text annotation based on the research practices of hermeneutic disciplines is the underlying conceptual model.
  3. CATMA integrates text annotation and text analysis in a web-based working environment – which makes it possible to combine the identification of textual phenomena with their investigation in a seamless, iterative fashion.

What sets CATMA apart from other digital annotation methods is its ‘undogmatic’ approach: the system neither prescribes defined annotation schemata or rules, nor does it force the user to apply rigid yes / no, right / wrong taxonomies to texts (even though it allows for more prescriptive schemata as well). In other words, CATMA’s logic invites users to explore the richness and multifacetedness of textual phenomena according to their needs: users can create, expand, and continuously modify their own individual tagsets. If a text passage invites more than one interpretation, nothing in the system prevents users from assigning multiple or even contradictory annotations.

Despite its flexibility, CATMA does not produce idiosyncratic annotations: all markup data can be exported in TEI/XML format and reused in other contexts.

Since CATMA is a highly intuitive tool, it is particularly suitable for humanists with little technical knowledge: the graphical user interface allows for a quick start, and CATMA’s query-building function (a step-by-step, dialogue-based widget) helps users retrieve complex information from texts without having to learn a query language. Moreover, CATMA’s easy-to-use automated distant-reading functions are continuously enhanced and extended.

In our workshop we will introduce the core annotation and analysis functionalities of CATMA and show how they can be combined with the annotations provided automatically. In week 1, participants will be taken in a step-by-step, hands-on approach through the full cycle of a CATMA-based text investigation and can work on their own texts / projects:

  1. From text upload to initial text investigations,
  2. then to annotation and specification of annotation categories,
  3. from there to combined text queries that consult the source text and its annotations in combination,
  4. and finally to the visual output of query results.

Participants will be able to test and apply the tool hands-on: they will annotate their own texts, create their own Tagsets, and define their tags in an annotation guideline. We would also like to engage participants in a critique of CATMA’s design and components as well as a general discussion about requirements for text analysis tools in their fields of interest.

In week 2 we will combine the work in CATMA with other methods and tools for digital text analysis, such as named entity recognition (NER) and (social) network analysis ((S)NA), in two steps. First, we will visually investigate and refine the annotations created in week 1. Second, we will focus on the application of CATMA to the individual projects of the workshop participants: what do the CATMA-based annotation and analysis of texts, together with the creation of genuine Tagsets, contribute to these projects? Each participant will give a short presentation on their project, followed by a group discussion.

Target Audience of the Workshop

The primary users of CATMA are literary scholars, as well as graduate and undergraduate students in Literary Studies. In addition, this workshop is likely to be of interest to

  1. humanities scholars in all fields concerned with text analysis (with and without experience in digital text analysis),
  2. software developers in the humanities interested in non-deterministic text analysis and automated annotation.

Participants need no prior knowledge of digital text annotation and can work with their own laptop computers and their own digital texts. CATMA runs on any laptop or PC (Windows, Unix, or macOS) with a current web browser (Edge, Firefox, Chrome, Safari) and a mouse or touchpad. Touchscreen navigation is not yet supported.

Schedule

Day | Minute 1–45 | Minute 45–90 | Minute 90–135 | Minute 135–180
1 | CATMA Concept | CATMA Demo | Project Presentations 1 | Project Presentations 1
2 | Project Presentations 2 | Project Presentations 2 | Annotation | Annotation
3 | Tagset Creation | Tagset Creation | Guidelines | Tag Definitions
4 | Analyze | Analyze Text & Annotations | Visualize | Visualize Text and Annotations
5 | Corpus | Analyze Corpus | Automatization | Synthesis

Week 1 CATMA

Day 1

  1. CATMA Concept
    1. undogmatic; hermeneutic; for literary studies
    2. distant, close and scalable reading
  2. CATMA Demo
    1. Introduction of CATMA’s architecture and exemplary workflow
    2. general functions
  3. Project presentations of participants 1

Day 2

  1. Project presentations of participants 2
  2. Annotate your own text

Day 3

  1. How to create a Tagset and presentation of existing Tagsets
  2. Create your own Tagset
  3. The use of annotation guidelines for collaborative annotation: inter-annotator agreement and inter-annotator disagreement
  4. How could the tags in your project be defined?
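
Inter-annotator agreement, as discussed in item 3 above, is often quantified with a chance-corrected measure. The following is a minimal sketch of Cohen's kappa for two annotators, assuming both have assigned exactly one tag to each of the same text segments. The function name and the tag names are hypothetical and are not part of CATMA.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same segments.

    Assumption: one label per segment, both lists in the same segment order.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of segments")

    n = len(labels_a)
    # Share of segments on which both annotators chose the same tag.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each annotator's tag frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical tag throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical tags for four text segments.
annotator_a = ["narrator", "character", "narrator", "narrator"]
annotator_b = ["narrator", "character", "character", "narrator"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

A value near 1 indicates strong agreement and a value near 0 indicates agreement at chance level. In a hermeneutic setting, low values are not necessarily errors: they can point to passages where the guidelines are ambiguous or where the text genuinely supports competing readings.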

Day 4

  1. Demonstration of the Analyze module and its functions
  2. Analyze your text and the annotations you created
  3. Demonstration of the Visualize module and its functions
  4. Visualize your text and the annotations you created

Day 5

  1. Corpus and Corpus functions in CATMA
  2. Analyze and visualize your own corpus
  3. Automatization functions in CATMA
  4. What is needed to automatize the annotation in your project?

Week 2 CATMA plus

Day | Minute 1–45 | Minute 45–90 | Minute 90–135 | Minute 135–180
1 | 3DH prototype | Analyze visually | 3DH postulates | Diverse Visualizations
2 | NER | Stanford NER | Import/export | NER & CATMA
2&3 | Refine annotations | Refine annotations | Network Analysis | Network Analysis
4 | Preparation of presentation in small method-related groups | Preparation of presentation in small method-related groups | Presentation of workshop results and individual feedback | Presentation of workshop results and individual feedback
5 | Backlog | Synthesis | Feedback | Wishlist

Day 1

  1. How to explore and refine annotations with the 3DH prototype
  2. Analyze your annotations visually in 3DH
  3. What a digital visualization environment should be able to do – the four 3DH postulates (two-way screen, parallax, qualitative, and discursive)
  4. Try out diverse interactive visualizations in 3DH

Day 2 (3 x 1 ½ hours)

  1. Named Entity Recognition as method
  2. Use the Stanford Named Entity Recognizer with your own text
  3. How to export data from Stanford NER and import them into CATMA
  4. Evaluate and refine the automatically generated NER annotations in CATMA and train your NER model
  5. How can the 3DH visualizations and NER annotations help to refine the annotations in your project?
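
As a rough illustration of the export step in item 3 above, the sketch below collects named entities from tagger output. It assumes the output is in the slash-tag style (`token/LABEL`, with `O` for tokens outside any entity). The function name and the sample sentence are hypothetical, and the target format that CATMA expects on import is not covered here.

```python
def parse_slash_tags(tagged_text):
    """Group consecutive tokens with the same non-O label into entities.

    Assumption: whitespace-separated items of the form token/LABEL.
    """
    entities = []
    current_tokens, current_label = [], None

    def flush():
        # Store the entity collected so far, unless it is an 'O' stretch.
        if current_tokens and current_label != "O":
            entities.append((" ".join(current_tokens), current_label))

    for item in tagged_text.split():
        # Split on the last slash so tokens that contain '/' stay intact.
        token, sep, label = item.rpartition("/")
        if not sep:
            token, label = item, "O"  # untagged token: treat as outside
        if label == current_label and label != "O":
            current_tokens.append(token)
        else:
            flush()
            current_tokens, current_label = [token], label
    flush()
    return entities


sample = "Effi/PERSON Briest/PERSON travels/O to/O Berlin/LOCATION ./O"
found = parse_slash_tags(sample)
```

Note that this simple grouping merges two directly adjacent entities of the same type into one, which is one of the things a manual review of the automatically generated annotations would catch.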

Day 3 (1 ½ hours)

  1. How (social) network analysis can serve as a post-processing step for CATMA-generated data
  2. Create your own semantic networks in Gephi
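
One possible way to prepare annotation data for network analysis is to count how often two annotated entities occur in the same text segment and to write the result as an edge list. The sketch below assumes the annotations have already been reduced to one list of entity names per segment. The function names and the sample data are hypothetical. The `Source`, `Target`, `Weight` header follows the column names that Gephi's spreadsheet import recognizes for edge tables.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence_edges(segments):
    """Count how often each pair of entities shares a segment."""
    weights = Counter()
    for names in segments:
        # Sort so that (a, b) and (b, a) are counted as the same edge.
        for a, b in combinations(sorted(set(names)), 2):
            weights[(a, b)] += 1
    return weights


def edges_to_csv(weights):
    """Serialize the edge weights as CSV text with a Source/Target/Weight header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(weights.items()):
        writer.writerow([a, b, weight])
    return buffer.getvalue()


# Hypothetical data: character names annotated in three text segments.
segments = [
    ["Effi", "Innstetten"],
    ["Effi", "Crampas"],
    ["Effi", "Innstetten", "Crampas"],
]
edges = cooccurrence_edges(segments)
edge_csv = edges_to_csv(edges)
```

Co-occurrence within a segment is only one possible definition of an edge; which relation counts as a link between two nodes is itself an interpretive decision.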

Day 4

  1. Preparation of method-related presentations of results in small groups
  2. Presentations: (How) did the workshop help your research project?
  3. Individual feedback

Day 5

  1. Backlog for further questions and details
  2. How do the different digital methods relate to one another, and why is it fruitful to combine them? What are the challenges for digital tools with regard to literary texts?
  3. Feedback and evaluation of the workshop
  4. Wishlist for further methods or tools to discuss (or to develop)