1 of 45

Introduction to TEI text structure

Textual Editing in the Digital Age

Tiago Sousa Garcia

tiago.sousa-garcia@newcastle.ac.uk

2 of 45

Workshop Plug: Introduction to Stylometry

  • Are you interested in computationally analysing texts?
  • Want to trace authorial style across texts?
  • Explore the use of R and Gephi? (No experience necessary!)
  • Find out about the problems and possibilities of Stylometry?
  • Free workshop: 11-12 April 2019, open to all
  • Run by one of the inventors of the ‘Stylo’ package for R
  • https://research.ncl.ac.uk/atnu/news/introductiontostylometryworkshop.html

3 of 45

This lecture

  • How is a TEI document structured?
  • What information does each section contain?
  • How do you create a well-structured TEI document?

4 of 45

Why do TEI documents have a structure?

  • XML is a structured format
  • Structure helps people and computers navigate the information
  • Text is already structured! We are just making it explicit
  • Structure allows the TEI to be generalised

5 of 45

Why do TEI documents have a structure?

  • The TEI has a generalist approach, meaning it copes with texts
    • of any size
    • in any language and writing system
    • of any complexity
    • on all media
    • from any time and place
  • Examples?
    • Books
    • Journals
    • Manuscripts
    • Rolls
    • Coins
    • Inscription tablets
    • ...

6 of 45

WIP TEI structure

Let’s build a TEI document as we go along

  • We need a <TEI> element
  • But what else?
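
As a starting point, the work-in-progress document might look like the sketch below. The namespace declaration is the standard TEI namespace; it is not shown on the slide, and the comment is only a placeholder, so this skeleton is not yet a valid TEI document.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- metadata and data still to come -->
</TEI>
```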

7 of 45

What is the structure of a TEI document?

  • <TEI>
    • <teiHeader> (metadata)
      • <fileDesc>
        • <titleStmt>
        • <publicationStmt>
        • <sourceDesc>
    • <facsimile> (data)
    • <sourceDoc> (data)
    • <text> (data)
      • <front>
      • <body>
      • <back>

8 of 45

TEI structure

  • <TEI> contains all elements of a TEI document
  • <teiHeader> is required in all cases
  • At least one of <facsimile>, <sourceDoc> or <text> is required in all cases

9 of 45

TEI structure

What if I don’t have a single document but a series of documents gathered together?

  • A collection of essays
  • Anthologies
  • Gatherings of manuscripts

10 of 45

TEI structure

  • <teiCorpus>
    • <TEI>
      • <teiHeader>
        • <fileDesc>
          • <titleStmt>
          • <publicationStmt>
          • <sourceDesc>
      • <facsimile>
      • <sourceDoc>
      • <text>
        • <front>
        • <body>
        • <back>
    • <TEI>
      • (same structure as above)
    • ...

11 of 45

TEI structure

  • <teiCorpus> contains multiple <TEI> elements
  • <teiCorpus> includes one <teiHeader> for the whole corpus
  • Each <TEI> element carries the regular structure
    • <teiHeader>
    • <facsimile> | <sourceDoc> | <text>
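
A minimal sketch of a corpus of two documents, assuming each one carries a <text>. The headers are abbreviated to comments to keep the nesting visible, so the fragment is well-formed XML but would not validate until each <teiHeader> is given its <fileDesc>. The essay text is placeholder content.

```xml
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- metadata for the corpus as a whole -->
  </teiHeader>
  <TEI>
    <teiHeader>
      <!-- metadata for the first document -->
    </teiHeader>
    <text>
      <body>
        <p>First essay (placeholder text).</p>
      </body>
    </text>
  </TEI>
  <TEI>
    <teiHeader>
      <!-- metadata for the second document -->
    </teiHeader>
    <text>
      <body>
        <p>Second essay (placeholder text).</p>
      </body>
    </text>
  </TEI>
</teiCorpus>
```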

12 of 45

What is the structure of a TEI document?

  • <TEI>
    • <teiHeader>
      • <fileDesc>
        • <titleStmt>
        • <publicationStmt>
        • <sourceDesc>
    • <facsimile>
    • <sourceDoc>
    • <text>
      • <front>
      • <body>
      • <back>

13 of 45

<teiHeader>

What is it?

  • Metadata about both the TEI document and the source document
  • Describes and identifies the source and encoding
  • Akin to the title page of a published book
    • (but not meant to be the encoding of a title page)
    • (and also like the digital catalogue page about the book)
    • (and also so much more…)

14 of 45

<teiHeader>

  • Must contain <fileDesc>
  • May contain other elements to describe:
    • encoding
    • text profile
    • non-TEI metadata
    • revisions

15 of 45

WIP TEI structure

  • Added <teiHeader> element
  • Added <fileDesc> element
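
At this stage the work-in-progress document might look like the sketch below. The namespace declaration is the standard TEI one, and the comment is a placeholder for the required children of <fileDesc>, so the document is not yet valid.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <!-- required children of <fileDesc> still to come -->
    </fileDesc>
  </teiHeader>
</TEI>
```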

16 of 45

<fileDesc>

What is it?

  • Contains bibliographic description of the electronic file, including:
    • Title
    • Publication
    • Source
    • And much more!

17 of 45

<fileDesc>

  • Must contain:
    • <titleStmt>
    • <publicationStmt>
    • <sourceDesc>
  • May contain other elements to describe:
    • edition
    • extent
    • series
    • notes
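
A sketch of a <fileDesc> that uses the optional elements alongside the three required ones. All titles, counts and notes are invented placeholders. The TEI schema expects the children in the order shown here.

```xml
<fileDesc>
  <titleStmt>
    <title>A placeholder title</title>
  </titleStmt>
  <editionStmt>
    <edition>Second digital edition</edition>
  </editionStmt>
  <extent>about 4,000 words</extent>
  <publicationStmt>
    <p>Unpublished teaching example.</p>
  </publicationStmt>
  <seriesStmt>
    <title>Hypothetical Teaching Texts</title>
  </seriesStmt>
  <notesStmt>
    <note>Encoded for classroom use.</note>
  </notesStmt>
  <sourceDesc>
    <p>Born digital; no source document.</p>
  </sourceDesc>
</fileDesc>
```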

18 of 45

What is the structure of a <teiHeader>?

  • <teiHeader>
    • <fileDesc>
      • <titleStmt>
      • <editionStmt>
      • <extent>
      • <publicationStmt>
      • <seriesStmt>
      • <notesStmt>
      • <sourceDesc>
    • <encodingDesc> (more here)
    • <profileDesc> (more here)
    • <xenoData> (more here)
    • <revisionDesc> (more here)

19 of 45

<encodingDesc>

  • Groups notes about the procedures used when the text was encoded, given either as prose in <p> or in specific elements such as:
    • <projectDesc>: goals of the project
    • <samplingDecl>: sampling principles
    • <editorialDecl>: editorial principles, for example:
      • <correction>, <hyphenation>, <interpretation>, <normalization>, <punctuation>, <quotation>, <segmentation>
    • <classDecl>: classification system/s
    • <tagsDecl>: specifics about element usage or rendition
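
A short sketch of an <encodingDesc> using two of these elements. The project and the editorial policies described are hypothetical; a real project would record its own decisions here.

```xml
<encodingDesc>
  <projectDesc>
    <p>Texts encoded for a teaching edition (hypothetical project).</p>
  </projectDesc>
  <editorialDecl>
    <correction>
      <p>Obvious printing errors have been corrected silently.</p>
    </correction>
    <hyphenation>
      <p>End-of-line hyphenation has not been retained.</p>
    </hyphenation>
  </editorialDecl>
</encodingDesc>
```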

20 of 45

What is the structure of a <teiHeader>?

  • <teiHeader>
    • <fileDesc>
      • <titleStmt>
      • <editionStmt>
      • <extent>
      • <publicationStmt>
      • <seriesStmt>
      • <notesStmt>
      • <sourceDesc>
    • <encodingDesc> (more here)
    • <profileDesc> (more here)
    • <xenoData> (more here)
    • <revisionDesc> (more here)

21 of 45

<profileDesc>

  • <creation>: the creation of the text
  • <langUsage>: languages, registers, writing systems
  • <textDesc> and <textClass>: classifications applied to the text
  • <particDesc> and <settingDesc>: details of participants and settings, either real or depicted
  • <handNotes>: hands distinguished within a manuscript when not giving full manuscript description
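
A small sketch of a <profileDesc> recording when a text was created and which languages it uses. The date and the languages are invented for illustration.

```xml
<profileDesc>
  <creation>
    <date when="1850">1850</date>
  </creation>
  <langUsage>
    <language ident="en">English</language>
    <language ident="la">Latin</language>
  </langUsage>
</profileDesc>
```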

22 of 45

What is the structure of a <teiHeader>?

  • <teiHeader>
    • <fileDesc>
      • <titleStmt>
      • <editionStmt>
      • <extent>
      • <publicationStmt>
      • <seriesStmt>
      • <notesStmt>
      • <sourceDesc>
    • <encodingDesc> (more here)
    • <profileDesc> (more here)
    • <xenoData> (more here)
    • <revisionDesc> (more here)

23 of 45

<revisionDesc>

  • Contains a list of <change> elements with @when and @who attributes, documenting significant stages in the history of the digital document
  • Conventionally, the most recent change is given first.
  • Can use a <listChange>
  • Can be maintained manually, or could be done by means of a version control system (like Git)
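
A sketch of a manually maintained <revisionDesc>, most recent change first. In TEI the date of a change is given in the @when attribute. The dates, the descriptions and the #ed1 and #ed2 pointers are hypothetical; the pointers assume the editors are identified elsewhere in the header.

```xml
<revisionDesc>
  <change when="2019-03-12" who="#ed2">Corrected transcription errors in chapter 2.</change>
  <change when="2019-03-01" who="#ed1">Created the file.</change>
</revisionDesc>
```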

24 of 45

WIP TEI structure

  • Added required child elements to <fileDesc>:
    • <titleStmt>
    • <publicationStmt>
    • <sourceDesc>
  • Populated elements with bare minimum
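
One possible bare-minimum header, as a sketch: a <title> inside <titleStmt>, and a prose <p> inside each of the other two elements. The wording of the title and paragraphs is placeholder text.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>A placeholder title</title>
    </titleStmt>
    <publicationStmt>
      <p>Unpublished teaching example.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Born digital; no source document.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```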

25 of 45

What is the structure of a TEI document?

  • <TEI>
    • <teiHeader>
      • <fileDesc>
        • <titleStmt>
        • <publicationStmt>
        • <sourceDesc>
    • <facsimile>
    • <sourceDoc>
    • <text>
      • <front>
      • <body>
      • <back>

26 of 45

WIP TEI structure

  • Added <text> element

27 of 45

<text>

What is it?

  • Contains a single text of any kind
  • May also contain paratextual material (front and back matter)
  • Can be:
    • Unitary — for example, a novel
    • Composite — say, a volume of poetry
  • If the <teiHeader> is metadata, <text> is data

28 of 45

<text>

  • Must contain <body>
  • May contain other elements to describe:
    • front matter
    • back matter

29 of 45

Composite <text>

  • Much like many TEI documents can be gathered by a <teiCorpus>, many <text> elements can be gathered by a <group> of texts:

  • <text>
    • <front>
    • <group>
      • <text>
        • <front>
        • <body>
        • <back>
      • <text>
        • <front>
        • <body>
        • <back>
      • <text>
        • <front>
        • <body>
        • <back>
    • <back>

30 of 45

Composite <text>

  • Root <text> contains a <group> element in place of a <body> (groups can nest inside other groups)
  • Each <group> contains one or more child <text> elements
  • Each child <text> element must contain at least a <body> element
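
A sketch of a composite text, assuming a small collection of two poems with its own front and back matter. All content is placeholder text, and the @type values are illustrative choices rather than fixed TEI vocabulary.

```xml
<text>
  <front>
    <div type="preface">
      <p>Preface to the whole collection.</p>
    </div>
  </front>
  <group>
    <text>
      <body>
        <l>First poem (placeholder line).</l>
      </body>
    </text>
    <text>
      <front>
        <div type="dedication">
          <p>Dedication of the second poem only.</p>
        </div>
      </front>
      <body>
        <l>Second poem (placeholder line).</l>
      </body>
    </text>
  </group>
  <back>
    <div type="notes">
      <p>Notes to the whole collection.</p>
    </div>
  </back>
</text>
```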

31 of 45

Inside the <text>: <front>, <body>, and <back>

  • <front>: paratext
  • <body>: text
  • <back>: paratext

32 of 45

<front>

What is it?

  • Contains paratextual front matter
    • Cover
    • Title page
    • Acknowledgements
    • Dedications
    • Abstracts
    • Table of Contents
    • Epigraphs
    • Frontispieces
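
A sketch of how some of this front matter could be encoded. The title, author and contents are invented, and the @type values on <div> are illustrative choices rather than a fixed list.

```xml
<front>
  <titlePage>
    <docTitle>
      <titlePart type="main">A Placeholder Title</titlePart>
    </docTitle>
    <byline>by <docAuthor>A. N. Author</docAuthor></byline>
  </titlePage>
  <div type="dedication">
    <p>To the reader.</p>
  </div>
  <div type="contents">
    <head>Contents</head>
    <list>
      <item>Chapter 1</item>
      <item>Chapter 2</item>
    </list>
  </div>
</front>
```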

33 of 45

<front>

34 of 45

<back>

What is it?

  • Contains paratextual back matter
    • Appendix
    • Glossary
    • Notes
    • Index
  • Because cultural conventions differ on what counts as front or back matter, the models for <front> and <back> are the same

35 of 45

<back>

Of course there are elements that allow you to mark index entries in the body of the text and auto-generate a very detailed index
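
One way this can look, as a sketch: an <index> element marks an entry at the point it refers to, and a <divGen> in the back matter marks where a generated index should appear. The index name "subjects" and the sample sentence are invented. TEI only records the markup; producing the index itself is left to whatever software processes the file.

```xml
<text>
  <body>
    <p>The header<index indexName="subjects"><term>metadata</term></index>
      describes the file.</p>
  </body>
  <back>
    <div type="index">
      <head>Index</head>
      <divGen type="index"/>
    </div>
  </back>
</text>
```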

36 of 45

<body>

What is it?

  • Contains the text itself
  • Text can be subdivided into multiple sections (<div>)
  • Each subdivision can be further divided (nested <div>)
  • The @type attribute distinguishes between different kinds of <div>

37 of 45

<body>

  • <body> can contain multiple <div>
  • <div> can contain:
    • Headings (<head>)
    • Prose (<p>)
    • Poetry (<l>)
    • Drama (<sp>)
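
A sketch of a <body> with three divisions, one for each kind of content listed above. The text is placeholder content and the @type values are illustrative. The <speaker> element inside <sp> is not mentioned on the slide; it names who is speaking.

```xml
<body>
  <div type="chapter">
    <head>Chapter 1</head>
    <p>A paragraph of prose.</p>
  </div>
  <div type="poem">
    <head>A Short Poem</head>
    <l>The first line of verse,</l>
    <l>and then the second.</l>
  </div>
  <div type="scene">
    <head>Scene 1</head>
    <sp>
      <speaker>First Speaker</speaker>
      <p>A speech in prose.</p>
    </sp>
  </div>
</body>
```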

38 of 45

<body>

  • The @type attribute distinguishes between different types of divisions:
    • Epic, Bible —> book
    • Novel —> chapter
    • Report —> section
    • Drama —> acts, scenes
    • Diary —> entries

39 of 45

<body>

  • The @n attribute distinguishes between different sections of the same type
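
A sketch of @type and @n working together on nested divisions, with placeholder text. Note that the value of @n can repeat: both books below have a chapter numbered 1.

```xml
<body>
  <div type="book" n="1">
    <div type="chapter" n="1">
      <p>Placeholder text.</p>
    </div>
    <div type="chapter" n="2">
      <p>Placeholder text.</p>
    </div>
  </div>
  <div type="book" n="2">
    <div type="chapter" n="1">
      <p>Placeholder text.</p>
    </div>
  </div>
</body>
```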

40 of 45

Global attributes

What is it?

  • Certain attributes can apply to (potentially) anything
  • Attributes of the att.global class can appear in every element:
    • @xml:id —> unique identifier for any element
    • @n —> name or number for any element (not unique)
    • @xml:lang —> ISO code for language of any element
    • @xml:space —> specifies whitespace management
    • @rend, @style and @rendition —> specify how an element should be rendered (what it should look like)
    • @resp —> attributes responsibility to an agent
    • @cert —> specifies the degree of certainty for an element
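
A sketch showing several global attributes on ordinary elements. The identifier ch1, the pointer #ed1 and the sample sentences are hypothetical; #ed1 assumes an editor is identified elsewhere in the document.

```xml
<div xml:id="ch1" type="chapter" n="1" xml:lang="en">
  <head rend="italic">Chapter 1</head>
  <p>He said <foreign xml:lang="fr">bonjour</foreign> and left.</p>
  <p resp="#ed1" cert="medium">A paragraph whose reading an editor is only moderately sure of.</p>
</div>
```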

41 of 45

Inside the <text>: <front>, <body>, and <back>

  • <front>
  • <body>
  • <back>

42 of 45

WIP TEI structure

  • Added <body> element
  • Added <p> element

Our TEI structure is now complete!
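
Put together, the finished minimal document might look like the sketch below. The namespace is the standard TEI one; the title and the paragraphs are placeholder text.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A placeholder title</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished teaching example.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital; no source document.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Some text here.</p>
    </body>
  </text>
</TEI>
```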

43 of 45

To recap:

  • Each TEI document needs at least one <TEI> element
  • A <TEI> element needs both:
    • metadata —> <teiHeader>
    • data —> <text> (or <facsimile> or <sourceDoc>)
  • A <teiHeader> needs at least:
    • <fileDesc> which must contain at least:
      • <titleStmt>
      • <publicationStmt>
      • <sourceDesc>
  • A <text> element needs at least:
    • A <body>, which cannot be empty

44 of 45

I want to know more!

45 of 45

Let’s practice!