1 of 33

Exploring Song Lyrics through Digital Text Analysis Tools

Xianzhong Meng

T Cruz

2025-10-30

2 of 33

CONTENTS

  1. Learning Objective
  2. Workshop Overview
  3. T’s Session
  4. Xianzhong’s Session

3 of 33

  1. Learning Objective

4 of 33

  1. Learning Objective
  • Understand the basic functions of LancsBox for corpus analysis
  • Learn how to import and explore song lyrics using LancsBox tools (Text, KWIC, MATTR)
  • Use LancsBox tools to analyze activist language in songs
  • Analyze lexical diversity and word patterns through quantitative data
  • Compare corpus results with visual representations (graphs, frequency lists, collocation networks)

5 of 33

2. Workshop Overview

6 of 33

2.1 Workshop Overview

  • What is LancsBox?
    • #LancsBox is a new-generation software package for the analysis of language data and corpora, developed at Lancaster University.
  • Features of LancsBox (http://corpora.lancs.ac.uk/lancsbox/index.php):
    • Works with your own data or existing corpora
    • Can be used by linguists, language teachers, historians, sociologists, educators, and anyone interested in language
    • Visualizes language data
    • Analyses data in any language (see the website for details about language support)
    • Automatically annotates data for part-of-speech
    • Works with any major operating system (Windows, Mac, Linux)
  • How to download LancsBox? http://corpora.lancs.ac.uk/lancsbox/index.php (Current version: 5.5.1)

7 of 33

2.1 Workshop Overview

    • The LancsBox panel
    • Key functions of LancsBox
      • KWIC (key word in context) ⬆️
      • Words ⬇️
      • Text ⬅️
      • GraphColl (word collocations) ➡️
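To make the KWIC idea concrete, here is a minimal Python sketch of a key-word-in-context display. It is not how LancsBox implements KWIC; the `tokenize` rule, the `kwic` function, and the `span` parameter are illustrative assumptions.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens (a simple rule, not LancsBox's tokenizer)."""
    return re.findall(r"[^\W_]+(?:['’][^\W_]+)*", text.lower())

def kwic(tokens, node, span=3):
    """Return (left context, node, right context) for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, tok, right))
    return lines

sample = "Love is freedom, and freedom is love forever and ever."
for left, node, right in kwic(tokenize(sample), "freedom", span=2):
    # Right-align the left context so the node word lines up in one column.
    print(f"{left:>15} | {node} | {right}")
```

Each output line shows the search word in the middle with a few words of context on either side, which is the same layout the KWIC panel uses for reading a word in its surroundings.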

8 of 33

2.1 Workshop Overview

  • Creating a new project and importing three corpus folders
    • Creating self-designed corpora (Taylor Swift, Billie Eilish, Beyoncé)
      • A corpus is a collection of texts.
      • "Corpora" is just the plural form of "corpus."
    • Using text files for each corpus
      • Plain text only: no fonts, colors, margins, or hidden codes.
      • .txt files are lightweight and easy for LancsBox to process.
      • .txt files can be opened on any computer, with any text editor.
    • Importing the corpora into LancsBox
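Outside LancsBox, the same folder-of-.txt-files layout can be read with a few lines of Python. This is a sketch under assumptions: it supposes one folder per artist containing UTF-8 .txt files, and the function names and the tokenizing rule are illustrative, so token counts will not match LancsBox exactly.

```python
from pathlib import Path
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens (a simple rule, not LancsBox's tokenizer)."""
    return re.findall(r"[^\W_]+(?:['’][^\W_]+)*", text.lower())

def load_corpus(folder):
    """Read every .txt file in `folder` and return {file name: text}."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.txt"))
    }

def count_tokens(corpus):
    """Total number of tokens across all texts in a corpus."""
    return sum(len(tokenize(text)) for text in corpus.values())
```

For example, `count_tokens(load_corpus("corpora/beyonce"))` would give a token count for a hypothetical `corpora/beyonce` folder; only files ending in .txt are read.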

9 of 33

2.2 Workshop Overview - T

  • Exploring activism/awareness language in music
  • Understanding music as a place for rebellion against power systems and as a safe space for the audience
  • Identifying the main themes and recurring patterns by analyzing frequency and keywords

10 of 33

2.3 Workshop Overview - Xianzhong

  • Understand lexical diversity: how varied the vocabulary of a text is
  • Learn to use corpus tools like LancsBox to quantify language features
  • Explore the Type-Token Ratio (TTR) and the Moving-Average Type-Token Ratio (MATTR) in the later session

11 of 33

3. T’s Session (0–30 min)

12 of 33

Building the Corpus

  • Beyoncé: 189,871 tokens
  • Taylor Swift: 247,682 tokens
  • Billie Eilish: 64,231 tokens

Kaggle is a data science company whose website hosts datasets that you can load into software such as LancsBox.

Purpose: To trace how pop lyrics express awareness, resistance, and/or emotional refuge

Method: keyword frequency, collocation, and topic modeling
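As a rough illustration of the keyword-frequency step, the Python sketch below counts how often chosen keywords occur and scales the counts per 1,000 tokens so that corpora of different sizes can be compared. The scaling factor, the function names, and the placeholder text are assumptions made for this example; the placeholder is invented and is not a lyric from the corpora.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens (a simple rule, not LancsBox's tokenizer)."""
    return re.findall(r"[^\W_]+(?:['’][^\W_]+)*", text.lower())

def keyword_frequencies(text, keywords, per=1000):
    """Return {keyword: (raw count, count per `per` tokens)}."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {
        kw: (counts[kw], counts[kw] / total * per if total else 0.0)
        for kw in keywords
    }

# Invented placeholder text, used only to show the shape of the result.
placeholder = "Freedom is power. Power is a voice, and a voice is freedom."
freqs = keyword_frequencies(placeholder, ["freedom", "power", "voice", "woman"])
for keyword, (raw, relative) in freqs.items():
    print(f"{keyword:10} raw={raw} per_1000={relative:.1f}")
```

Relative frequencies matter here because the three corpora differ widely in size, so raw counts alone would favor the largest corpus.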

13 of 33

Finding Keywords

14 of 33

15 of 33

16 of 33

17 of 33

18 of 33

19 of 33

Frequencies and Patterns of Activist Language

Artist          Keywords                       Thematic Focus
Beyoncé         freedom, black, power, woman   empowerment, spirituality
Taylor Swift    speak, truth, right, voice     nostalgia, agency
Billie Eilish   real, die, world               authenticity, anxiety

Conclusion: A keyword analysis reveals consistent rhetorical patterns: identity, resistance, and emotional truth-telling appear across all artists but differ in tone and intensity.

Common thread: All artists turn personal experience into social commentary

Key takeaways: Activism in lyrics isn’t always overtly political; sometimes it’s emotional or identity-driven. Beyoncé embodies collective liberation. Swift reframes gendered power. Eilish critiques authenticity and mental health stigma.

20 of 33

Music as Rebellion & Safe Space

Rebellion against power systems:

  • Beyoncé → racial & gender resistance (Formation, Black Parade)
  • Taylor Swift → media & patriarchy critique (The Man, Miss Americana & the Heartbreak Prince)
  • Billie Eilish → defying industry control (Therefore I Am, Your Power)

Music as a safe space:

  • Emotional honesty as audience refuge
  • Community in shared struggle
  • Each artist offers representation as safety — turning visibility itself into a form of care and resistance.

21 of 33

Intersecting Voices of Resistance

  • Beyoncé: collective liberation and Black feminist rhetoric
  • Taylor Swift: personal autonomy within fame’s surveillance
  • Billie Eilish: affective rebellion and self-authenticity
  • Pop as rhetorical site → merging private emotion and public protest

Comparative Insights

Key takeaway: These three voices show generational and cultural differences in expressing resistance:

Beyoncé externalizes activism

Swift negotiates identity in the public eye

Eilish internalizes rebellion

22 of 33

Conclusion & Possible Future Work

Music as a Rhetoric of Empowerment

  • Lyrics construct shared awareness and self-empowerment
  • Pop as both art and activism: personal feeling becomes political language
  • Future research:
    • Extend to other genres (hip-hop, indie)
    • Analyze listener discourse on social media
    • Compare activism tone before and after 2020
  • While pop music has long been dismissed as escapism, this corpus suggests that emotion and activism coexist in sound. Through voice, repetition, and affect, the artists use pop as a medium of rebellion, ultimately transforming vulnerability into visibility and sound into political presence.

23 of 33

4. Xianzhong’s Session (30 min)

24 of 33

4.1 Exploring lexical diversity

  • My research focus is lexical diversity of corpora: (Taylor Swift, Billie Eilish, Beyoncé).
  • Research question:
    • How does lexical diversity vary across the three lyric corpora?

      • TTR (Type-Token Ratio)
      • MATTR (Moving-Average Type-Token Ratio)—measures how varied the vocabulary is

  • Tokens (the total number of running words) and types (the number of distinct words)
    • Love is love. (Types: 2, tokens: 3, TTR: ⅔)
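The "Love is love." count can be reproduced with a short Python sketch. The `tokenize` rule (lowercasing, dropping punctuation) and the function names are assumptions for illustration; LancsBox may tokenize differently.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens (a simple rule, not LancsBox's tokenizer)."""
    return re.findall(r"[^\W_]+(?:['’][^\W_]+)*", text.lower())

def ttr(text):
    """Return (number of types, number of tokens, type-token ratio)."""
    tokens = tokenize(text)
    types = set(tokens)
    ratio = len(types) / len(tokens) if tokens else 0.0
    return len(types), len(tokens), ratio

n_types, n_tokens, ratio = ttr("Love is love.")
print(n_types, n_tokens, round(ratio, 3))  # 2 3 0.667
```

Lowercasing is what makes "Love" and "love" count as one type; without it the example would have three types instead of two.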

25 of 33

4.1 Exploring lexical diversity

Song                Lyrics Sample                                             Tokens   Types   TTR
Repetitive lyrics   “Love, love, love, love, love, love, love, love, love…”   9        1       1/9
Varied sentence     “Love begins softly, grows stronger, and never ends.”     8        8       8/8

Repetitive lyrics: even though the sample is short, the same word repeats many times, so the token count is high but lexical diversity is low.

Varied sentence: fewer repetitions and more unique words, so the token count is lower but lexical diversity is higher.

26 of 33

4.1 Exploring lexical diversity

    • In LancsBox, the default window for MATTR (moving-average type-token ratio) is 50 tokens. The example on this slide uses a window of 5 to keep the arithmetic short.

For example: “Love is freedom, and freedom is love forever and ever.”

Number   Window (size 5)               Types   Tokens   TTR
1        Love is freedom and freedom   4       5        4/5 = 0.8
2        is freedom and freedom is     3       5        3/5 = 0.6
3        freedom and freedom is love   4       5        4/5 = 0.8
4        and freedom is love forever   5       5        5/5 = 1
5        freedom is love forever and   5       5        5/5 = 1
6        is love forever and ever      5       5        5/5 = 1

Moving-average type-token ratio: MATTR (window 5) = (0.8 + 0.6 + 0.8 + 1.0 + 1.0 + 1.0) ÷ 6 = 5.2 ÷ 6 ≈ 0.8667
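The window-by-window calculation above can be checked with a minimal Python sketch. The function names are illustrative, and the fallback to plain TTR for texts shorter than the window is an assumption of this sketch, not documented LancsBox behavior.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens (a simple rule, not LancsBox's tokenizer)."""
    return re.findall(r"[^\W_]+(?:['’][^\W_]+)*", text.lower())

def window_ttrs(tokens, window):
    """TTR of every window of `window` consecutive tokens, sliding one token at a time."""
    return [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]

def mattr(tokens, window=50):
    """Mean of the window TTRs; falls back to plain TTR for short texts (an assumption)."""
    if not tokens:
        return 0.0
    if len(tokens) < window:
        return len(set(tokens)) / len(tokens)
    ratios = window_ttrs(tokens, window)
    return sum(ratios) / len(ratios)

tokens = tokenize("Love is freedom, and freedom is love forever and ever.")
print(window_ttrs(tokens, 5))      # [0.8, 0.6, 0.8, 1.0, 1.0, 1.0]
print(round(mattr(tokens, 5), 4))  # 0.8667
```

A text of 10 tokens with a window of 5 yields 10 − 5 + 1 = 6 windows, which is why the table has six rows.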

27 of 33

4.1 Exploring lexical diversity

    • Three MATTR ranges: MATTR < 0.6; 0.6 < MATTR < 0.8; 0.8 < MATTR < 1

Differences between TTR and MATTR

Feature                 TTR (Type-Token Ratio)                                MATTR (Moving-Average TTR)
How it’s calculated     Unique words ÷ total words                            Average TTR across multiple 50-word windows
Effect of text length   Changes a lot when text is long or short (unstable)   Stays consistent even when texts differ in length (more reliable)
What it shows           A basic snapshot of vocabulary variety                A smoother, more accurate measure of lexical diversity
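The effect of text length can be seen in a small Python experiment. As an artificial assumption, it lengthens a text by repeating the same ten words, so the vocabulary stays fixed while the token count grows. The function names are illustrative, and a window of 5 stands in for the 50-word default.

```python
def ttr(tokens):
    """Plain type-token ratio of a non-empty token list."""
    return len(set(tokens)) / len(tokens)

def mattr(tokens, window):
    """Mean TTR over all sliding windows; assumes len(tokens) >= window."""
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)

base = "love is freedom and freedom is love forever and ever".split()
for copies in (1, 5, 10):
    tokens = base * copies
    # TTR shrinks as the text grows; MATTR moves only slightly.
    print(copies, len(tokens), round(ttr(tokens), 3), round(mattr(tokens, 5), 3))
```

With the vocabulary held constant, TTR falls from 0.6 for one copy to 0.06 for ten copies, while MATTR stays in a narrow band. This is why MATTR is the safer choice for comparing corpora as different in size as the three here.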

28 of 33

4.2 Exploring lexical diversity

Measures                     Meaning                                          Lyrics
Tokens                       Total number of words in the text                Song length: how much language is used overall
MATTR (Moving-Average TTR)   Average lexical diversity over 50-word windows   Shows how varied or repetitive the vocabulary is, adjusted for text length

29 of 33

4.2 Exploring lexical diversity

Corpus    Song Content   Tokens    MATTR (%)
Beyoncé   406            189,871   24.35

30 of 33

4.2 Exploring lexical diversity

Corpus         Song Content   Tokens    MATTR (%)
Taylor Swift   479            247,682   19.42

31 of 33

4.2 Exploring lexical diversity

Corpus          Song Content   Tokens    MATTR (%)
Billie Eilish   145            64,231    37.18

32 of 33

Corpus          Song Content   Tokens    MATTR (%)
Beyoncé         406            189,871   24.35 (< 0.6)
Taylor Swift    479            247,682   19.42 (< 0.6)
Billie Eilish   145            64,231    37.18 (< 0.6)

Beyoncé: Repetitive rhythmic lyrics; moderate diversity typical of R&B and pop hooks.

Taylor Swift: Extensive narrative songwriting with thematic repetition.

Billie Eilish: Highest lexical diversity; introspective and experimental writing.

    • (MATTR < 0.6; 0.6 < MATTR < 0.8; 0.8 < MATTR< 1)
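To place a score in one of the three ranges, a percentage such as 24.35 first has to be converted to a proportion (0.2435). The Python sketch below does this. The function name and the `is_percent` flag are illustrative, and because the ranges above leave the exact boundaries 0.6 and 0.8 unassigned, this sketch assumes they belong to the higher range.

```python
def mattr_band(value, is_percent=False):
    """Return the MATTR range that a score falls into."""
    score = value / 100 if is_percent else value
    if not 0 <= score <= 1:
        raise ValueError("MATTR must lie between 0 and 1 after conversion")
    if score < 0.6:
        return "MATTR < 0.6"
    if score < 0.8:  # assumption: a score of exactly 0.6 lands here
        return "0.6 < MATTR < 0.8"
    return "0.8 < MATTR < 1"  # assumption: exactly 0.8 and 1.0 land here

# MATTR (%) values reported in the table above.
reported = {"Beyoncé": 24.35, "Taylor Swift": 19.42, "Billie Eilish": 37.18}
for corpus, percent in reported.items():
    print(corpus, mattr_band(percent, is_percent=True))
```

All three reported values fall in the lowest range, which matches the "(< 0.6)" labels in the table.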

33 of 33

THE END

THANKS