1 of 15

AN ARTS PERSPECTIVE

ON MULTIMODAL AI

Janina Wildfeuer,

Communication & Information Studies

Research Group Multimodality (and AI)

2 of 15

Human society is challenged to have meaningful interactions with AI systems and to find responsible ways of reproducing communication and knowledge by and with machines.

3 of 15

MULTIMODALITY

language is inherently multimodal

communication is inherently multimodal

human meanings are inherently multimodal

“Generative AI anachronistically limits itself to the discursive sequencing of written text”

Cope & Kalantzis 2023: 14

4 of 15

Midjourney 4: “human hand”

DALL-E 2: “CPR instruction in table style, black and white, text and image”

AI FILM - The Carnival of the Ages - Runway Gen-2

5 of 15

TEXT-TO-IMAGE

“translate ideas into exceptionally accurate images” (DALL·E 3)

fake image of Pope Francis wearing a puffer jacket (2023)

anatomically incorrect diagram of a rat's penis and testicles published in Frontiers (2024)

a new medium of thought to “expand the imaginative power of the human species” (Midjourney)

6 of 15

TEXT-TO-VIDEO

“new standard for immersive AI content” (Meta Movie Gen)

OpenAI Sora

“trim down and create seamless repeating videos with Loop” and “combine two videos into one seamless clip” (Sora)

Meta Movie Gen

7 of 15

MULTIMODALITY

language is inherently multimodal

communication is inherently multimodal

human meanings are inherently multimodal

“Generative AI anachronistically limits itself to the discursive sequencing of written text”

Cope & Kalantzis 2023: 14

“current disconnect between technical approaches to cross-cultural multimodal learning and the rich, interdisciplinary traditions of visual cultural studies”

Yadav et al. 2025: 9

8 of 15

A LITTLE EXPERIMENT

OBLIQUE STRATEGIES

Oblique Strategies (subtitled Over One Hundred Worthwhile Dilemmas) is a card-based method for promoting creativity jointly created by musician/artist Brian Eno and multimedia artist Peter Schmidt, first published in 1975.

a deck of 7-by-9-centimetre (2.8 in × 3.5 in) printed cards in a black box.
each card offers a challenging constraint intended to help artists break creative blocks by encouraging lateral thinking.

9 of 15

A LITTLE EXPERIMENT

Some free, no log-in required image/video generators:

perchance, vondy, craiyon, giz.ai, web.vidon.ai

ChatGPT/Midjourney?

Go to obliquestrategies.ca and select one card
Use an image generator to visualize the message from the card
Discuss the relationship between the verbal “prompt” and the visual output
Post the selected image to the Padlet, detailing the prompt and the generator used

https://padlet.com/jwildfeuer/MMAI

10 of 15

“Generative AI [...] has no idea, architecture, theory or mechanism of meaning, nor any way to take into account the human interest to mean.”

Cope & Kalantzis 2023: 3

“Cultural meaning is not only symbolic — it is also embedded in physical forms, spatial arrangements, and production contexts.”

“...understanding the cultural and relational meanings of material artifacts—images included—requires attending to their social and historical trajectories, circulation, and entanglement in person-object relations”

Yadav et al. 2025

11 of 15

AI Materiality → Computer Science

(Audio-)Visual Artefacts → Multimodality Research

Hiippala, T. (2021). Distant viewing and multimodality theory: Prospects and challenges. Multimodality & Society, 1(2), 134-152. https://doi.org/10.1177/26349795211007094

modes = sensory modalities
meanings can be derived in a straightforward manner through, e.g. visual perception

humans combine different forms of communication to make and exchange meanings
any percepts involving communication must be related to particular expressive forms in order to bridge perception and interpretation

12 of 15

AI Materiality → Computer Science

individual expressive forms
limited choice: constrained database/library
independent design decisions on small levels: use of filters, lighting, movement

(Audio-)Visual Artefacts → Humanities more broadly

Filmic and visual design choices have semiotic potential
Films and visuals as communicative practices, as social phenomena anchored into the social interaction of individuals in social contexts

13 of 15

“THE HUMANITIES”

expertise on the meaningful interplay & integration of multiple expressive forms

extensive, systematic knowledge about specific expressive forms: ‘grammar’, structures, styles, genres, etc.

careful curation and classification of multimodal data

systematic approaches to context/discourse knowledge

14 of 15

“THE HUMANITIES”

complementing and enhancing work with AI tools

modelling a more holistic understanding and representation of the world

15 of 15

THANK YOU

Research Group Multimodality (and AI)

Contact:

Janina Wildfeuer, Communication & Information Studies

j.wildfeuer@rug.nl

References:

Bateman, J., Wildfeuer, J., & Hiippala, T. (2017). Multimodality. Foundations, Research, and Analysis. A problem-oriented introduction. De Gruyter.

Cope, B., & Kalantzis, M. (2023). A multimodal grammar of artificial intelligence: Measuring the gains and losses in generative AI. Multimodality & Society, https://doi.org/10.1177/26349795231221699

Hiippala, T. (2021). Distant viewing and multimodality theory: Prospects and challenges. Multimodality & Society, 1(2), 134-152. https://doi.org/10.1177/26349795211007094

Yadav, S., et al. (2025). Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory. https://arxiv.org/abs/2505.22793
