AN ARTS PERSPECTIVE
ON MULTIMODAL AI
Janina Wildfeuer,
Communication & Information Studies
Research Group Multimodality (and AI)
Human society is challenged to have meaningful interactions with AI systems and to find responsible ways of reproducing communication and knowledge by and with machines.
MULTIMODALITY
AI
language is inherently multimodal
communication is inherently multimodal
human meanings are inherently multimodal
“Generative AI anachronistically limits itself to the discursive sequencing of written text”
Cope & Kalantzis 2023: 14
Midjourney 4: “human hand”
DALL-E 2: “CPR instruction in table style, black and white, text and image”
AI FILM - The Carnival of the Ages - Runway gen2
TEXT-TO-IMAGE
“translate ideas into exceptionally accurate images” (DALL·E 3)
fake image of Pope Francis wearing a puffer jacket (2023)
anatomically incorrect diagram of a rat's penis and testicles published in Frontiers (2024)
a new medium of thought to “expand the imaginative power of the human species” (Midjourney)
TEXT-TO-VIDEO
“new standard for immersive AI content” (Meta MovieGen)
OpenAI Sora
“trim down and create seamless repeating videos with Loop” and “combine two videos into one seamless clip” (Sora)
Meta MovieGen
MULTIMODALITY
AI
&
“current disconnect between technical approaches to cross-cultural multimodal learning and the rich, interdisciplinary traditions of visual cultural studies”
Yadav et al. 2025: 9
A LITTLE EXPERIMENT
OBLIQUE STRATEGIES
Oblique Strategies (subtitled Over One Hundred Worthwhile Dilemmas) is a card-based method for promoting creativity jointly created by musician/artist Brian Eno and multimedia artist Peter Schmidt, first published in 1975.
Some free image/video generators, no log-in required:
perchance, vondy, craiyon, giz.ai, web.vidon.ai
ChatGPT/Midjourney?
https://padlet.com/jwildfeuer/MMAI
“Generative AI [...] has no idea, architecture, theory or mechanism of meaning, nor any way to take into account the human interest to mean.”
Cope & Kalantzis 2023: 3
“Cultural meaning is not only symbolic — it is also embedded in physical forms, spatial arrangements, and production contexts.”
“...understanding the cultural and relational meanings of material artifacts—images included—requires attending to their social and historical trajectories, circulation, and entanglement in person-object relations”
Yadav et al. 2025
AI Materiality → Computer Science
(Audio-)Visual Artefacts → Multimodality Research
Hiippala, T. (2021). Distant viewing and multimodality theory: Prospects and challenges. Multimodality & Society, 1(2), 134-152. https://doi.org/10.1177/26349795211007094
AI Materiality → Computer Science
(Audio-)Visual Artefacts → Humanities more broadly
“THE HUMANITIES”
expertise on the meaningful interplay & integration of multiple expressive forms
extensive, systematic knowledge about specific expressive forms: ‘grammar’, structures, styles, genres, etc.
careful curation and classification of multimodal data
systematic approaches to context/discourse knowledge
“THE HUMANITIES”
AI
&
complementing and enhancing work with AI tools
modelling a more holistic understanding and representation of the world
THANK YOU
Research Group Multimodality (and AI)
Contact:
Janina Wildfeuer, Communication & Information Studies
References:
Bateman, J., Wildfeuer, J., & Hiippala, T. (2017). Multimodality. Foundations, Research, and Analysis. A problem-oriented introduction. De Gruyter.
Cope, B., & Kalantzis, M. (2023). A multimodal grammar of artificial intelligence: Measuring the gains and losses in generative AI. Multimodality & Society, https://doi.org/10.1177/26349795231221699
Hiippala, T. (2021). Distant viewing and multimodality theory: Prospects and challenges. Multimodality & Society, 1(2), 134-152. https://doi.org/10.1177/26349795211007094
Yadav, S., et al. (2025). Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory. https://arxiv.org/abs/2505.22793