1 of 32

CSCI 3280

Introduction to Multimedia Systems

(2026 Term 2)

Computer Science & Engineering

The Chinese University of Hong Kong

2 of 32

Announcement

  • The scores for the second assignment have been released (contact me if you have any questions).

  • The API information for the course project has been released.

  • The last tutorial is scheduled for next week.

3 of 32

Multimedia Fusion: Motivation

  • Why do we need to combine different media?

4 of 32

Multimedia Fusion Application

  • Stock prediction

5 of 32

Multimedia Fusion Application

  • Health risk prediction

6 of 32

Multimedia Fusion Application

  • Self-driving cars

7 of 32

Multimedia Fusion (1)

  • Different fusion methods

8 of 32

Multimedia Fusion (2)

  • An example of early fusion: Multi-modal Sensor Fusion

9 of 32

Multimedia Fusion (2)

  • An example of late fusion: speech to text

10 of 32

Multimedia Fusion (3)

  • Joint fusion: aggregating the learned features of each medium at a late stage
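The three strategies named in this part of the lecture (early, late, and joint fusion) differ mainly in where the modalities are combined. The following is a minimal sketch under stated assumptions, not the lecture's implementation: `linear_score` and `encode` are hypothetical toy stand-ins for a real classifier and real per-modality encoders, and the inputs are plain lists of numbers.

```python
import math

def linear_score(features, weights, bias=0.0):
    """Toy classifier (hypothetical): logistic score of a weighted sum."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def encode(features, scale):
    """Toy per-modality encoder (hypothetical): squashes raw inputs with tanh."""
    return [math.tanh(scale * f) for f in features]

def early_fusion(audio, video, weights):
    # Combine the raw inputs first, then run a single model on the result.
    return linear_score(audio + video, weights)

def late_fusion(audio, video, w_audio, w_video):
    # Run one model per modality, then combine the decisions (here: average).
    return 0.5 * (linear_score(audio, w_audio) + linear_score(video, w_video))

def joint_fusion(audio, video, weights):
    # Encode each modality separately, concatenate the learned features,
    # then apply one shared prediction head.
    fused = encode(audio, 1.0) + encode(video, 0.5)
    return linear_score(fused, weights)
```

In a trained system the encoders and the prediction head of `joint_fusion` would be learned together, which is why joint fusion depends on multimedia learning rather than on independently trained models.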

11 of 32

Multimedia Fusion (4)

  • An example of joint fusion

12 of 32

Multimedia Fusion (5)

  • An example of joint fusion: stock price prediction

13 of 32

Multimedia Learning: Motivation

  • Why do we need to jointly learn multimedia data?

  1. Early or joint fusion requires multimedia learning.
  2. Many applications rely on multimedia learning.

14 of 32

Multimedia Learning: Image Captioning

15 of 32

Multimedia Learning: Image Captioning

16 of 32

Multimedia Learning: Image Captioning

17 of 32

Multimedia Learning: Visual Question Answering

18 of 32

Multimedia Learning: Visual Question Answering

19 of 32

Multimedia Learning: Visual Question Answering

20 of 32

Multimedia Learning: Attention

21 of 32

Multimedia Learning: Attention

22 of 32

Multimedia Learning: Attention

23 of 32

Multimedia Learning: Attention Visualization

24 of 32

Multimedia Learning: Attention Score

25 of 32

Multimedia Learning: Attention Score
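As an illustration of how an attention score can be computed, the sketch below assumes the scaled dot-product form, which may differ from the scoring function used in the lecture. The function names `softmax` and `attention` and the list-of-lists data layout are choices made for this example only.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query:  list of d floats (e.g. a word or question embedding)
    keys:   list of n vectors of length d (e.g. image region features)
    values: list of n vectors (what gets averaged using the weights)
    """
    d = len(query)
    # Attention score of each key: similarity to the query, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Context vector: weighted average of the values.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context
```

The returned `weights` are what an attention visualization would typically display: one weight per key, with larger values marking the regions or tokens the query attends to most.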

26 of 32

Attention - Speech + Text

27 of 32

Multimedia System Design

28 of 32

Multimedia System Design

  • Example: how to design a system to detect suspicious events in an airport (you can use camera video)?

29 of 32

Multimedia System Design

  • Example: how to design a travel planning chatbot?

30 of 32

Multimedia System Design

  • Design an AI system for a specific task

– Workflow (input/output, main steps, etc.)

– Data resources (which types of media data you will use, and how many)

– Modeling method (deep learning or non-deep learning, what kind of models, etc.)

– Pseudocode

  • Example: how would you design a system to detect driver fatigue (you can use cameras, sensors, etc.)?

31 of 32

Multimedia System Design

  • Design an AI system for a specific task

– Workflow (input/output, main steps, etc.): input: image/sensor data; output: detection warning; main steps: data processing → modeling → late fusion → detection results

– Data resources (which types of media data you will use, and how many): image/video data, sensor data (distance, position)

32 of 32

Multimedia System Design

  • Design an AI system for a specific task

– Modeling method: CNN for images, CNN + RNN for video, RNN for sensor data

– Pseudocode
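The workflow for the driver-fatigue example (data processing, one model per medium, late fusion, detection result) can be sketched as runnable Python. This is only an illustration under stated assumptions: the three model functions are hypothetical placeholders for the CNN, CNN + RNN, and RNN, each frame is assumed to be already reduced to an eye-closure ratio in [0, 1], the sensor readings are assumed to be position offsets, and the fusion weights and threshold are arbitrary.

```python
from statistics import mean

def image_model(frame):
    """Placeholder for the image CNN (hypothetical): fatigue score for one frame.
    Assumes the frame is already summarized as an eye-closure ratio."""
    return min(max(frame, 0.0), 1.0)

def video_model(frames):
    """Placeholder for the CNN + RNN (hypothetical): per-frame scores averaged over time."""
    return mean(image_model(f) for f in frames)

def sensor_model(readings, max_offset=1.0):
    """Placeholder for the sensor RNN (hypothetical): mean absolute position
    offset, scaled to [0, 1]."""
    return min(mean(abs(r) for r in readings) / max_offset, 1.0)

def detect_fatigue(frames, readings, threshold=0.5, w_video=0.6, w_sensor=0.4):
    """Late fusion: each medium is scored separately, then the scores are combined."""
    p_video = video_model(frames)
    p_sensor = sensor_model(readings)
    fused = w_video * p_video + w_sensor * p_sensor  # weighted average of decisions
    return fused, fused >= threshold                 # (score, warning flag)

# Example call with made-up inputs: drowsy-looking frames and large offsets.
score, warning = detect_fatigue([0.9, 0.8, 1.0], [0.7, 0.9, 0.8])
```

Because the fusion happens on the per-medium scores, either model can be retrained or replaced without touching the other, which is the main practical appeal of late fusion in this design.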