1 of 4

SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text

Alexander Mathews *†, Lexing Xie *†, Xuming He ‡

Australian National University *, Data to Decision CRC †, ShanghaiTech University ‡

Goals:

  • Describe images with an identifiable style
  • Learn style from a separate text corpus
  • Generate different styles from the same model

Style: story-like
"I stopped short when I saw the train sitting at the station."

Style: descriptive / MSCOCO
"A train that stopped at a train station."

Poster: [D3]

2 of 4


Images with descriptive captions:

  • a white sheep and birds in a field.
  • a clock is mounted on the corner of a building.

Text corpus in a distinct style:

  • She wore a gown the colors of an autumn sunset.
  • She sat down and picked up her fork.
  • He cleared his throat and when that got no response, he banged his fist down on the table.
  • She smiled sheepishly.
  • ……

3 of 4

Evaluated automatically and manually:

  • Comparable relevance to descriptive baselines.
  • More ‘story-like’ than descriptive baselines.


(a) [Descriptive] A woman walking with an umbrella in the rain.
    [Story-like] The woman stepped underneath her umbrella and walked in the rain.

(b) [Descriptive] A juicer is poured into a glass of juice.
    [Story-like] I'll be in the juicer with a glass of orange juice.

(c) [Descriptive] A forest that has a large tree in it.
    [Story-like] Forest, tall, and thick trees.

Success cases: (a), (b). Failure case: (c).

4 of 4

See Paper #4057 or Poster [D3] for details of:

  • How to construct the term space?
  • How to train both parts of the model?
  • How to generate multiple styles?
  • How to evaluate style automatically?

Code, models and more results:
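As a rough illustration of the two model parts mentioned above, a minimal Python sketch is given below: a term generator maps the image to style-neutral semantic terms, and a language generator realises those terms in a requested style. The stub functions and template strings here are hypothetical stand-ins for the learned networks, not the actual SemStyle implementation:

```python
# Toy sketch of a two-stage stylised captioning pipeline.
# All function bodies are illustrative stubs; in the real model both
# stages are learned networks (term generator and language generator).

def term_generator(image_features):
    """Stage 1: map image features to style-neutral semantic terms.
    A fixed lookup stands in for the learned image-to-terms model."""
    return ["woman", "walk", "umbrella", "rain"]

def language_generator(terms, style):
    """Stage 2: realise the semantic terms as a caption in the requested
    style. Templates stand in for a language model trained on the
    separate styled text corpus."""
    if style == "descriptive":
        return "A {} {}ing with an {} in the {}.".format(*terms)
    if style == "story":
        return "The {} stepped underneath her {} and {}ed in the {}.".format(
            terms[0], terms[2], terms[1], terms[3])
    raise ValueError("unknown style: " + style)

def caption(image_features, style):
    """Full pipeline: one term sequence, multiple output styles."""
    return language_generator(term_generator(image_features), style)
```

Because the styled corpus only trains the second stage, the same term sequence can be decoded into different styles, which is how a single model produces both descriptive and story-like captions.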


Example outputs:

  • I had to take a little umbrella to the beach.
  • Tennis player got balls.