Captioning Images to Assist People Who Are Blind
Wuao Liu, Sihang Wei, Hao-Hsiang Hsu
EECS 442 Final Project
Motivation
Current Methods of Image Captioning
Baseline Method (CNN + RNN/LSTM)
Attention Methodology
[Figure labels: Cell t, Cell t+1, Feature Map, Attention Mechanism]
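The soft-attention step named on this slide can be sketched as follows: at each decoding step, every location of the feature map is scored against the previous recurrent state, the scores are normalized with a softmax, and the weighted sum of locations becomes the context passed to the next cell. This is only a minimal illustration, not the project's code. The names `soft_attention`, `features`, and `hidden` are hypothetical, and the dot-product scoring is an assumption made for brevity (a learned scoring network is the more common choice).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(features, hidden):
    """Return (weights, context) for one decoding step.

    features: list of location vectors taken from the CNN feature map.
    hidden:   previous recurrent state, same length as each location vector.
    Dot-product scoring is an assumption used here for illustration only.
    """
    # One relevance score per feature-map location.
    scores = [sum(f * h for f, h in zip(loc, hidden)) for loc in features]
    # Attention weights are non-negative and sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted average of the locations.
    dim = len(features[0])
    context = [sum(w * loc[d] for w, loc in zip(weights, features))
               for d in range(dim)]
    return weights, context

# Toy example: three locations with two channels each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = soft_attention(features, hidden)
```

In this toy input the first and third locations align with the hidden state, so they receive equal weights that are larger than the weight of the second location.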
Recurrent Networks
…
[Slide derived from Chris Olah: https://colah.github.io/posts/2015-08-Understanding-LSTMs/]
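For readers without the diagram, one LSTM step in the standard formulation used by the cited post can be sketched in plain Python. This is a hedged illustration rather than the project's implementation: the function names, the `params` layout, and the all-zero toy weights are assumptions chosen so the arithmetic is easy to follow.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step.

    params maps a gate name ('f', 'i', 'o', 'g') to a (W, b) pair, where
    W has one row per hidden unit and len(h_prev) + len(x) columns.
    This layout is hypothetical and chosen for readability.
    """
    v = h_prev + x  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(*params["f"], v)]    # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], v)]    # input gate
    o = [sigmoid(a) for a in affine(*params["o"], v)]    # output gate
    g = [math.tanh(a) for a in affine(*params["g"], v)]  # candidate values
    # New cell state: keep part of the old state, add part of the candidate.
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed cell state.
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

# Toy setup: one hidden unit, one input, all weights and biases zero.
zero_gate = ([[0.0, 0.0]], [0.0])
params = {name: zero_gate for name in ("f", "i", "o", "g")}
h, c = lstm_step([1.0], [0.0], [2.0], params)
```

With all-zero weights every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved from 2.0 to 1.0 and the hidden state is 0.5 · tanh(1.0).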
Results
We reproduce the algorithm described in the paper and show that the attention mechanism can improve model performance.
| | Dataset | Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 |
| Paper result | COCO | Soft-Att | 70.7 | 49.2 | 34.4 | 24.3 |
| Ours | COCO | Soft-Att | 60.4 | 19.6 | 9.6 | 7.0 |
| | VizWiz | CNN + RNN | 57.9 | 17.8 | 3.1 | 1.4 |
| | VizWiz | Soft-Att | 59.2 | 20.3 | 6.5 | 5.1 |
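As a reminder of what the BLEU-n columns measure, the sketch below computes a sentence-level BLEU-n against a single reference: the geometric mean of clipped n-gram precisions up to order n, multiplied by a brevity penalty. It is an assumption-laden illustration only. The slides do not say which BLEU implementation or how many references were used, scores like those in the table are conventionally this value multiplied by 100, and the candidate/reference pair below is made up for the example.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total

def bleu(candidate, reference, max_n):
    """Sentence-level BLEU-max_n with uniform weights and brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Candidates shorter than the reference are penalized.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)

# Illustrative pair only; not taken from the evaluation data.
reference = "a person is cutting a cake with a knife".split()
candidate = "a person is cutting a cake".split()
bleu1 = bleu(candidate, reference, 1)
bleu4 = bleu(candidate, reference, 4)
```

Here every candidate n-gram appears in the reference, so all precisions are 1 and the score is determined by the brevity penalty alone. Higher-order scores fall quickly once longer n-grams stop matching, which is why BLEU-4 is typically much lower than BLEU-1.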
Contributions
Green text: open-source code available online
Black text: our implementation
Future Work
attention model
BLEU score with fewer training epochs
a hand holding a bottle of something that looks like a blurry white surface.
A person is cutting a cake with a knife.
More Demos
a hand holding a cd card with the words “ <unk> ” near it .
the left corner of a keyboard showing the letters `` <unk> '' , `` keys '' , the numbers `` , '' `` <unk> '' above the letters ``
quality issues are too severe to recognize visual content , it 's too blurry to read , what looks like a white and black object that has a metal trim
someone is holding a white plastic container with a blue lid , and the lid is sitting on a tan surface.
More Demos
A group of people standing around a truck.
A man sitting at a table with a laptop.
A couple of people walking along a beach next to the ocean.
Thanks for listening!