1 of 46

Visual Search as a Cloud Service by Large-Scale Commodity GPU Adoption

Ashwin Nanjappa

ViSenze

2 of 46

Outline

  • Who we are
  • What we do
  • How we do it (using GPUs)

4 of 46

Who we are

  • Mission: simplify the visual web
  • 3 solutions
    • Visual search
    • Visual recognition
    • Search and recognition for videos

5 of 46

Our background

  • Spin-off from NExT Research Centre, National University of Singapore, Singapore
  • Raised $14M in funding so far
  • Staff: 45 (10 PhDs)
  • Customers: Major ecommerce sites in SE Asia, Japan, India, UK, USA

6 of 46

  • Who we are
  • What we do
  • How we do it (using GPUs)

7 of 46

What we do

  • Visual Search as a Service and applications
  • Image/video recognition as a Service and applications

9 of 46

One-click visual search experience

object detection + visual search

10 of 46

Over 60% faster than text search

ViSenze visual search API helps people find visual information more easily

Lazada

Goodrich

Patsnap

11 of 46

ViDiscovery/ViSearch App

  • Visual discovery of everything
  • Only app that does both product search and image recognition
  • Supported by huge structured/unstructured database

14 of 46

What we do

  • Visual Search as a Service and applications
  • Image/video recognition as a Service and applications

15 of 46

75% less manual effort

ViSenze’s auto tagging helps sites tag their products more efficiently

17 of 46

Deep structured visual taxonomy for specific verticals

19 of 46

Video Recognition API service

20 of 46

Application: Video + shopping user experience

21 of 46

  • Who we are
  • What we do
  • How we do it (using GPUs)

22 of 46

Technologies

Computer vision and Deep learning

  • CNN
    • Classification
    • Detection
    • Search
  • Large-scale data crawling

Distributed web service development

  • Java, Golang, Scala, C++
  • Docker
  • Vagrant
  • Zookeeper
  • Apache Thrift

23 of 46

Visual search infrastructure

  • Training pipeline: train and update models for the online systems
  • Indexing pipeline: accept the image feed, extract visual features, and build the index
  • Search pipeline: search for the query image within the index in real time

27 of 46

Offline training

  • Fine tuning CNN
  • Train model from scratch

28 of 46

Offline training: Siamese network

A loss function more suitable for visual search
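
The slide does not name the loss, so the following is only a minimal sketch of one common choice for Siamese networks, the contrastive loss. The function names, the `margin` default, and the plain-list feature vectors are assumptions made for illustration, not the customized Caffe implementation mentioned elsewhere in the deck.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(feat_a, feat_b, same_item, margin=1.0):
    """Contrastive loss for one pair of embeddings (illustrative only).

    same_item=True  -> loss grows with distance, pulling the pair together.
    same_item=False -> loss is non-zero only while the pair is closer
                       than `margin`, pushing the pair apart.
    """
    d = euclidean(feat_a, feat_b)
    if same_item:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Training on pairs this way shapes the embedding space so that distance reflects visual similarity, which is the property a nearest-neighbour search relies on; a plain classification loss does not optimize for it directly.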

29 of 46

Offline training infrastructure

  • Servers with Pascal GPUs (TitanX)
    • Past: 980Ti, Titan
  • Xeon CPUs
  • Customized Caffe
  • Training pipeline software (Python)
  • Inference on CPU and GPU
  • Visualization and debugging
  • Evaluation and experiment system (online)

30 of 46

Experiment as a service

31 of 46

Experiment infrastructure

  • Pascal GPU servers
  • Custom Caffe
  • Jenkins for job scheduling
  • Minio for data store
  • Custom web service for visualization

32 of 46

Online search with feature and recognition

  • Visual recognition: trained CNN model
  • Visual feature: intermediate-level activation (e.g. fc6 from AlexNet)
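
To illustrate how a single forward pass can serve both purposes, here is a toy sketch with a two-layer fully connected network standing in for a CNN. The layer sizes, weights, and function names are hypothetical; a real system would read the activation of a layer such as fc6 from a trained model instead.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]

def dense(weights, bias, v):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def forward(image_vec, hidden_layer, output_layer):
    """Return (feature, scores) from a toy two-layer network.

    feature: hidden-layer activation, reused as the search descriptor.
    scores:  final-layer output, used for recognition.
    """
    w1, b1 = hidden_layer
    w2, b2 = output_layer
    feature = relu(dense(w1, b1, image_vec))
    scores = dense(w2, b2, feature)
    return feature, scores

# Hypothetical toy weights, chosen only so the example is easy to follow.
hidden = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
output = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

feature, scores = forward([2.0, 1.0], hidden, output)
label = max(range(len(scores)), key=scores.__getitem__)  # recognition result
```

The same pass yields `label` for recognition and `feature` for search, so one model evaluation per image can feed both services.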

33 of 46

Online infrastructure

34 of 46

Online infrastructure

Search

Index

35 of 46

Online infrastructure: indexing

API load balancers, index queues, customer queues

36 of 46

Online infrastructure: indexing

AWS Cx instances, 200 max, throughput: 1M images/hour
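
As a rough back-of-the-envelope check, the figures on this slide can be turned into a per-instance rate. This assumes the 1M images/hour throughput is reached with all 200 instances running, which the slide does not state explicitly.

```python
MAX_INSTANCES = 200          # from the slide
IMAGES_PER_HOUR = 1_000_000  # from the slide

# Assumption: peak throughput is achieved with every instance active.
per_instance_per_hour = IMAGES_PER_HOUR / MAX_INSTANCES

# Average wall-clock time one instance spends per image.
seconds_per_image = 3600 / per_instance_per_hour

print(f"{per_instance_per_hour:.0f} images/hour per instance")
print(f"{seconds_per_image:.2f} s per image per instance")
```

Under that assumption each instance handles about 5,000 images per hour, or roughly 0.72 seconds per image including download, feature extraction, and hashing.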

37 of 46

Online infrastructure: indexing

Features on S3, hashes loaded into memory on region servers
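
The deck does not say which hashing scheme is used. As one possible illustration, the sketch below uses random-hyperplane hashing to turn a feature vector into a compact binary code that fits in memory, while the full features would stay in external storage. The function names, bit width, and image ids are all hypothetical.

```python
import random

def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random hyperplanes in `dim` dimensions (fixed seed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def hash_feature(feature, hyperplanes):
    """Pack the sign of each projection into one integer code."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, feature))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

planes = make_hyperplanes(dim=4, bits=16)

# In-memory table: image id -> 16-bit hash code.
hash_table = {
    "img-1": hash_feature([0.9, 0.1, 0.0, 0.3], planes),
    "img-2": hash_feature([-0.9, -0.1, 0.0, -0.3], planes),
}
```

Similar features tend to share most bits, while the two opposite vectors above differ in every bit, so comparing codes gives a cheap first estimate of feature distance.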

38 of 46

Online infrastructure: search

Load balancers and API servers for low latency

39 of 46

Online infrastructure: search

AWS G2 instances to detect and extract features

40 of 46

Online infrastructure: search

Distance search using hash and features
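
One plausible reading of "hash and features" is a two-stage search: shortlist candidates by Hamming distance on the hashes, then re-rank the shortlist by distance between full feature vectors. The sketch below shows that idea under this assumption; the index layout, ids, and parameter names are hypothetical.

```python
import math

def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")

def search(query_hash, query_feat, index, shortlist_size=2, top_k=1):
    """Two-stage search over index: id -> (hash_code, feature_vector)."""
    # Stage 1: cheap shortlist using the in-memory hashes.
    shortlist = sorted(
        index, key=lambda i: hamming(query_hash, index[i][0])
    )[:shortlist_size]
    # Stage 2: exact re-ranking using the full feature vectors.
    reranked = sorted(
        shortlist, key=lambda i: math.dist(query_feat, index[i][1])
    )
    return reranked[:top_k]

# Toy index with 4-bit hashes and 2-D features.
index = {
    "shoe-a": (0b1010, [1.0, 0.0]),
    "shoe-b": (0b1011, [0.9, 0.1]),
    "bag-c": (0b0101, [0.0, 1.0]),
}
result = search(0b1011, [1.0, 0.05], index)
```

The hash stage drops `bag-c` without touching its features, and the feature stage then decides between the two remaining candidates, which keeps the expensive comparison limited to a small shortlist.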

41 of 46

Online infrastructure: search

Latency: <400ms, Bottleneck: Getting the image!

42 of 46

Amazon GPU instances

43 of 46

Amazon GPU instances

Not available in all regions (US only)

44 of 46

Hybrid infrastructure

Everything on cloud GPU: not there yet

45 of 46

GPUs @ ViSenze

  • Economical for training and evaluation
    • Can be easily deployed to Amazon GPU instances later
  • Easy to program
    • Caffe, TensorFlow
  • Scalable across GPU generations
  • Fast and powerful
    • 5-10x faster compared to CPU
    • Hundreds of training runs and experiments every day/week!
  • AWS GPU perf per $ still a bit higher than CPU instances
    • Indexing: CPU
    • Search: GPU

46 of 46

Thank you!