1 of 83

CrowdSearcher

Framework

Alessandro Bozzon, Andrea Mauri, Chiara Pasini,

Luca Tettamanti, Riccardo Volonterio

http://crowdsearcher.search-computing.org

Milan, October 22nd 2012

2 of 83

Introduction

3 of 83

CrowdSearcher

A configurable, programmable crowd-management system

Crowds and social network communities as first-class sources

Abstractions for

Task
Evaluation items
Crowd-selection strategies
Evaluation aggregations
....

4 of 83

Task

A crowdsourced data manipulation/analysis activity, typically focused on a single action (although several concurrent actions are allowed) performed on a coherent set of Objects

5 of 83

Task: Examples

Recognize and identify the people contained in a set of images

Input Objects: images
Output Objects: images + bounding boxes + names

Annotate the named entities contained in a book

Input Objects: text organized in pages
Output Objects: set of named entities

Crop the silhouette of the models in a set of images

Input Objects: images
Output Objects: images + polylines

Create a complete list of the restaurants near Politecnico

Input Objects: none
Output Objects: set of restaurant names

Evaluate the courses offered in Politecnico

Input Objects: set of course names
Output Objects: course names + vote

6 of 83

Performer

A human being involved in the execution of a Task

Example

Students of the Search Computing Course
My Facebook friends
Expo 2015 Attendees
Javascript experts on StackOverflow

7 of 83

MicroTask

An instance of a Task, operating on a subset of its input objects, and assigned to one or more performers for execution.

Example

Locate and identify the faces of the people appearing in the following 5 images
Order the following courses according to your preferences

8 of 83

Roles

Crowd Developer: The developer of a crowd application with CrowdSearcher

Task Creator: The creator of a Task

Might be human (e.g. the Crowd Developer) or an External Application

Performer: The executor of a MicroTask

9 of 83

Use Case: Performer

10 of 83

Use Case: Task Creator

11 of 83

Use Case: Crowd Developer

12 of 83

Architecture

13 of 83

CrowdSearcher Architecture

14 of 83

CrowdSearcher Configurator

A user interface and a set of Web APIs for
Application Creation and Configuration
Task Configuration
Object provision
Task Execution
Task Monitoring
Performer Subscription

Interacts with Social Networks (e.g. Facebook, Twitter) to find and engage performers

Interacts with CrowdSourcing platforms to exploit their micro-task execution and marketplace infrastructures

Stores information about Tasks, MicroTasks, and their execution

15 of 83

Task Execution Framework

Default User Interfaces for MicroTask execution

Javascript APIs for custom MicroTask execution interfaces

16 of 83

External Application

Custom CrowdSourcing application

E.g. Game With a Purpose

Uses a set of Java APIs that simplify the interaction with the Web APIs of CrowdSearcher

17 of 83

Task Configuration

18 of 83

Computation: the process of mapping an input to an output

CrowdSearcher

Input Objects

Output Objects

Task Configuration

19 of 83

Task Design (1)

What are the input objects of the crowd interaction?

Unstructured text, structured data, images

Which operations should the crowd perform?

Add new instances, verify/modify data, order, etc.

How should the results of the crowd operations be aggregated?

Sum, Average, Majority voting, etc.

Which execution interface should the crowd use for task execution?

20 of 83

Task Design (2)

Which splitting strategy do you want for your objects?

Number of MicroTasks, Number of objects in each MicroTask, etc.

Which performer assignment strategy do you want to apply?

Pre-defined, unknown, in a social network, etc.

Which external applications should be notified about the task advancements? (REACTIVITY)

Are the Task objects defined at Task configuration time, or are they produced in a streamed fashion? (REACTIVITY)

21 of 83

Task Object Design

The input Objects are described by a schema

Set of fields, each defined by

A name

Must be a valid SQL identifier!

A Type:

String
Date
URL
Integer
Boolean
Image (Url)
Video (Url)
BLOB

ID	Image URL	Label
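
The schema rules above can be illustrated with a small validation sketch. It is not part of CrowdSearcher: the function name, the regular expression used for "valid SQL identifier", and the spelling of the type names are assumptions made for this example.

```javascript
// Hypothetical schema check; type names mirror the list on this slide.
const FIELD_TYPES = new Set([
  "String", "Date", "URL", "Integer", "Boolean", "Image", "Video", "BLOB",
]);

// Conservative reading of "valid SQL identifier" (an assumption):
// a letter or underscore followed by letters, digits, or underscores.
const SQL_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Returns a list of error messages; an empty list means the schema is accepted.
function validateSchema(fields) {
  const errors = [];
  const seen = new Set();
  for (const { name, type } of fields) {
    if (!SQL_IDENTIFIER.test(name)) errors.push(`invalid field name: ${name}`);
    if (!FIELD_TYPES.has(type)) errors.push(`unknown type for ${name}: ${type}`);
    if (seen.has(name)) errors.push(`duplicate field name: ${name}`);
    seen.add(name);
  }
  return errors;
}

// Example schema matching the ID / Image URL / Label table.
const schema = [
  { name: "id", type: "Integer" },
  { name: "image_url", type: "Image" },
  { name: "label", type: "String" },
];
```

A field called `image url` would be rejected by this sketch because of the space in its name.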

22 of 83

Operations

In a Task, performers are required to execute one logical operation on the input objects

e.g. Locate the faces of the people appearing in the following 5 images

Operations might require operation-dependent configurations

CrowdSearcher offers 7 types of pre-defined operations:

Like
Comment
Tag
Classify
Add
Modify
Order

23 of 83

Aggregation and Output Objects

According to the specified operations, the Task produces output objects having

additional fields
new objects
modified values

Values in additional fields and/or modified values might be calculated according to an operation-specific aggregation function

The answers provided by each performer are stored in the Task Execution Repository and are accessible via dedicated Web and Java APIs

[Figure: input objects (ID, URL) pass through Operation + Aggregation, producing output objects (ID, URL, # Likes)]

24 of 83

Operations: Like

Ask a performer to express a preference (true/false) for one or more objects

e.g. Which pictures/restaurants/football teams do you like the most?

Example aggregation functions:

Number of likes received by each object
Majority like: a float value indicating the % of performers that liked the object
Custom: a custom value containing the result of an application-specific aggregation
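
As a rough illustration of the first two aggregation functions, the sketch below counts likes per object and computes the fraction of performers who liked it. The answer record shape (`objectId`, `liked`) and the function name are assumptions; the actual CrowdSearcher answer format is not shown in these slides.

```javascript
// answers: one record per (performer, object) pair; `liked` is the true/false preference.
function aggregateLikes(answers) {
  const byObject = new Map();
  for (const { objectId, liked } of answers) {
    const entry = byObject.get(objectId) || { likes: 0, total: 0 };
    entry.total += 1;
    if (liked) entry.likes += 1;
    byObject.set(objectId, entry);
  }
  const result = {};
  for (const [objectId, { likes, total }] of byObject) {
    // likeRatio is the share of evaluating performers that liked the object (0..1).
    result[objectId] = { likes, likeRatio: likes / total };
  }
  return result;
}

const likeSummary = aggregateLikes([
  { objectId: "img1", liked: true },
  { objectId: "img1", liked: false },
  { objectId: "img1", liked: true },
  { objectId: "img2", liked: false },
]);
```

Here the ratio is computed over the performers who actually evaluated the object, which is one possible reading of "majority like".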

25 of 83

Operations: Comment

Ask a performer to write text which describes/summarizes/evaluates an object

e.g. Can you summarize the following text? Using your own words, can you describe the following picture?

The maximum length of the text can be configured

A comment can be used by a client application to contain an application-specific data structure (e.g. a semi-structured document)

Example aggregation functions:

Concatenation: an array of strings where each element in the array contains one comment
Custom: a custom text containing the result of an application-specific aggregation (group of comments by similarity)

26 of 83

Operations: Tag

Ask a performer to annotate an object with a set of tags

e.g. How would you label the following image?

The maximum number of tags can be predefined

Tags can have application-defined types

Integers to express a "vote" to a given object
Structured, low-level features (e.g. coordinates, feature vectors, etc.)

Example aggregation functions:

The frequency of appearance of each tag for the objects
The average vote of the object
The approximated bounding box of a face

27 of 83

Operations: Classify

Ask a performer to classify an object within a closed set of alternatives

e.g. Which is the category of this restaurant? Would you classify this tweet as politically skewed or not?

The classification categories and the maximum number of selectable categories for an object must be predefined

Example of aggregation functions:

Number of classifications for each configured category
Average number of categories

28 of 83

Operations: Add

Ask a performer to add a new object conforming to the specified schema

e.g. Can you list the name and address of good restaurants near Politecnico di Milano?

At the end of the Task, new objects will be provided as output, possibly in addition to the ones given as input

Example of aggregation functions for new objects:

Distinct values and Frequency

29 of 83

Operations: Modify

Ask a performer to verify the content of one or more input objects, possibly modifying/completing wrong or missing information

e.g. Which pictures/restaurants/football teams do you like the most?

Example of aggregation functions for each object:

List of suggested updates
Majority on applied modifications

30 of 83

Operations: Order

Ask a performer to order the provided objects according to some ordering criteria

e.g. Order the following football clubs according to their popularity; order the following books according to your taste

Performers can express orders in several ways:

Pair-wise orders
Partial orders
Complete orders

Example of aggregation functions:

Position of the object in the final ranking
Number of accordant orders
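
One simple way to turn pair-wise orders into a final ranking is to count how many times each object was preferred. This is a swapped-in illustrative technique, not necessarily the aggregation CrowdSearcher applies; the pair format and function name are assumptions.

```javascript
// pairs: [preferred, other] means a performer ranked `preferred` before `other`.
function rankFromPairs(pairs) {
  const wins = new Map();
  for (const [preferred, other] of pairs) {
    wins.set(preferred, (wins.get(preferred) || 0) + 1);
    if (!wins.has(other)) wins.set(other, 0);
  }
  // Sort by number of wins (descending); ties are broken alphabetically.
  return [...wins.entries()]
    .sort((a, b) => b[1] - a[1] || String(a[0]).localeCompare(String(b[0])))
    .map(([object]) => object);
}

const ranking = rankFromPairs([
  ["A", "B"],
  ["A", "C"],
  ["B", "C"],
  ["A", "B"],
]);
```

The position of an object in `ranking` corresponds to the "position of the object in the final ranking" aggregation.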

31 of 83

Splitting and Assignment Strategies

Crowds can provide noisy, biased, and possibly spam results

Task Planning

How to retrieve reliable and correct results?

Evaluation redundancy → Splitting

The “truth” (objective or cultural) exists, and through redundancy we can find it :-)

Choose the right crowd! → Invitation and Assignment

WHTBT: Whoever Happens To Be There
Push Vs. Pull

32 of 83

Task Planning in CrowdSearcher

Task Planning

Performer Invitation Strategy

Performer Assignment Strategy

Object Splitting Strategy

[Objects]

[{[Objects],Performers}]

MicroTask Assignment

[{[Objects]}]

MicroTask

NOTE: in the future, custom Task Planning flows will be specifiable

33 of 83

Splitting Strategy

Given as input N objects

How many MicroTasks?
Which objects should appear in MicroTasks?
How many objects in each MicroTask?
How often should an object appear in MicroTasks?
Which objects cannot appear together?
Should objects be presented always in the same order?
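
Two of the questions above (objects per MicroTask and how often each object appears) can be sketched as a small splitting function. The function name and parameters are hypothetical; real strategies may also shuffle objects or enforce "cannot appear together" constraints, which this sketch ignores.

```javascript
// Splits objects into MicroTasks of at most `size` objects.
// Every object appears in `redundancy` MicroTasks overall.
function splitObjects(objects, size, redundancy = 1) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const microTasks = [];
  for (let round = 0; round < redundancy; round++) {
    for (let i = 0; i < objects.length; i += size) {
      microTasks.push(objects.slice(i, i + size));
    }
  }
  return microTasks;
}

const singlePass = splitObjects([1, 2, 3, 4, 5], 2);    // 3 MicroTasks
const redundant = splitObjects([1, 2, 3, 4, 5], 2, 2);  // 6 MicroTasks
```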

34 of 83

Invitation Strategy

Public or private Task?

Public Task: everyone can perform
Private Task: only registered performers

How do you engage people in becoming performers?

Publish a post/tweet on your social network profile
Publish a post/tweet on your friends' profile
Send an email to a mailing list
Publish a HIT on Mechanical Turk
Create a new challenge in your game
...

35 of 83

Assignment Strategy

Given a set of M MicroTasks, which performers are assigned to them?

Online assignment

Potentially, anyone can execute a MicroTask

First come / First Served
MicroTask priority
MicroTask to Performer matching

Offline assignment

MicroTasks are uniquely assigned to Performers

Performer priority
MicroTask to Performer matching
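
A minimal sketch of an offline assignment, assuming each MicroTask goes to exactly one performer and performers are taken in turn. Identifiers and names are made up for the example; priority-based or matching-based strategies would replace the modulo step.

```javascript
// Assigns each MicroTask id to one performer, cycling through the performer list.
function assignRoundRobin(microTaskIds, performers) {
  if (performers.length === 0) throw new Error("no performers available");
  const assignment = {};
  microTaskIds.forEach((id, index) => {
    assignment[id] = performers[index % performers.length];
  });
  return assignment;
}

const assignment = assignRoundRobin(["m1", "m2", "m3"], ["alice", "bob"]);
```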

36 of 83

Execution Interface

Task Execution Framework

Default

One default execution interface for each operation type

Custom

Application-specific execution UI

External Application

37 of 83

Constraints and Reactive Behaviour

38 of 83

Evolution of a Crowd Task

A Task evolves in time according to the answers provided by the performers

A Task Designer might want the system to:

Notify a third party application about the updates in the Task Status
Adapt the splitting and assignment strategies by re-planning objects or re-assigning performers
Define constraints on

The task evolution (e.g. maximum time)
The performers (e.g. minimum performer quality)
The evaluated objects (e.g. quality of the evaluation)

Accept new input Objects

39 of 83

Reactive Crowdsourcing

CrowdSearcher

Initial Input Objects

Output Objects

Task Configuration

Input Objects Stream

New Performer

Timer

New Answers

Initialization

RunTime Events

END Objects Stream

40 of 83

Properties of CrowdSearch Objects

Task

StartTime / EndTime
Status:

Initialized: Task created
Planned: MicroTask planned and assigned
Open: in execution
Closed: Task finalized with valid results

Confidence : value in [0,1] indicating the reliability of the result
#PlannedMicroTasks
#InvolvedPerformers
...

41 of 83

Properties of CrowdSearch Objects

Object

Answer

The final evaluation performed by the crowd

Status:

Unplanned: inserted but not planned yet
Open: in evaluation
Closed: evaluated
Invalid: evaluation terminated but not completed

Confidence : value in [0,1] indicating the reliability of the evaluated answer

#AssignedMicroTasks
#InvolvedPerformers
...

42 of 83

Properties of CrowdSearch Objects

Performer

Status:

Available: the performer is ready to be assigned to MicroTasks
Unavailable: the performer is not available for assignments

Confidence : value in [0,1] indicating the reliability of the performer

#AssignedMicroTasks
#EvaluatedObjects
Assignment-acceptance ratio

43 of 83

Events (1)

OnTaskCreation

The creation and configuration of a Task. Planning strategies are not yet applied; therefore, no MicroTasks have been created

OnTaskStart

The event is triggered after the planning strategies have been applied, and the MicroTasks have been created and assigned

OnNewObjects

Event triggered upon the provision of new input Objects to the Task

OnObjectsEOF

Event triggered when the EOF command is sent to CrowdSearcher (CS), signalling that no more input objects will be provided

44 of 83

Events (2)

Temporal

Event triggered according to given temporal rules (e.g. every 5 minutes, at a given time, etc.)

OnNewPerformer

The event is triggered when a new Performer registers to the platform

OnTaskRequest

This event is triggered when a Performer requests the execution of a Task

45 of 83

Events (3)

OnMicroTaskStart

The event is triggered when a Performer starts the execution of a given MicroTask.

OnMicroTaskEnd

Event triggered upon the reception of the answer to a MicroTask

OnTaskEnd

Event triggered when the status of the Task is set to Closed

46 of 83

Control Rules

Scripts associated with events

Executed upon the triggering of the associated event
An event can be associated with several scripts

Goal:

Inspection of the system status
Changes to the system status

Plan/Assign newly arrived objects
Re-plan/Re-Assign existing objects
Aggregate Answers to calculate final results
Modify Objects status
Modify Performers status

Notification to third party applications

Task Status (e.g. ended)
Object Status (e.g. evaluated)
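
The relation between events and scripts can be pictured as a small registry in which one event maps to several scripts. This is only a sketch: the class, its methods, and the shape of the state object are assumptions. The Java API slides describe the scripts as Groovy scripts, while plain functions stand in for them here.

```javascript
// Hypothetical registry: each event name maps to a list of scripts.
class ControlRules {
  constructor() {
    this.scripts = new Map();
  }

  // Associates one more script with the given event.
  on(event, script) {
    const list = this.scripts.get(event) || [];
    list.push(script);
    this.scripts.set(event, list);
  }

  // Runs, in registration order, every script associated with the event.
  trigger(event, state) {
    for (const script of this.scripts.get(event) || []) script(state);
  }
}

const rules = new ControlRules();
const state = { taskStatus: "Open", notifications: [] };
rules.on("OnTaskEnd", (s) => { s.taskStatus = "Closed"; });           // change system status
rules.on("OnTaskEnd", (s) => { s.notifications.push("task ended"); }); // notify a third party
rules.trigger("OnTaskEnd", state);
```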

47 of 83

Examples of Control Rules (1)

Maximum duration of a Task is 2 hours

Two hours after the Task starts, trigger a script that:

Sets all open objects as closed
Sets the Task as closed
Notifies the Task results

Verify that an object has at least 5 valid evaluations, with a majority vote

Every time a MicroTask ends, check all the evaluated objects. For each object:

Verify how many times it has been evaluated
If < 5 evaluations, do nothing
If 5 evaluations, count the number of coherent answers
Aggregate all the answers and set the final Object answer
Set the Object as closed
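
The second rule can be sketched as a function run for each object whenever a MicroTask ends. The object shape and the choice to leave the object open when no strict majority exists are assumptions of this example, not behaviour stated in the slides.

```javascript
const MIN_EVALUATIONS = 5;

// Returns the object unchanged, or a closed copy carrying the majority answer.
function closeIfEvaluated(object) {
  if (object.status !== "Open" || object.answers.length < MIN_EVALUATIONS) {
    return object; // fewer than 5 evaluations: do nothing
  }
  const counts = new Map();
  for (const answer of object.answers) {
    counts.set(answer, (counts.get(answer) || 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  for (const [answer, count] of counts) {
    if (count > bestCount) {
      best = answer;
      bestCount = count;
    }
  }
  // Assumption: close only when more than half of the answers agree.
  if (bestCount * 2 > object.answers.length) {
    return { ...object, answer: best, status: "Closed" };
  }
  return object;
}

const evaluated = closeIfEvaluated({
  id: "img1",
  status: "Open",
  answers: ["cat", "cat", "dog", "cat", "dog"],
});
```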

49 of 83

Introduction

The Web APIs allow clients to interact with the system using HTTP requests.

The APIs are grouped in:

Task manipulation

Creation, configuration, opening, monitoring

MicroTask manipulation
Answers management
User management
Stream management

50 of 83

Introduction(2)

Each API call requires a secretKey that identifies your application

To obtain a secret key you have to register your application in CrowdSearcher

The secretKey must be sent with every API call encoded in the URL

e.g. [..]postTask.do?secretKey="your_secret_here"
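
A small helper can make sure the secretKey is appended to every call. The helper name, the base URL, and the extra `id` query parameter are assumptions for illustration; only the `secretKey` parameter and the `.do` endpoint names come from these slides.

```javascript
// Builds the URL of an API call with the secretKey encoded in the query string.
// Uses the standard URL and URLSearchParams classes (browsers and Node.js).
function apiUrl(baseUrl, endpoint, secretKey, params = {}) {
  const url = new URL(endpoint, baseUrl);
  url.searchParams.set("secretKey", secretKey);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Hypothetical installation address; replace with your own CrowdSearcher instance.
const BASE = "http://localhost:8080/cs_prototype_v2/";
const postTaskUrl = apiUrl(BASE, "postTask.do", "your_secret_here");
const getTaskUrl = apiUrl(BASE, "getTask.do", "your_secret_here", { id: 42 });
```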

51 of 83

Task Manipulation

/postTask.do

Creates a task
Input: a JSON containing (example)

Title
Textual description
Schema
List of objects
Strategies and control rules

/getTask.do

Retrieves the description of a task (example)
Input:

The id of the task

52 of 83

Task Manipulation(2)

/getMyTasks.do

Retrieves the list of tasks belonging to your application
Input: none (remember to always include the secretKey)

/openTask.do

Starts a task by executing the planning strategy defined when creating the task
Input: a JSON containing

Id: the id of the task

/endTask.do

Terminates a task
Input: the id of the task

53 of 83

Task Manipulation(3)

/subscribeToTask.do

Subscribes to the feed of a task in order to be notified when some events occur
Input: a JSON containing

id: the id of the task
callback: where the CS will send the answers

54 of 83

MicroTask Manipulation

/postMicroTask.do

Manually creates a MicroTask
Input: a JSON containing

id of the task
list of objects

/getTaskComposition.do

Retrieves the list of MicroTasks that compose a task
Input:

id of the task

/endMicroTask.do

Terminates a micro task
Input: a JSON containing

id: the id of the micro task

55 of 83

Answers Management

/postAnswers.do

Posts the answers of a given micro task
Input:

a JSON containing (example)

id of the micro task
list of answers (it depends on the task type)

the id of the microTask execution

/getAnswers.do

Retrieves the answers of a given task
Input: the id of the task

56 of 83

User management

/getPerformers.do

Retrieves the list of possible performers (social network profiles)
Input: none (always remember the secret key)

/getUser.do

Retrieves the details of a user
Input:

token of the user

57 of 83

Stream Management

/addObjects.do

Adds one or more objects to a task
Input: a JSON containing

id of the task
list of objects

/sendEof.do

Sends the EOF signal to the task, meaning that no more objects will be sent
Input: a JSON containing

id of the task
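
The intended semantics of the two calls can be modelled in memory as follows. This is a sketch only: the class is hypothetical, and rejecting objects after EOF is an assumption about how the server behaves.

```javascript
// In-memory model of a task's input stream.
class ObjectStream {
  constructor() {
    this.objects = [];
    this.closed = false;
  }

  // Mirrors /addObjects.do: appends one or more objects to the task.
  addObjects(objects) {
    if (this.closed) throw new Error("EOF already sent: no more objects accepted");
    this.objects.push(...objects);
  }

  // Mirrors /sendEof.do: no more objects will be sent.
  sendEof() {
    this.closed = true;
  }
}

const stream = new ObjectStream();
stream.addObjects([{ id: 1 }, { id: 2 }]);
stream.sendEof();
```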

58 of 83

Task Execution Framework Javascript APIs

59 of 83

Introduction

Used to code the interface of the MicroTask execution

It allows the retrieval of (Micro)Task information

Gather Task description
Retrieve MicroTask input objects
Retrieve Task configurations

It allows the upload of the MicroTask execution results

Post uTask results

Interaction with custom Web APIs or third-party applications is allowed

60 of 83

Initialization

Include uTask.js in the HTML page

<script src="path/uTask.js"></script>

Subclass the uTask "Class"

SubClass.prototype = new uTask(); [Javascript]

Override the init method

init: invoked during initialization.

Create the custom implementation.
Enjoy!!
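
The steps above can be sketched as follows. Since uTask.js is not reproduced here, a stand-in `uTask` constructor is defined so that the example runs on its own; the stand-in, its `start` method, and the `LikeTask` name are assumptions and do not describe the real library.

```javascript
// Stand-in for the uTask "class" from uTask.js (assumption, for illustration only).
function uTask() {}
uTask.prototype.init = function () {};
// Hypothetical entry point: the real framework is assumed to invoke init()
// during its own initialization.
uTask.prototype.start = function () {
  this.init();
};

// Subclass the uTask "class" using prototype inheritance, as on the slide.
function LikeTask() {
  this.ready = false;
}
LikeTask.prototype = new uTask();
LikeTask.prototype.constructor = LikeTask;

// Override the init method with the custom implementation.
LikeTask.prototype.init = function () {
  this.ready = true; // e.g. build the custom interface here
};

const likeTask = new LikeTask();
likeTask.start();
```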

61 of 83

Task Details

getDetails( callback )

Gets all the details for the current uTask.

Wrapper for the getTask.do CrowdSearcher Web API

62 of 83

Task input objects

getData( field, callback )

Gets the input objects for the current MicroTask

field specifies which columns should be retrieved

Example

getData( function( err, data ) {
    // data will contain all the input objects
});

63 of 83

Task configuration

getConfiguration( name, callback )

Retrieves the configuration values

name is the configuration key
If name is missing, returns all the configurations

Example (classify task)

getConfiguration( 'category', function( err, data ) {
    // data contains the list of the categories
} );

64 of 83

Post MicroTask results

Execution Logic is delegated to the specific Task Type implementation.

outputData

An in-memory object containing the data to send back to CrowdSearcher

General functions for output management.

toggleData(name): toggles name in the outputData
removeData(name): removes name from outputData
storeData(name, value): stores value in outputData
postData(callback): posts the contents of outputData to the Task endpoint
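
To make the four functions concrete, here is a self-contained model of how they might behave. It is an assumption-laden sketch: the factory name is invented, `toggleData` is assumed to add the name with value `true` when absent and remove it otherwise, and `postData` hands the serialized payload to the callback instead of performing an HTTP request.

```javascript
// Hypothetical in-memory model of the output management functions.
function createOutput() {
  const outputData = {};
  const has = (name) => Object.prototype.hasOwnProperty.call(outputData, name);
  return {
    outputData,
    storeData(name, value) {
      outputData[name] = value;
    },
    removeData(name) {
      delete outputData[name];
    },
    toggleData(name) {
      if (has(name)) delete outputData[name];
      else outputData[name] = true;
    },
    // No network here: the callback receives (err, payload) directly.
    postData(callback) {
      callback(null, JSON.stringify(outputData));
    },
  };
}

const out = createOutput();
out.toggleData("img1"); // selected
out.toggleData("img2"); // selected
out.toggleData("img1"); // deselected again
out.storeData("comment", "nice");
let posted = null;
out.postData((err, payload) => { posted = payload; });
```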

66 of 83

Introduction

Java API closely matches the Web protocol

Not very OO, but the round trips required to set up a task are minimized

Standalone JAR

Also available as OSGi service

JavaDoc is available

Please do RTFM

Examples provided

67 of 83

Initialization

Use the factory to obtain a new API object

Configuration:

Protocol, host and port of the CS
Base URL of the CrowdSearcher
Application key

CrowdSearcher cs = CrowdSearcherFactory.newInstance(
    "http://localhost:8080", "/cs_prototype_v2",
    "s3cr3t");

68 of 83

Task creation

Title and question text
Public or private?
Input schema
Input objects

Optional: objects can be added after creation

Configuration data

Dependent on task type

Task task = new Task(TaskType.LIKE, "http://cs.net",
    "Title", "question text", false,
    "myschema", schema, inputObjects, null);
cs.addTask(task);

69 of 83

Task Configuration (1)

Splitting strategy: defines how many MicroTasks are created and how input objects are split among them

MANUAL, EQUI_SPLIT, REDUNDANCY, CUSTOM

// Split into buckets with at most 10 input objects
PlanningStrategy strategy =
    new EquiSplitPlanningStrategy(taskId, 10);
cs.postTaskPlanningStrategy(strategy);

70 of 83

Task Configuration (2)

Assignment strategy: defines "who" does "what"

Which MicroTask is assigned to which performer

Static (manual, custom)

Assignment is decided at task creation time

Dynamic (round robin, external, custom)

Assignment is performed at run time

PerformerAssignStrategy assignmentStrategy =
    new PerformerAssignStrategy(
        taskId, AssignmentType.DYNAMIC_ROUNDROBIN);
cs.postPerformerAssignStrategy(assignmentStrategy);

71 of 83

Task Configuration (3)

Invitation strategy: defines how to "invite" performers to execute a task

CUSTOM, MANUAL, ANNOUNCEMENT, ALL_FRIEND, RANDOM_FRIEND
Invitation platform (Facebook, Twitter, G+, etc.)
Strategy-dependent configuration

final List<SocialNetwork> sn = new ArrayList<SocialNetwork>();
sn.add(SocialNetwork.FACEBOOK);
sn.add(SocialNetwork.TWITTER);

Configuration config = new Configuration(3, null, sn);
final InvitationStrategy invitationStrategy = new InvitationStrategy(
    taskId, InvitationName.RANDOM_FRIEND, config);
cs.postInvitationStrategy(invitationStrategy);

72 of 83

Task Configuration (4)

Task Implementation: defines how a MicroTask is executed

DEFAULT: built-in interface of CS (deprecated, for testing only)
TEF: use the Task Execution Framework
CUSTOM: use an external application

TaskImplementation implementation = new TaskImplementation(
    taskId, ImplementationType.TEF,
    new TaskImplementation.Configuration(
        "http://tef.example.net", "tefConfigName"));
cs.postTaskImplementation(implementation);

73 of 83

Evolution of a Task (1)

Control rules: "what" the CS has to do "when" something happens.

"when": ON_TASK_END, ON_ADD_TASK, PERIODIC, ON_ADD_OBJ, ON_EOF, etc.
"what": name of the Groovy script to execute

ControlStrategy controlStrategy = new ControlStrategy(
    taskId, ControlEvent.PERIODIC,
    "myControlScript", "@hourly");
cs.postControlRule(controlStrategy);

74 of 83

Evolution of a Task (2)

Emission Strategy: "how" and "when" to emit results

when: ON_TASK_END, ON_ADD_TASK, PERIODIC, ON_ADD_OBJ, ON_EOF, etc.
how: DEFAULT or CUSTOM emission format

final EmissionStrategy emissionStrategy = new EmissionStrategy(
    taskId, EmissionStrategyType.PERIODIC, "@hourly",
    EmissionFormat.CUSTOM, "myFormatterScript");
cs.postTaskEmissionStrategy(emissionStrategy);

75 of 83

Evolution of a Task (3)

Add a subscription: to receive all the task emissions

String url = "http://localhost:8085/testerCS/CallbackTest";
Subscriber subscriber = new Subscriber(taskId, url);
cs.subscribeToTask(subscriber);

76 of 83

Evolution of a Task (4)

Start a task

Starts the execution of the planning strategy to split, assign, and send invitations.

TaskManupulation tm = new TaskManupulation(taskId);
cs.publishTask(tm);

77 of 83

Evolution of a Task (5)

Stream management:

add a new set of objects to the task
notify an EOF for the input stream

List<CrowdObject> objects = new ArrayList<CrowdObject>();
// ... populate objects ...
cs.addObjectsToTask(task.getId(), objects);

final TaskManupulation taskManipulation = new TaskManupulation(taskId);
cs.sendEof(taskManipulation);

78 of 83

Answer

WIP!!!!

Add an answer to a microTask:

cs.postAnswer(a)

Read an answer for a Task:

cs.getAnswer(taskId)
