1 of 83

CrowdSearcher

Framework

Alessandro Bozzon, Andrea Mauri, Chiara Pasini,

Luca Tettamanti, Riccardo Volonterio

http://crowdsearcher.search-computing.org

Milan, October 22nd 2012

Copyright: Politecnico di Milano 2012, All Rights Reserved

2 of 83

Introduction

3 of 83

CrowdSearcher

  • A configurable, programmable crowd-management system
    • Crowds and social network communities as first-class sources

  • Abstractions for
    • Task
    • Evaluation items
    • Crowd-selection strategies
    • Evaluation aggregations
    • ...

4 of 83

Task

A crowdsourced data manipulation/analysis activity, typically focused on a single action (although several concurrent actions are allowed) performed on a coherent set of Objects

5 of 83

Task: Examples

  • Recognize and identify the people contained in a set of images
    • Input Objects: images
    • Output Objects: images + bounding boxes + names
  • Annotate the named entities contained in a book
    • Input Objects: text organized in pages
    • Output Objects: set of named entities
  • Crop the silhouette of the models in a set of images
    • Input Objects: images
    • Output Objects: images + polylines
  • Create a complete list of the restaurants nearby Politecnico
    • Input Objects: none
    • Output Objects: set of restaurant names
  • Evaluate the courses offered in Politecnico
    • Input Objects: set of course names
    • Output Objects: course names + vote

6 of 83

Performer

A human being involved in the execution of a Task

Example

  • Students of the Search Computing Course
  • My Facebook friends
  • Expo 2015 Attendees
  • Javascript experts on StackOverflow

7 of 83

MicroTask

An instance of a Task, operating on a subset of its input objects, and assigned to one or more performers for execution.

Example

  • Locate and identify the faces of the people appearing in the following 5 images
  • Order the following courses according to your preferences

8 of 83

Roles

  • Crowd Developer: The developer of a crowd application with CrowdSearcher

  • Task Creator: The creator of a Task
    • Might be human (e.g. the Crowd Developer) or an External Application

  • Performer: The executor of a MicroTask

9 of 83

Use Case: Performer

10 of 83

Use Case: Task Creator

11 of 83

Use Case: Crowd Developer

12 of 83

Architecture

13 of 83

CrowdSearcher Architecture

14 of 83

CrowdSearcher Configurator

  • A user interface and a set of Web APIs for
    • Application Creation and Configuration
    • Task Configuration
    • Object provision
    • Task Execution
    • Task Monitoring
    • Performer Subscription

  • Interacts with Social Networks (e.g. Facebook, Twitter) to find and engage performers

  • Interacts with CrowdSourcing platforms to exploit their micro-task execution and marketplace infrastructures

  • Stores information about Tasks, MicroTasks, and their execution

15 of 83

Task Execution Framework

  • Default User Interfaces for MicroTask execution

  • Javascript APIs for custom MicroTask execution interfaces

16 of 83

External Application

  • Custom CrowdSourcing application
    • E.g. Game With a Purpose

  • Uses a set of Java APIs that simplify the interaction with the Web APIs of CrowdSearcher

17 of 83

Task Configuration

18 of 83

Computation: the process of mapping an input to an output

[Diagram: CrowdSearcher maps Input Objects to Output Objects according to the Task Configuration]

19 of 83

Task Design (1)

  • What are the input objects of the crowd interaction?
    • Unstructured text, structured data, images

  • Which operations should the crowd perform?
    • Add new instances, verify/modify data, order, etc.

  • How should the results of the crowd operations be aggregated?
    • Sum, Average, Majority voting, etc.

  • Which execution interface should the crowd use for task execution?

20 of 83

Task Design (2)

  • Which splitting strategy do you want for your objects?
    • Number of MicroTasks, Number of objects in each MicroTask, etc.

  • Which performer assignment strategy do you want to apply?
    • Pre-defined, unknown, in a social network, etc.

  • Which external applications should be notified about the task advancements? (REACTIVITY)

  • Are the Task objects defined at Task configuration time, or are they produced in a streamed fashion? (REACTIVITY)

21 of 83

Task Object Design

  • The input Objects are described by a schema
    • Set of fields, each defined by
      • A name
        • Must be a valid SQL identifier!
      • A Type:
        • String
        • Date
        • URL
        • Integer
        • Boolean
        • Image (Url)
        • Video (Url)
        • BLOB
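
A schema of this kind can be checked before a Task is created. The following is a minimal sketch, not CrowdSearcher code: the `validateSchema` helper, the `{ name, type }` field layout, and the example field names are assumptions, and the identifier check is deliberately conservative (it does not reject SQL reserved words).

```javascript
// Field types listed on this slide.
const FIELD_TYPES = ["String", "Date", "URL", "Integer", "Boolean", "Image", "Video", "BLOB"];

// Conservative identifier rule: a letter or underscore, then letters, digits, underscores.
const SQL_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical helper: returns a list of error messages (empty when the schema is valid).
function validateSchema(schema) {
  const errors = [];
  const seen = new Set();
  for (const field of schema) {
    const name = String(field.name);
    if (!SQL_IDENTIFIER.test(name)) {
      errors.push(`invalid field name: ${name}`);
    }
    if (!FIELD_TYPES.includes(field.type)) {
      errors.push(`unknown type for ${name}: ${field.type}`);
    }
    // SQL identifiers are usually case-insensitive, so compare in lower case.
    if (seen.has(name.toLowerCase())) {
      errors.push(`duplicate field name: ${name}`);
    }
    seen.add(name.toLowerCase());
  }
  return errors;
}

// Example schema for a set of labelled images (field names are illustrative).
const imageSchema = [
  { name: "id", type: "Integer" },
  { name: "url", type: "Image" },
  { name: "label", type: "String" },
];
```

Calling `validateSchema(imageSchema)` returns an empty list, while a field named `first name` is reported as an invalid name.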

[Figure: example object table with columns ID, Image, URL, Label]

22 of 83

Operations

  • In a Task, performers are required to execute one logical operation on the input objects
    • e.g. Locate the faces of the people appearing in the following 5 images
  • Operations might require operation-dependent configurations

  • CrowdSearcher offers 7 types of pre-defined operations:
    • Like
    • Comment
    • Tag
    • Classify
    • Add
    • Modify
    • Order

23 of 83

Aggregation and Output Objects

  • According to the specified operations, the Task produces output objects having
    • additional fields
    • new objects
    • modified values
  • Values in additional fields and/or modified values might be calculated according to an operation-specific aggregation function

  • The answers provided by each performer are stored in the Task Execution Repository and are accessible via dedicated Web and Java APIs

[Figure: input objects (ID, URL) → Operation + Aggregation → output objects (ID, URL, # Likes)]

24 of 83

Operations: Like

  • Ask a performer to express a preference (true/false) for one or more objects
    • e.g. Which pictures/restaurants/football teams do you like the most?

  • Example aggregation functions:
    • Number of likes received by each object
    • Majority like: a float value indicating the % of performers that liked the object
    • Custom: a custom value containing the result of an application-specific aggregation
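
The first two aggregations can be sketched as follows. This is an illustration only: the `aggregateLikes` name and the `{ objectId, like }` answer layout are assumptions, not the format used by CrowdSearcher. The ratio is returned in [0, 1]; multiply by 100 for a percentage.

```javascript
// Hypothetical aggregation: number of likes and share of positive answers per object.
// answers: [{ objectId: string, like: boolean }]
function aggregateLikes(answers) {
  const byObject = new Map();
  for (const { objectId, like } of answers) {
    const entry = byObject.get(objectId) || { likes: 0, evaluations: 0 };
    entry.evaluations += 1;
    if (like) {
      entry.likes += 1;
    }
    byObject.set(objectId, entry);
  }
  const result = {};
  for (const [objectId, { likes, evaluations }] of byObject) {
    // likeRatio is the fraction of performers that liked the object.
    result[objectId] = { likes, likeRatio: likes / evaluations };
  }
  return result;
}
```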

25 of 83

Operations: Comment

  • Ask a performer to write text which describes/summarizes/evaluates an object
    • e.g. Can you summarize the following text? Using your own words, can you describe the following picture?
  • The maximum length of the text can be configured
    • A comment can be used by a client application to contain an application-specific data structure (e.g. a semi-structured document)
  • Example aggregation functions:
    • Concatenation: an array of strings where each element in the array contains one comment
    • Custom: a custom text containing the result of an application-specific aggregation (group of comments by similarity)

26 of 83

Operations: Tag

  • Ask a performer to annotate an object with a set of tags
    • e.g. How would you label the following image?

  • The maximum number of tags can be predefined
    • Tags can have application-defined types
      • Integers to express a "vote" to a given object
      • Structured, low-level features (e.g. coordinates, feature vectors, etc.)
  • Example aggregation functions:
    • The frequency of appearance of each tag for the objects
    • The average vote of the object
    • The approximated bounding box of a face
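
Two of these aggregations, tag frequency and average vote, can be sketched as below. The function names and the `{ objectId, tags }` answer layout are assumptions made for the example; tags are normalised (trimmed, lower-cased) so that "Cat" and "cat" count as the same tag, which is a design choice rather than documented behaviour.

```javascript
// Hypothetical aggregation: how many performers used each tag on each object.
// answers: [{ objectId: string, tags: string[] }], one entry per performer answer.
function tagFrequencies(answers) {
  const result = Object.create(null);
  for (const { objectId, tags } of answers) {
    if (!result[objectId]) {
      result[objectId] = Object.create(null);
    }
    const counts = result[objectId];
    // The Set prevents a tag repeated within one answer from counting twice.
    const normalised = new Set(tags.map((tag) => tag.trim().toLowerCase()));
    for (const tag of normalised) {
      counts[tag] = (counts[tag] || 0) + 1;
    }
  }
  return result;
}

// Average of integer votes expressed as tags; null when there are no votes.
function averageVote(votes) {
  if (votes.length === 0) {
    return null;
  }
  return votes.reduce((sum, vote) => sum + vote, 0) / votes.length;
}
```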

27 of 83

Operations: Classify

  • Ask a performer to classify an object within a closed-set of alternatives
    • e.g. Which is the category of this restaurant? Would you classify this tweet as politically skewed or not?

  • The classification categories, and the maximum number of selectable categories for an object must be predefined

  • Example of aggregation functions:
    • Number of classifications for each configured category
    • Average number of categories

28 of 83

Operations: Add

  • Ask a performer to add a new object conforming to the specified schema
    • e.g. Can you list the name and address of good restaurants nearby Politecnico di Milano?

  • At the end of the Task, new objects will be provided as output, possibly in addition to the ones given as input

  • Example of aggregation functions for new objects:
    • Distinct values and Frequency

29 of 83

Operations: Modify

  • Ask a performer to verify the content of one or more input objects, possibly modifying/completing wrong or missing information
    • e.g. Is the address of the following restaurants correct? If not, please fix it

  • Example of aggregation functions for each object:
    • List of suggested updates
    • Majority on applied modifications

30 of 83

Operations: Order

  • Ask a performer to order the provided objects according to some ordering criteria
    • e.g. Order the following football clubs according to their popularity; order the following books according to your taste
  • Performers can express orders in several ways:
    • Pair-wise orders
    • Partial orders
    • Complete orders

  • Example of aggregation functions:
    • Position of the object in the final ranking
    • Number of accordant orders
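
One possible way to compute the position of each object in a final ranking is a Borda count over complete orders. The slides do not say which method CrowdSearcher uses, so the technique, the `bordaRanking` name, and the input layout below are all assumptions.

```javascript
// Borda count: in an order of n objects, the first gets n-1 points, the last gets 0.
// orders: array of complete orders, each an array of object ids, best first.
// Returns the object ids sorted by total points (ties broken alphabetically).
function bordaRanking(orders) {
  const scores = new Map();
  for (const order of orders) {
    order.forEach((id, position) => {
      const points = order.length - 1 - position;
      scores.set(id, (scores.get(id) || 0) + points);
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1] || String(a[0]).localeCompare(String(b[0])))
    .map(([id]) => id);
}
```

The position of an object in the final ranking is its index in the returned array plus one.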

31 of 83

Splitting and Assignment Strategies

  • Crowds can provide noisy, biased, possibly spam results
    • Hence the need for Task Planning

  • How to retrieve reliable and correct results?
    • Evaluation redundancy → Splitting
      • The “truth” (objective or cultural) exists, and through redundancy we can find it :-)
    • Choose the right crowd! → Invitation and Assignment
      • WHTBT: Whoever Happens To Be There
      • Push Vs. Pull

32 of 83

Task Planning in CrowdSearcher

[Diagram: Task Planning. The Object Splitting Strategy turns the input [Objects] into MicroTasks [{[Objects]}]; the Performer Invitation Strategy and the Performer Assignment Strategy then produce the MicroTask Assignment [{[Objects],Performers}]]

NOTE: in the future, custom Task Planning flows will be specifiable

33 of 83

Splitting Strategy

  • Given as input N objects
    • How many MicroTasks?
    • Which objects should appear in MicroTasks?
    • How many objects in each MicroTask?
    • How often an object should appear in MicroTasks?
    • Which objects cannot appear together?
    • Should objects be presented always in the same order?

34 of 83

Invitation Strategy

  • Public or private Task?
    • Public Task: everyone can perform
    • Private Task: only registered performers

  • How do you engage people in becoming performers?
    • Publish a post/tweet on your social network profile
    • Publish a post/tweet on your friends' profile
    • Send an email to a mailing list
    • Publish a HIT on Mechanical Turk
    • Create a new challenge in your game
    • ...

35 of 83

Assignment Strategy

  • Given a set of M MicroTasks, which performers are assigned to them?
    • Online assignment
      • Potentially, anyone can execute a MicroTask
        • First come / First Served
        • MicroTask priority
        • MicroTask to Performer matching

    • Offline assignment
      • MicroTasks are uniquely assigned to Performers
        • Performer priority
        • MicroTask to Performer matching
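
The difference between the two modes can be sketched as below: an online first-come/first-served queue hands out MicroTasks as performers ask for them, while an offline round-robin assignment fixes the pairs in advance. Both helpers are hypothetical illustrations, not CrowdSearcher code.

```javascript
// Online, first come / first served: whoever asks first gets the next pending MicroTask.
function createFirstComeQueue(microTaskIds) {
  const pending = [...microTaskIds];
  return {
    // Returns an assignment, or null when no MicroTask is left.
    request(performerId) {
      if (pending.length === 0) {
        return null;
      }
      return { microTaskId: pending.shift(), performerId };
    },
  };
}

// Offline, round robin: MicroTasks are distributed over a known list of performers.
function assignRoundRobin(microTaskIds, performerIds) {
  if (performerIds.length === 0) {
    throw new Error("at least one performer is required");
  }
  return microTaskIds.map((microTaskId, index) => ({
    microTaskId,
    performerId: performerIds[index % performerIds.length],
  }));
}
```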

36 of 83

Execution Interface

  • Task Execution Framework
    • Default
      • One default execution interface for each operation type
    • Custom
      • Application-specific execution UI

  • External Application

37 of 83

Constraints and Reactive Behaviour

38 of 83

Evolution of a Crowd Task

  • A Task evolves in time according to the answers provided by the performers

  • A Task Designer might want the system to:
    • Notify a third party application about the updates in the Task Status
    • Adapt the splitting and assignment strategies by re-planning objects or re-assigning performers
    • Define constraints on
      • The task evolution (e.g. maximum time)
      • The performers (e.g. minimum performer quality)
      • The evaluated objects (e.g. quality of the evaluation)
    • Accept new input Objects

39 of 83

Reactive Crowdsourcing

[Diagram: at Initialization, CrowdSearcher receives the Task Configuration and the Initial Input Objects; at run time it reacts to RunTime Events (Input Objects Stream, END Objects Stream, New Performer, Timer, New Answers) and produces the Output Objects]

40 of 83

Properties of CrowdSearcher Objects

Task

  • StartTime / EndTime
  • Status:
    • Initialized: Task created
    • Planned: MicroTask planned and assigned
    • Open: in execution
    • Closed: Task finalized with valid results

  • Confidence : value in [0,1] indicating the reliability of the result
  • #PlannedMicroTasks
  • #InvolvedPerformers
  • ...
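
The Task statuses can be read as a small state machine. The sketch below assumes a strictly linear life cycle (Initialized, Planned, Open, Closed); the slides list the statuses but not the allowed transitions, so both the transition table and the `advanceTask` helper are assumptions.

```javascript
// Assumed transitions between the Task statuses listed on this slide.
const TASK_TRANSITIONS = new Map([
  ["Initialized", ["Planned"]],
  ["Planned", ["Open"]],
  ["Open", ["Closed"]],
  ["Closed", []],
]);

// Returns a copy of the task in the new status, or throws on an illegal transition.
function advanceTask(task, nextStatus) {
  const allowed = TASK_TRANSITIONS.get(task.status) || [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(`illegal transition: ${task.status} -> ${nextStatus}`);
  }
  return { ...task, status: nextStatus };
}
```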

41 of 83

Properties of CrowdSearcher Objects

Object

  • Answer
    • The final evaluation performed by the crowd
  • Status:
    • Unplanned: inserted but not planned yet
    • Open: in evaluation
    • Closed: evaluated
    • Invalid: evaluation terminated but not completed
  • Confidence : value in [0,1] indicating the reliability of the evaluated answer

  • #AssignedMicroTasks
  • #InvolvedPerformers
  • ...

42 of 83

Properties of CrowdSearcher Objects

Performer

  • Status:
    • Available: the performer is ready to be assigned to MicroTasks
    • Unavailable: the performer is not available for assignments
  • Confidence : value in [0,1] indicating the reliability of the performer

  • #AssignedMicroTasks
  • #EvaluatedObjects
  • Assignment-acceptance ratio

43 of 83

Events (1)

  • OnTaskCreation
    • The creation and configuration of a Task. Planning strategies are not yet applied; therefore, no MicroTasks have been created
  • OnTaskStart
    • The event is triggered after the planning strategies have been applied, and the MicroTasks have been created and assigned
  • OnNewObjects
    • Event triggered upon the provision of new input Objects to the Task
  • OnObjectsEOF
    • Event triggered when the EOF command is sent to CrowdSearcher (CS), signaling that no more input objects will be provided

44 of 83

Events (2)

  • Temporal
    • Event triggered according to given temporal rules (e.g. every 5 minutes, at a given time, etc.)

  • OnNewPerformer
    • The event is triggered when a new Performer registers to the platform

  • OnTaskRequest
    • This event is triggered when a Performer requests the execution of a Task

45 of 83

Events (3)

  • OnMicroTaskStart
    • The event is triggered when a Performer starts the execution of a given MicroTask.

  • OnMicroTaskEnd
    • Event triggered upon the reception of the answer to a MicroTask

  • OnTaskEnd
    • Event triggered when the status of the Task is set to Closed

46 of 83

Control Rules

  • Scripts associated to events
    • Executed upon the triggering of the associated event
    • An event can be associated to several scripts
  • Goal:
    • Inspection of the system status
    • Changes to the system status
      • Plan/Assign newly arrived objects
      • Re-plan/Re-Assign existing objects
      • Aggregate Answers to calculate final results
      • Modify Objects status
      • Modify Performers status
    • Notification to third party applications
      • Task Status (e.g. ended)
      • Object Status (e.g. evaluated)
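
The event/script association can be pictured as a small registry. The sketch below is only an illustration of the idea, written in JavaScript rather than in the scripting language used by the platform; `createRuleEngine`, `on`, and `trigger` are hypothetical names.

```javascript
// Minimal registry: each event name maps to the scripts to run when it fires.
function createRuleEngine() {
  const rules = new Map();
  return {
    // Associates a script (a function) with an event; an event can have several scripts.
    on(event, script) {
      if (!rules.has(event)) {
        rules.set(event, []);
      }
      rules.get(event).push(script);
    },
    // Runs the scripts of the event in registration order, passing the system status.
    trigger(event, context) {
      for (const script of rules.get(event) || []) {
        script(context);
      }
    },
  };
}
```

A script that inspects the status only reads `context`; one that changes the status mutates it or calls back into the platform.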

47 of 83

Examples of Control Rules (1)

  • Maximum duration of Task is 2 hours
    • After 2 hours from the task start trigger a script that
      • Set as closed all the open objects
      • Set the Task as closed
      • Notify Task results

  • Verify that an object has at least 5 valid evaluations, with a majority vote
    • Every time a MicroTask ends, check all the evaluated objects. For each object:
      • Verify how many times it has been evaluated
      • If < 5 evaluations, do nothing
      • If 5 or more evaluations, count the number of coherent answers
      • Aggregate all the answers and set the final Object answer
      • Set the Object as closed
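
The second rule can be sketched as a function applied to each object when a MicroTask ends. This is an illustration in JavaScript, not an actual control script; the object layout (`status`, `answers`, `answer`) is assumed, and leaving the object Open when no strict majority exists is a choice made for the example, since the slide does not cover that case.

```javascript
// Closes an object once it has enough evaluations and a strict majority answer.
// object: { status: string, answers: string[], answer?: string }
function evaluateObject(object, minEvaluations = 5) {
  const answers = object.answers;
  if (object.status !== "Open" || answers.length < minEvaluations) {
    return object; // too few evaluations (or not in evaluation): do nothing
  }
  // Count how many times each answer was given.
  const counts = new Map();
  for (const answer of answers) {
    counts.set(answer, (counts.get(answer) || 0) + 1);
  }
  let majorityAnswer = null;
  let majorityCount = 0;
  for (const [answer, count] of counts) {
    if (count > majorityCount) {
      majorityAnswer = answer;
      majorityCount = count;
    }
  }
  // Assumption: without a strict majority the object stays Open for more evaluations.
  if (majorityCount * 2 <= answers.length) {
    return object;
  }
  return { ...object, answer: majorityAnswer, status: "Closed" };
}
```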

48 of 83

Web APIs

49 of 83

Introduction

  • The Web APIs let clients interact with the system through HTTP requests.

  • The APIs are grouped in:
    • Task manipulation
      • Creation, configuration, opening, monitoring
    • MicroTask manipulation
    • Answers management
    • User management
    • Stream management

50 of 83

Introduction (2)

  • Each API call requires a secretKey that identifies your application

  • To obtain a secret key you have to register your application in CrowdSearcher

  • The secretKey must be sent with every API call encoded in the URL
    • e.g. [..]postTask.do?secretKey="your_secret_here"
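
A client can build such URLs with the standard `URL` class, which also takes care of percent-encoding the key. The sketch below is hypothetical: `buildApiUrl` is not part of any CrowdSearcher library, the base URL is a placeholder, and extra query parameters such as `id` are assumptions.

```javascript
// Builds the URL of an API call and appends the secretKey as a query parameter.
function buildApiUrl(baseUrl, endpoint, secretKey, params = {}) {
  // Normalise the two parts so that the endpoint is resolved below the base path.
  const base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
  const url = new URL(endpoint.replace(/^\/+/, ""), base);
  url.searchParams.set("secretKey", secretKey);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}
```

For instance, `buildApiUrl("http://localhost:8080/cs", "/getTask.do", "s3cr3t", { id: 42 })` returns `http://localhost:8080/cs/getTask.do?secretKey=s3cr3t&id=42`.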

51 of 83

Task Manipulation

  • /postTask.do
    • Creates a task
    • Input: a JSON containing (example)
      • Title
      • Textual description
      • Schema
      • List of objects
      • Strategies and control rules

  • /getTask.do
    • Retrieves the description of a task (example)
    • Input:
      • The id of the task
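
The body of a /postTask.do request could be assembled as in the sketch below. The JSON property names (`title`, `description`, `schema`, `objects`, `strategies`, `controlRules`) are guesses based on the list above, not the documented format; check the example linked from the slide for the real field names.

```javascript
// Hypothetical builder for the JSON body of a postTask.do call.
function buildTaskPayload({ title, description, schema, objects = [], strategies = {}, controlRules = [] }) {
  if (!title) {
    throw new Error("title is required");
  }
  // Every object must only use fields declared in the schema.
  const fieldNames = schema.map((field) => field.name);
  for (const object of objects) {
    for (const key of Object.keys(object)) {
      if (!fieldNames.includes(key)) {
        throw new Error(`object field "${key}" is not in the schema`);
      }
    }
  }
  return JSON.stringify({ title, description, schema, objects, strategies, controlRules });
}
```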

52 of 83

Task Manipulation (2)

  • /getMyTasks.do
    • Retrieves the list of tasks belonging to your application
    • Input: none (remember to always include the secretKey)
  • /openTask.do
    • Starts a task by executing the planning strategy defined when creating the task
    • Input: a JSON containing
      • Id: the id of the task
  • /endTask.do
    • Terminates a task
    • Input: the id of the task

53 of 83

Task Manipulation (3)

  • /subscribeToTask.do
    • Subscribes to the feed of a task in order to be notified when some events occur
    • Input: a JSON containing
      • id: the id of the task
      • callback: where the CS will send the answers

54 of 83

MicroTask Manipulation

  • /postMicroTask.do
    • Manually creates a MicroTask
    • Input: a JSON containing
      • id of the task
      • list of objects
  • /getTaskComposition.do
    • Retrieves the list of MicroTasks that compose a task
    • Input:
      • id of the task
  • /endMicroTask.do
    • Terminates a MicroTask
    • Input: a JSON containing
      • id: the id of the MicroTask

55 of 83

Answers Management

  • /postAnswers.do
    • Posts the answers of a given MicroTask
    • Input:
      • a JSON containing (example)
        • id of the MicroTask
        • list of answers (it depends on the task type)
      • the id of the MicroTask execution
  • /getAnswers.do
    • Retrieves the answers of a given task
    • Input: the id of the task

56 of 83

User management

  • /getPerformers.do
    • Retrieves the list of possible performers (social network profiles)
    • Input: none (always remember the secret key)
  • /getUser.do
    • Retrieves the details of a user
    • Input:
      • token of the user

57 of 83

Stream Management

  • /addObjects.do
    • Adds one or more objects to a task
    • Input: a JSON containing
      • id of the task
      • list of objects
  • /sendEof.do
    • Sends the EOF signal to the task, meaning that no more objects will be sent
    • Input: a JSON containing
      • id of the task

58 of 83

Task Execution Framework Javascript APIs

59 of 83

Introduction

  • Used to code the interface of the MicroTask execution

  • It allows the retrieval of (Micro)Task information
    • Gather Task description
    • Retrieve MicroTask input objects
    • Retrieve Task configurations

  • It allows the upload of the MicroTask execution results
    • Post uTask results

  • Interaction with custom Web APIs or third-party applications is allowed

60 of 83

Initialization

  • Include uTask.js in the HTML page
    • <script src="path/uTask.js"></script>

  • Subclass the uTask "Class"
    • SubClass.prototype = new uTask(); [Javascript]

  • Override the init method
    • init: invoked during initialization.

  • Create the custom implementation.
  • Enjoy!!
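
The steps above can be sketched as follows. Because uTask.js is not available here, the example defines a stand-in `uTask` with stubbed `init` and `getData` methods so that it runs on its own; in a real page the class comes from uTask.js and the stub must be dropped. `ImageLikeTask` and the sample data are hypothetical.

```javascript
// Stand-in for the "class" provided by uTask.js (assumption, for illustration only).
function uTask() {}
uTask.prototype.init = function () {};
uTask.prototype.getData = function (field, callback) {
  // Accept both getData(callback) and getData(field, callback).
  const done = typeof field === "function" ? field : callback;
  done(null, [{ id: 1, url: "http://example.org/1.jpg" }]);
};

// Custom MicroTask interface: subclass uTask through the prototype chain.
function ImageLikeTask() {
  this.images = [];
}
ImageLikeTask.prototype = new uTask();
ImageLikeTask.prototype.constructor = ImageLikeTask;

// Override init: load the input objects when the interface starts.
ImageLikeTask.prototype.init = function () {
  const self = this;
  this.getData(function (err, data) {
    if (!err) {
      self.images = data;
    }
  });
};
```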

61 of 83

Task Details

getDetails( callback )

  • Gets all the details for the current uTask.

  • Wrapper for the getTask.do CrowdSearcher Web API

62 of 83

Task input objects

getData( field, callback )

  • Gets the input object for the current MicroTask
    • field specifies which columns should be retrieved

  • Example

getData( function( err, data ) {
  // data contains all the input objects
} )

63 of 83

Task configuration

getConfiguration( name, callback )

  • Retrieves the configuration values
    • name is the configuration key
    • If name is missing, returns all the configurations

  • Example (classify task)

getConfiguration( 'category', function( err, data ) {
  // data contains the list of the categories
} )

64 of 83

Post MicroTask results

  • Execution Logic is delegated to the specific Task Type implementation.

outputData

  • An in-memory object containing the data to send back to CrowdSearcher

  • General functions for output management.
    • toggleData(name): toggles name in the outputData
    • removeData(name): removes name from outputData
    • storeData(name, value): stores value in outputData
    • postData(callback): posts the contents of outputData to the Task endpoint
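
The behaviour of these functions can be approximated with a plain object, as in the sketch below. It is not the uTask.js implementation: `createOutputStore` and the injected `send` transport are hypothetical, and the toggle semantics (present with value `true`, or absent) are an assumption.

```javascript
// In-memory stand-in for outputData and its helper functions.
// send(body, callback) is a hypothetical transport that posts the JSON body.
function createOutputStore(send) {
  const outputData = {};
  const has = (name) => Object.prototype.hasOwnProperty.call(outputData, name);
  return {
    outputData,
    storeData(name, value) {
      outputData[name] = value;
    },
    removeData(name) {
      delete outputData[name];
    },
    // Adds name (with value true) when absent, removes it when present.
    toggleData(name) {
      if (has(name)) {
        delete outputData[name];
      } else {
        outputData[name] = true;
      }
    },
    postData(callback) {
      send(JSON.stringify(outputData), callback);
    },
  };
}
```

In a Like interface, for example, clicking an image would call `toggleData` with the image id, and the submit button would call `postData`.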

65 of 83

Java APIs

66 of 83

Introduction

Java API closely matches the Web protocol

  • Not very OO, but the round trips required to set up a task are minimized

  • Standalone JAR
    • Also available as OSGi service
  • JavaDoc is available
    • Please do RTFM
  • Examples provided

67 of 83

Initialization

  • Use the factory to obtain a new API object

  • Configuration:
    • Protocol, host and port of the CS
    • Base URL of the CrowdSearcher
    • Application key

CrowdSearcher cs = CrowdSearcherFactory.newInstance(
    "http://localhost:8080", "/cs_prototype_v2", "s3cr3t");

68 of 83

Task creation

  • Title and question text
  • Public or private?
  • Input schema
  • Input objects
    • Optional: objects can be added after creation
  • Configuration data
    • Dependent on task type

Task task = new Task(TaskType.LIKE, "http://cs.net",
    "Title", "question text", false,
    "myschema", schema, inputObjects, null);
cs.addTask(task);

69 of 83

Task Configuration (1)

Splitting strategy: defines how many MicroTasks are created and how input objects are split among them

  • MANUAL, EQUI_SPLIT, REDUNDANCY, CUSTOM

// Split into buckets with at most 10 input objects
PlanningStrategy strategy = new EquiSplitPlanningStrategy(taskId, 10);
cs.postTaskPlanningStrategy(strategy);

70 of 83

Task Configuration (2)

Assignment strategy: defines "who" does "what"

  • Which MicroTask is assigned to which performer
    • Static (manual, custom)
      • Assignment is decided at task creation time
    • Dynamic (round robin, external, custom)
      • Assignment is performed at run time

PerformerAssignStrategy assignmentStrategy =
    new PerformerAssignStrategy(taskId, AssignmentType.DYNAMIC_ROUNDROBIN);
cs.postPerformerAssignStrategy(assignmentStrategy);

71 of 83

Task Configuration (3)

Invitation strategy: defines how to "invite" performers to execute a task

  • CUSTOM, MANUAL, ANNOUNCEMENT, ALL_FRIEND, RANDOM_FRIEND
  • Invitation platform (Facebook, Twitter, Google+, etc.)
  • Strategy-dependent configuration

final List<SocialNetwork> sn = new ArrayList<SocialNetwork>();
sn.add(SocialNetwork.FACEBOOK);
sn.add(SocialNetwork.TWITTER);
Configuration config = new Configuration(3, null, sn);
final InvitationStrategy invitationStrategy =
    new InvitationStrategy(taskId, InvitationName.RANDOM_FRIEND, config);
cs.postInvitationStrategy(invitationStrategy);

72 of 83

Task Configuration (4)

Task Implementation: defines how a MicroTask is executed

  • DEFAULT: built-in interface of CS (deprecated, for testing only)
  • TEF: use the Task Execution Framework
  • CUSTOM: use an external application

TaskImplementation implementation = new TaskImplementation(
    taskId, ImplementationType.TEF,
    new TaskImplementation.Configuration(
        "http://tef.example.net", "tefConfigName"));
cs.postTaskImplementation(implementation);

73 of 83

Evolution of a Task (1)

Control rules: "what" the CS has to do "when" something happens.

  • "when": ON_TASK_END, ON_ADD_TASK, PERIODIC, ON_ADD_OBJ, ON_EOF, etc.
  • "what": name of the Groovy script to execute

ControlStrategy controlStrategy = new ControlStrategy(
    taskId, ControlEvent.PERIODIC, "myControlScript", "@hourly");
cs.postControlRule(controlStrategy);

74 of 83

Evolution of a Task (2)

Emission Strategy: "how" and "when" to emit results

  • when: ON_TASK_END, ON_ADD_TASK, PERIODIC, ON_ADD_OBJ, ON_EOF, etc.
  • how: DEFAULT or CUSTOM emission format

final EmissionStrategy emissionStrategy = new EmissionStrategy(
    taskId, EmissionStrategyType.PERIODIC, "@hourly",
    EmissionFormat.CUSTOM, "myFormatterScript");
cs.postTaskEmissionStrategy(emissionStrategy);

75 of 83

Evolution of a Task (3)

Add a subscription: to receive all the task emissions

String url = "http://localhost:8085/testerCS/CallbackTest";
Subscriber subscriber = new Subscriber(taskId, url);
cs.subscribeToTask(subscriber);

76 of 83

Evolution of a Task (4)

  • Start a task
    • Starts the execution of the planning strategies, which split the objects, assign the MicroTasks, and send the invitations

TaskManupulation tm = new TaskManupulation(taskId);
cs.publishTask(tm);

77 of 83

Evolution of a Task (5)

Stream management:

  • add a new set of objects to the task
  • notify an EOF for the input stream

List<CrowdObject> objects = new ArrayList<CrowdObject>();
//...populate objects...
cs.addObjectsToTask(task.getId(), objects);

final TaskManupulation taskManipulation = new TaskManupulation(taskId);
cs.sendEof(taskManipulation);

78 of 83

Answer

  • WIP!!!!

  • Add an answer to a microTask:

cs.postAnswer(a)

  • Read an answer for a Task:

cs.getAnswer(taskId)

79 of 83

WIP

80 of 83

Simplified Groovy Script APIs

  • Planning
    • Splitting and Assignment

  • Reactivity control

81 of 83

Off-the-shelf Result Aggregations

  • Majority Votes
  • Performer Accuracy
  • Etc.

82 of 83

References

83 of 83

  • CrowdSearcher Code and Documentation
    • http://crowdsearcher.search-computing.org/