Active Learning

Segmentation

  • Segment images into regions with different semantic categories.
  • These semantic regions label and predict objects at the pixel level.

Image from http://d2l.ai/

Microstructure Segmentation

  • Structure-property linkages


Juwon Na, Se-Jong Kim, Seong-Hoon Kang, Heekyu Kim and Seungchul Lee*, 2022, "A Unified Microstructure Segmentation Approach via Human-In-The-Loop Machine Learning," Acta Materialia

[Figure: given input, fully supervised segmentation, and weakly supervised segmentation]


Weakly Supervised Learning (WSL): Phase Segmentation

[Figure: scribble annotation and the resulting phase segmentation]

Juwon Na, Se-Jong Kim, Seong-Hoon Kang, Heekyu Kim and Seungchul Lee*, 2022, "A Unified Microstructure Segmentation Approach via Human-In-The-Loop Machine Learning," Acta Materialia


WSL + Active Learning

[Figure: annotated image and segmentation over iterations 1–3]

Juwon Na, Se-Jong Kim, Seong-Hoon Kang, Heekyu Kim and Seungchul Lee*, 2022, "A Unified Microstructure Segmentation Approach via Human-In-The-Loop Machine Learning," Acta Materialia



Active Learning

[Diagram: the data set is split into a test set and Train set 1; Model 1 is trained, new data are labeled by a human or an automated process to form Train set 2 for Model 2, and so on until Model n makes the prediction on the test set]

Active Learning: Two Purposes

  1. Active Learning to Build a Good Surrogate Model (DOE)
    • Sample the unlabeled data expected to most significantly enhance the performance of the AI model
  2. Active Learning to Find a Better Solution (Optimization)
    • Identify the unlabeled data expected to have the highest label values
    • Active learning with Bayesian optimization (BO)

1) Active Learning to Build a Good AI Model


Improving Model Efficiency

  • Select the most informative data points to minimize labeling effort while maximizing learning efficiency.

[Figure: a pool of unlabeled points from which a few informative points are chosen to be labeled]

  • In two dimensions this looks very natural and easy
  • In high-dimensional space, it is hard to imagine or visualize
  • It therefore requires mathematical methods

Which Unlabeled Data Should We Sample?

  • Basic idea

[Figure: intuition for choosing which unlabeled points to query]

Uncertainty-based Sampling

  • Key idea: Sample the unlabeled data that the AI model is least certain how to label

  • Measures to quantify uncertainty
    • Least confidence
      • Select the data with the lowest maximum probability
    • Margin sampling
      • Select the data that has the smallest difference between the first and second most probable labels


AI model's predictions on two unlabeled instances:

  Instances   Label A   Label B   Label C
  d1          0.9       0.09      0.01
  d2          0.2       0.5       0.3

Here d2 is sampled by both criteria: its maximum probability (0.5) is lower than d1's (0.9), and its margin (0.5 − 0.3 = 0.2) is smaller than d1's (0.9 − 0.09 = 0.81).

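A minimal NumPy sketch of both uncertainty measures applied to the table above; both criteria pick d2:

```python
import numpy as np

# Class-probability predictions from the table (rows: d1, d2).
probs = np.array([
    [0.9, 0.09, 0.01],  # d1
    [0.2, 0.5, 0.3],    # d2
])

# Least confidence: query the instance whose top predicted probability is lowest.
least_confidence = 1.0 - probs.max(axis=1)        # [0.1, 0.5]

# Margin sampling: query the instance with the smallest gap between
# the two most probable labels.
sorted_probs = np.sort(probs, axis=1)[:, ::-1]    # descending per row
margin = sorted_probs[:, 0] - sorted_probs[:, 1]  # [0.81, 0.2]

print("query (least confidence):", np.argmax(least_confidence))  # 1 -> d2
print("query (margin):", np.argmin(margin))                      # 1 -> d2
```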

Other Sampling Methods

  • Providing labels for the unlabeled data the AI model is least certain about is not the only way to improve an AI model's performance

Query-By-Committee (QBC)

  • Involves a committee of models trained on the same training dataset
    • Instead of relying on a single AI model, use multiple AI models
  • For classification
    • Select the data on which the committee members most disagree
  • For regression
    • Select the data with the maximum prediction variance

Committee predictions (classification):

  Instances   Model 1   Model 2   Model 3
  d1          Label A   Label B   Label C
  d2          Label A   Label A   Label C

Committee predictions (regression):

  Instances   Model 1   Model 2   Model 3
  d1          0.81      0.75      0.77
  d2          0.52      0.82      0.91

Here QBC queries d1 for classification (all three members disagree) and d2 for regression (its committee variance is far larger than d1's).

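A minimal NumPy sketch of the QBC selection rules, reusing the committee outputs from the two tables above:

```python
import numpy as np

# Committee votes from the classification table (rows: d1, d2).
votes = np.array([
    ["A", "B", "C"],  # d1: full disagreement
    ["A", "A", "C"],  # d2: partial agreement
])

def vote_entropy(row):
    """Disagreement measure: entropy of the committee's label votes."""
    _, counts = np.unique(row, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

disagreement = np.array([vote_entropy(r) for r in votes])
print("classification query:", np.argmax(disagreement))  # 0 -> d1

# Committee predictions from the regression table.
preds = np.array([
    [0.81, 0.75, 0.77],  # d1
    [0.52, 0.82, 0.91],  # d2
])
print("regression query:", np.argmax(preds.var(axis=1)))  # 1 -> d2
```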

Expected Model Change

  • Select the data that would cause the most significant change to the current model if we knew its label
  • The significant 'change' is estimated from the gradient of the loss that a candidate example would induce (e.g., its expected gradient length)
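A minimal sketch of expected model change for a logistic-regression model, scoring each candidate by the expected norm of the loss gradient it would induce; since the label is unknown, the norm is averaged over the model's own predicted label probabilities. All names here are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def expected_gradient_length(w, X_pool):
    """Score each candidate by its expected loss-gradient norm."""
    scores = []
    for x in X_pool:
        p = sigmoid(w @ x)
        # Logistic-loss gradient w.r.t. w is (p - y) * x for label y in {0, 1};
        # average its norm over the predicted label distribution.
        egl = (1 - p) * np.linalg.norm((p - 0) * x) + p * np.linalg.norm((p - 1) * x)
        scores.append(egl)
    return np.array(scores)

rng = np.random.default_rng(0)
w = rng.normal(size=3)                 # current model parameters
X_pool = rng.normal(size=(5, 3))       # unlabeled candidates
print("query:", np.argmax(expected_gradient_length(w, X_pool)))
```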

Core-Set

  • Find a 'core-set' that covers the entire unlabeled dataset and query it
  • Core-set: a set of data points that represents all data points within a distance δ in the latent space
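A minimal sketch of core-set selection via the greedy k-center heuristic (a common approximation), assuming the candidates are embedded in a latent feature space:

```python
import numpy as np

def greedy_k_center(features, n_query, seed_idx=0):
    """Repeatedly pick the point farthest from the current set,
    shrinking the covering radius delta."""
    selected = [seed_idx]
    # Distance from every point to its nearest selected center.
    dists = np.linalg.norm(features - features[seed_idx], axis=1)
    for _ in range(n_query - 1):
        idx = int(np.argmax(dists))  # farthest point becomes the next center
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(features - features[idx], axis=1))
    return selected

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 16))  # e.g., penultimate-layer embeddings
print(greedy_k_center(latent, n_query=10))
```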

Active Learning

  • Selectively label only the data that can improve the performance of the AI model
    • Minimizing the number of labeling steps while…
    • Maximizing the performance of the AI model
  • Iteratively sample unlabeled data for labeling and update the training database
    • As iterations are repeated, the performance of the AI model improves

[Diagram: active learning loop — Training Dataset → AI Model → Sampling from the Unlabeled Dataset → Labelers (Experiments) → new labels added to the Training Dataset]

Step 1

  • Train the AI model with the training dataset

Step 2

  • Predict labels of the unlabeled data with the trained AI model

Step 3

  • Sample some of the unlabeled data that would improve the prediction performance of the AI model
    • The number of unlabeled data points to sample is up to the user
    • Various methods exist to measure which unlabeled data are effective in enhancing model performance

Step 4

  • Perform experiments or hire experts to label the sampled unlabeled data

Step 5

  • Add the newly labeled data to the training dataset
    • Repeat steps 1 to 5 until the AI model reaches the required performance

These five steps amount to the short loop sketched below.
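A minimal Python sketch of this loop, assuming scikit-learn and least-confidence sampling; the `oracle_label` function is a hypothetical stand-in for the human labelers or experiments:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_train, y_train, X_pool, oracle_label,
                         n_rounds=10, batch_size=5):
    model = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        model.fit(X_train, y_train)                    # Step 1: train the model
        probs = model.predict_proba(X_pool)            # Step 2: predict unlabeled data
        confidence = probs.max(axis=1)
        query = np.argsort(confidence)[:batch_size]    # Step 3: sample least-confident
        y_new = oracle_label(X_pool[query])            # Step 4: label (experiments)
        X_train = np.vstack([X_train, X_pool[query]])  # Step 5: grow the train set
        y_train = np.concatenate([y_train, y_new])
        X_pool = np.delete(X_pool, query, axis=0)
    return model
```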

Let’s Talk about Ensembles

Ensemble with Different Models

  • Ensembles: collections of predictors
    • Combine predictions to improve performance

[Diagram: one train set feeds models 1…n; results 1…n are combined into the prediction, evaluated on the test set]

Ensemble with Different Training Sets

  • Bagging
    • Parallel: the same model type is trained independently on train sets 1…n

[Diagram: the data set yields train sets 1…n; identical models are trained in parallel and results 1…n are combined into the prediction]

Ensemble with Different Training Sets

  • Boosting
    • Sequential: each model is trained on a train set informed by the previous model

[Diagram: Train set 1 → Model 1 → Train set 2 → Model 2 → … → Model n → prediction on the test set]
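For reference, a minimal scikit-learn sketch contrasting the two styles on assumed toy data (an illustration, not from the slides):

```python
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: trees trained in parallel on bootstrap resamples of the train set.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0).fit(X_tr, y_tr)

# Boosting: trees trained sequentially, each focusing on previous errors.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

print("bagging accuracy:", bagging.score(X_te, y_te))
print("boosting accuracy:", boosting.score(X_te, y_te))
```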

Space vs. Time

2) Active Learning for Optimization

Objective

  • Similar to traditional active learning in that it samples from an unlabeled dataset, but…
    • Objective: sample the unlabeled data expected to have the highest label values
      • Example: sampling unlabeled process parameters to maximize productivity

[Figure: purpose of sampling]

Which One is Better?

[Figure: two candidate points, A and B, compared as the next sample]

Active Learning for Optimization

  • Two important components
    • Surrogate model
      • Returns predicted labels and the uncertainties of those predictions
    • Utility function
      • Uses the surrogate's outputs to measure which unlabeled data are more likely to have higher label values than the currently known labels

[Diagram: Training Dataset → Surrogate Model → Sampling with Utility Function from the Unlabeled Dataset → Labelers (Experiments) → new labels added to the Training Dataset]

Surrogate Model

  • 'Surrogates' the time-consuming and costly labeling process by predicting the label value of each unlabeled data point
    • It returns predicted labels and the uncertainties of those predictions
    • The uncertainty values are necessary inputs for the utility function, so the surrogate is usually a probabilistic model


Surrogate Model

  • Widely used probabilistic models: Gaussian process regression (GPR) and Bayesian neural networks (BNN)

Gaussian Process Regression

  • GPR derives a distribution over functions that map inputs to label values, conditioned on the training data
    • The forms of these functions are governed by a kernel function

Bayesian Neural Network

  • BNNs treat the weights of a neural network as distributions rather than fixed values
    • This lets the network express uncertainty in its predictions by considering a range of possible weights
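A minimal sketch of a GPR surrogate with scikit-learn; the key point is that `predict(..., return_std=True)` returns both the mean prediction and its uncertainty, which the utility function will consume. The sine data are an assumed stand-in for measured labels:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(8, 1))   # labeled inputs
y_train = np.sin(X_train).ravel()           # stand-in for measured label values

# The RBF kernel defines the form of the functions GPR considers.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X_train, y_train)

X_pool = np.linspace(0, 10, 100).reshape(-1, 1)   # unlabeled candidates
mu, sigma = gpr.predict(X_pool, return_std=True)  # predicted labels + uncertainties
```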

Step 1

  • Train the surrogate model with the training dataset

Step 2

  • Predict labels of the unlabeled data with the trained surrogate model
    • Obtain the uncertainties of those predictions as well

Step 3

  • Use the utility function to sample unlabeled data likely to possess higher label values than the currently known labels
    • The number of unlabeled data points to sample is up to the user
    • Various utility functions exist to measure which unlabeled data are more likely to have higher label values than the currently known labels

Step 4

  • Perform experiments or hire experts to label the sampled unlabeled data

Step 5

  • Check whether any newly labeled data have label values higher than the previous best, and record those values
  • Add the newly labeled data to the training dataset
    • Repeat steps 1 to 5 until the highest label value reaches a predefined condition

These five steps amount to the short loop sketched below.
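A minimal Python sketch of this optimization loop, assuming scikit-learn's GPR as the surrogate and an upper-confidence-bound utility; `run_experiment` is a hypothetical stand-in for the labelers or experiments:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def optimize(X_train, y_train, X_pool, run_experiment, n_rounds=20, kappa=2.0):
    gpr = GaussianProcessRegressor()
    best = y_train.max()
    for _ in range(n_rounds):
        gpr.fit(X_train, y_train)                         # Step 1: train surrogate
        mu, sigma = gpr.predict(X_pool, return_std=True)  # Step 2: predictions + uncertainty
        query = int(np.argmax(mu + kappa * sigma))        # Step 3: utility function (UCB)
        y_new = run_experiment(X_pool[query])             # Step 4: label by experiment
        best = max(best, y_new)                           # Step 5: record best, update data
        X_train = np.vstack([X_train, X_pool[query:query + 1]])
        y_train = np.append(y_train, y_new)
        X_pool = np.delete(X_pool, query, axis=0)
    return best
```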

How Do Utility Functions Decide Which Unlabeled Data to Sample?

Utility Function: Two Important Ideas

  • Exploitation: next, search the neighborhood of the point with the maximum function value among the points investigated so far
  • Exploration: next, search the neighborhood of the point with the maximum standard deviation
  • Strategies
    • Focusing solely on exploitation can lead to neglecting data in unexplored areas
    • Conversely, focusing solely on exploration can lead to neglecting data in areas that have already demonstrated high label values
    • Using a combination of exploration and exploitation as the utility function is an effective strategy to mitigate this trade-off

[Figure: candidate points A and B, with the agent choosing A under exploitation and B under exploration]

Utility Function

  • The utility function strategically selects unlabeled data that are more likely to have higher label values than the currently known labels, using exploitation, exploration, or a combination of both

[Figure: surrogate predictions and uncertainties for two candidate instances, x1 and x2]

Probability of Improvement (PI)

  • A purely exploitative strategy: select the candidate most likely to exceed the current best observed value f⁺

    PI(x) = Φ((μ(x) − f⁺) / σ(x))

  where μ(x) and σ(x) are the surrogate's mean and standard deviation and Φ is the standard normal CDF.

Expected Improvement (EI)

  • Fuses an exploration strategy into PI
    • Weights the PI value by the difference between the current max value and the mean prediction
    • The probability of obtaining a label larger than the existing points is important, but it is also very important how large the obtained value is

    EI(x) = (μ(x) − f⁺) Φ(Z) + σ(x) φ(Z),  with Z = (μ(x) − f⁺) / σ(x)

  The first term rewards exploitation (large expected improvement), and the second rewards exploration (large uncertainty).

Upper Confidence Bound (UCB)

  • Balances the mean prediction and the uncertainty
  • Favors uncertainty under the assumption that higher uncertainty hides a potentially higher reward

    UCB(x) = μ(x) + κ σ(x)
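As a concrete reference, a minimal Python sketch of the three utility functions over the surrogate's outputs, assuming NumPy and SciPy; `mu` and `sigma` would come from a surrogate such as the GPR above, and `best` is the highest label value observed so far (f⁺):

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, best):
    z = (mu - best) / sigma
    return norm.cdf(z)

def expected_improvement(mu, sigma, best):
    z = (mu - best) / sigma
    # First term: exploitation; second term: exploration.
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def upper_confidence_bound(mu, sigma, kappa=2.0):
    # kappa trades off exploration against exploitation (assumed default).
    return mu + kappa * sigma

# The candidate to label next is the argmax of the chosen utility, e.g.:
# query = int(np.argmax(expected_improvement(mu, sigma, best)))
```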

Expected Improvement (EI)

[Figures: Expected Improvement illustrated step by step over successive iterations]

Summary

  • Main objectives of active learning with Bayesian optimization
    • Maximize the label value by labeling previously unlabeled data
    • Minimize the number of labeling steps

  • Key idea: sample the unlabeled data that are expected to most significantly enhance label values

  • Choose the right active learning framework for your purpose