1 of 28

MTC: Multiresolution Tensor Completion from Partial and Coarse Observations

KDD 2021

Chaoqi Yang1, Navjot Singh1, Cao Xiao2, Cheng Qian3, Edgar Solomonik1, Jimeng Sun1

1UIUC, 2Amplitude, 3IQVIA

1

2 of 28

Outline

  • Problem Setting
  • Solution
  • Experiment
  • Conclusion

2

3 of 28

Partial and Coarse Data

3

Features

Coarse granular

Fine granular

Mode 1: Location

state code (a)

Zip code (A)

Mode 2: Disease

CCS code (b)

ICD-10 code (B)

Mode 3: Time

Week (c)

Date (C)

A

B

C

A

b

C

a

B

C

?

?

?

?

?

?

?

survey data at specific locations for severe diseases at important dates

different agencies have different data accessibility

from local government

hospital

4 of 28

Fine-granular Tensor Completion

4

Features

Coarse granular

Fine granular

Mode 1: Location

state code (a)

Zip code (A)

Mode 2: Disease

CCS code (b)

ICD-10 code (B)

Mode 3: Time

Week (c)

Date (C)

A

B

C

A

b

C

a

B

C

?

?

?

?

?

?

?

known

Partial observation

Coarse observations

A

B

C

unknown

Low-rank

recover

5 of 28

Outline

  • Problem Setting
  • Solution
  • Experiment
  • Conclusion

5

6 of 28

Background

  • CP decomposition

  • Mode product

6

7 of 28

Solution Sketch

7

Assume the tensor admits a low-rank CP structure

The tensor is related to the observations by operations on factors

Design objective on the observations

Optimize the objective function and obtain the factors

Our multiresolution approach

Recover

8 of 28

Relations to known observations

8

?

?

?

?

?

?

?

Partial observation

Coarse observations

  • The mask is known
  • The aggregation matrix is known, is unknown

A

B

C

A

B

C

A

b

C

a

B

C

B

b

A

a

binary mask

9 of 28

Objective Function

9

Parameters:

10 of 28

Optimize the Loss

  • Need a good Initialization
    • Multi-resolution approach
      • recursively solve low-resolution problem to initialize high-resolution parameters
  • Need a good optimization algorithm
    • Effective ALS-solver (for coupled decomposition)
      • In each sub-iteration, we fix other factors and update one
      • form joint normal equations from multiple linear equations

10

Fewer iterations are needed in high resolution!

11 of 28

Reformulate Objective

  • Step1. Parameter replacement. We use ,
  • Step2. Interim tensor.

11

Transform the loss function

New loss function (for k+1 iteration)

Original loss

12 of 28

Build Solver

  • Alternating least square (ALS)
    • Sub-iteration: optimize U by fixing V, W, Q1, Q2

    • Sub-iteration: optimize V by fixing U, W, Q1, Q2
    • Sub-iteration: optimize W by fixing U, V, Q1, Q2
    • Sub-iteration: optimize Q1 by fixing U, V, W, Q2
    • Sub-iteration: update Q2 by

12

New objective

13 of 28

Overall Framework

13

(1)

(2)

(3)

(4)

(5)

(6)

(7)

14 of 28

Subsampling and Interpolation

14

Continuous mode

Categorical mode

15 of 28

Continuous mode (e.g., date)

  • Subsample

  • Interpolate

15

Intuition: smoothness allows subsampling

16 of 28

Categorical mode (e.g., ICD-10)

  • Subsample

  • Interpolate
    • Copy and fill

16

Intuition: dense part provides better estimation

17 of 28

Outline

  • Problem Setting
  • Solution
  • Experiment
  • Conclusion

17

18 of 28

Datasets

  • IQVIA Covid-19 disease case health tensor (HT)
    • Zipcode by ICD-10 by date
    • ~ half a year
  • Google Covid-19 symptom search (GCSS)
    • Location identifier by keyword by date
    • ~ a complete year 2020

18

19 of 28

Baselines and Metric

  • Reference models
    • CPC-ALS[1], OracleCPD
  • Related baselines
    • Block gradient descent (BGD)[2], B-PREMA[3], CMTF-OPT[4]
  • Parameter initialization methods
    • MRTL[5], TendiB[6], Higher-order singular value decomposition (HOSVD)
  • Three model variants

19

Metric:

where

20 of 28

Main Comparison Results

20

  • MTC outperforms baselines in tensor completion result (i.e., PoF)
  • MTC outperforms baselines significantly in CPU Time

21 of 28

Compared to Reference Models

21

  • OracleCPD: on the original tensor
  • CPC-ALS: only on the partial observation
  • MTC: on both partial and coarse observation
  • OracleCPD is the upper bound of the results
  • Coarse observation helps the completion task. MTC is better than CPC-ALS.

22 of 28

Comparison on Optimization Landscape

22

  • MTC shows faster and better convergence over baselines

23 of 28

Comparison on Initialization

23

  • Our multiresolution initialization technique outperforms other initialization methods

24 of 28

Comparison on Future Tensor Prediction

24

  • MTC outperforms the baselines in tensor prediction task

Gaussian process prediction

Predicted tensor

Procedures:

Results:

25 of 28

Outline

  • Problem Setting
  • Solution
  • Experiment
  • Conclusion

25

26 of 28

Conclusion

  • We formulate the fine-granular tensor completion problem from coarse and partial observations
  • We propose an effective multiresolution ALS optimization framework to solve the coupled decomposition problem
  • Our solution has the same asymptotical space and time complexity as applying standard CPD
  • We show the effectiveness of our solution on two COVID-19 tensors, compared to multiple baselines on various aspects.

26

27 of 28

References

[1] Lars Karlsson, Daniel Kressner, and André Uschmajew. 2016. Parallel Algorithms for Tensor Completion in the CP Format. Parallel Comput.57, C (2016).

[2] Yangyang Xu and Wotao Yin. 2013. A block coordinate descent method for regularized multi convex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on imaging sciences6, 3 (2013).

[3] Faisal M Almutairi, Charilaos I Kanatsoulis, and Nicholas D Sidiropoulos. 2021. PREMA: principled tensor data recovery from multiple aggregated views. IEEE Journal of Selected Topics in Signal Processing(2021).

[4] Evrim Acar, Tamara G Kolda, and Daniel M Dunlavy. 2011. All-at-once optimization for coupled matrix and tensor factorizations. arXiv:1105.3422(2011).

[5] J. Park, K. Carr, S. Zheng, Y. Yue, and R. Yu. 2020. Multiresolution Tensor Learning for Efficient and Interpretable Spatial Analysis. In ICML. PMLR, 7499–7509

[6] Faisal M Almutairi, Charilaos I Kanatsoulis, and Nicholas D Sidiropoulos. 2020. Tendi: Tensor disaggregation from multiple coarse views. In PAKDD. Springer.

27

28 of 28

MTC: Multiresolution Tensor Completion from Partial and Coarse Observations

Thanks for Listening!

Chaoqi Yang1, Navjot Singh1, Cao Xiao2, Cheng Qian3, Edgar Solomonik1, Jimeng Sun1

1UIUC, 2Amplitude, 3IQVIA

28