1 of 100

Robust data encodings for quantum classifiers

QTML 2019

Ryan LaRose, Brian Coyle

arXiv:work-in-progress

2 of 100

Motivation: Data representation (encoding) is critical

“Science is representation learning by humans. Deep learning is representation learning by machines.”

-- Lex Fridman, Journal of Academic Twitter.

3 of 100

Motivation: Data representation (encoding) is critical

How classical data is encoded in a quantum state is crucial for learning.

There is a tradeoff between robustness and learnability with quantum encodings of classical data.

4 of 100

Outline

  1. Models for quantum classification
  2. Data encoding & learnability
  3. Noise and quantum channels
  4. Robustness results
  5. Conclusions

6 of 100

Classification problems in machine learning

Input: Feature vectors and labels

Output: “Intelligent” machine that correctly classifies all training feature vectors and can make new (correct) predictions on data it was not trained on.

7 of 100

Quantum classifiers

Input: Feature vectors and labels

Output: “Intelligent” quantum machine that correctly classifies all training feature vectors and can make new (correct) predictions on data it was not trained on.

8 of 100

Common models for quantum classification

Theme:

  1. Encode data point in a quantum state
  2. Evolve with a trainable ansatz
  3. Measure a single qubit to get a label {0, 1}

23 of 100

Model for a (binary) quantum classifier

We consider the model commonly discussed in the literature: encode the data point x in a quantum state |x⟩, evolve with a trainable unitary U(θ), and measure a single qubit in the computational basis to obtain a predicted label ŷ(x) ∈ {0, 1}.

25 of 100

Outline

  • Models for quantum classification
  • Data encoding & learnability
  • Noise and quantum channels
  • Robustness results
  • Conclusions

26 of 100

Encoding data in a quantum state

Problem: Given a feature vector, encode it in a quantum state on n qubits.

(1) Complete wavefunction encoding: takes exp(n) time.

(2) QRAM: takes infinite time (not possible).

(3) “Quantum data”: denies that the encoding problem exists.

27 of 100

Data encodings

Basis encoding for binary data

|x⟩ = |x₁ x₂ ⋯ xₙ⟩,

where each xᵢ ∈ {0, 1}.

28 of 100

Data encodings

Amplitude (wavefunction) encoding for arbitrary data

|x⟩ = (1/‖x‖) Σᵢ xᵢ |i⟩,

where each xᵢ ∈ ℝ.

29 of 100

Data encodings

We can consider parameterizations of features.

Schuld and Killoran (Phys. Rev. Lett. 122, 040504) define the tensor product angle encoding

|x⟩ = ⊗ⱼ ( cos(xⱼ)|0⟩ + sin(xⱼ)|1⟩ ),

which encodes one feature per qubit.

30 of 100

Data encodings

We can also define a dense angle encoding

|x⟩ = ⊗ⱼ ( cos(π x₂ⱼ₋₁)|0⟩ + e^{2πi x₂ⱼ} sin(π x₂ⱼ₋₁)|1⟩ ),

which encodes two features per qubit.
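As a concrete sketch, the encodings above can be written as state vectors in NumPy. The function names and the exact normalization/phase conventions are my reading of the definitions, not the authors' code:

```python
import numpy as np

def angle_encode(x):
    """Tensor product angle encoding: one feature per qubit,
    |x> = kron_j ( cos(x_j)|0> + sin(x_j)|1> )."""
    state = np.array([1.0 + 0j])
    for xj in x:
        state = np.kron(state, np.array([np.cos(xj), np.sin(xj)]))
    return state

def dense_angle_encode(x1, x2):
    """Dense angle encoding: two features per qubit,
    |x> = cos(pi x1)|0> + exp(2 pi i x2) sin(pi x1)|1>."""
    return np.array([np.cos(np.pi * x1),
                     np.exp(2j * np.pi * x2) * np.sin(np.pi * x1)])

def wavefunction_encode(x):
    """Amplitude (wavefunction) encoding: normalized features as amplitudes."""
    x = np.asarray(x, dtype=complex)
    return x / np.linalg.norm(x)
```

Each function returns a normalized state vector; the angle encodings use one gate layer per qubit, while the wavefunction encoding requires a circuit of depth exponential in n in general.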

31 of 100

Data encodings

Such parameterizations can be generalized to any L2 functions f and g with |f(x)|² + |g(x)|² = 1:

|x⟩ = f(x)|0⟩ + g(x)|1⟩.

These functions directly determine the learnable decision boundaries of the model. In particular, the decision boundary is the set of points where the two outcome probabilities are equal,

Pr[ŷ(x) = 0] = Pr[ŷ(x) = 1] = 1/2.

For a single qubit classifier, this becomes

|⟨0| U(θ) |x⟩|² = 1/2.
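To make the single-qubit decision boundary concrete, here is a minimal numerical sketch. The rotation ansatz `ry` is an illustrative stand-in for the trained unitary U(θ), not the authors' model:

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about Y (a simple stand-in for U(theta))."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def p0(U, state):
    """Probability of outcome 0: |<0| U |x>|^2."""
    return abs((U @ state)[0]) ** 2

def predict(U, state):
    """Label 0 on one side of the decision boundary p0 = 1/2, else label 1."""
    return 0 if p0(U, state) >= 0.5 else 1
```

Scanning the feature space for points where `p0` crosses 1/2 traces out the decision boundary induced by a given encoding and ansatz.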

35 of 100

Learnability of encodings

[Figure: decision boundaries for a single qubit encoding of two features, comparing the dense angle encoding and the wavefunction encoding.]

37 of 100

Data encodings

From a hardness/advantage perspective, it's a good idea to encode data with circuits that are hard to simulate classically.

Havlíček et al., “Supervised learning with quantum-enhanced feature spaces,” Nature 567, 209-212 (2019)

38 of 100

Key properties of data encodings

  1. For a given encoding, what decision boundaries are possible to learn?

  2. For a given encoding, how robust is the classifier to noise?

39 of 100

Robustness definition

Let ŷ(x) denote the label predicted by the quantum classifier for data point x.

Let E denote a noise channel.

We say that the classifier is robust to the noise channel if and only if the noisy and noiseless predictions agree,

ŷ_E(x) = ŷ(x), for all data points x.

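The definition can be checked pointwise in code. A minimal sketch (the names `predict` and `is_robust` are mine; dephasing is used as the example channel because it preserves computational-basis statistics):

```python
import numpy as np

Z = np.diag([1.0, -1.0])

def predict(rho):
    """Predicted label from a single-qubit density matrix:
    0 if p(0) = <0|rho|0> >= 1/2, else 1."""
    return 0 if rho[0, 0].real >= 0.5 else 1

def dephase(rho, p=0.3):
    """Dephasing channel: E(rho) = (1 - p) rho + p Z rho Z."""
    return (1 - p) * rho + p * (Z @ rho @ Z)

def is_robust(states, channel):
    """Robust iff the noisy prediction matches the noiseless prediction
    for every encoded data point."""
    return all(predict(channel(rho)) == predict(rho) for rho in states)
```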
40 of 100

Outline

  • Models for quantum classification
  • Data encoding & learnability
  • Noise and quantum channels
  • Robustness results
  • Conclusions

41 of 100

Noise in quantum systems

Noise occurs due to interactions between a principal quantum system and its environment.

Physically,

ρ ↦ Tr_env[ U (ρ ⊗ ρ_env) U† ].

We often use the equivalent, more convenient operator-sum representation

E(ρ) = Σₖ Aₖ ρ Aₖ†,

where Σₖ Aₖ† Aₖ = I.

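The operator-sum form translates directly to code. A sketch, using dephasing Kraus operators as the example set:

```python
import numpy as np

def apply_channel(rho, kraus_ops):
    """Operator-sum representation: E(rho) = sum_k A_k rho A_k^dagger."""
    return sum(A @ rho @ A.conj().T for A in kraus_ops)

# Example Kraus set: dephasing with probability p.
p = 0.25
kraus = [np.sqrt(1 - p) * np.eye(2),
         np.sqrt(p) * np.diag([1.0, -1.0])]

# Completeness sum_k A_k^dagger A_k = I makes the channel trace preserving.
completeness = sum(A.conj().T @ A for A in kraus)
```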
43 of 100

Common models for noise

Depolarizing noise:

E(ρ) = (1 − p) ρ + p I/2

Dephasing noise:

E(ρ) = (1 − p) ρ + p Z ρ Z

44 of 100

Common models for noise

Pauli noise:

E(ρ) = p_I ρ + p_X X ρ X + p_Y Y ρ Y + p_Z Z ρ Z

Amplitude damping:

E(ρ) = A₀ ρ A₀† + A₁ ρ A₁†,   A₀ = |0⟩⟨0| + √(1−γ) |1⟩⟨1|,   A₁ = √γ |0⟩⟨1|

Measurement noise:

p̃(k) = Σₗ p(k|l) p(l),

where p(k|l) is the probability of getting outcome k given input l.
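These channels can be written down as Kraus sets; the sketch below follows the standard textbook parameterizations assumed above:

```python
import numpy as np

I2, X = np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])
Y, Z = np.array([[0.0, -1j], [1j, 0.0]]), np.diag([1.0, -1.0])

def pauli_kraus(pI, pX, pY, pZ):
    """Pauli channel: apply I, X, Y, Z with probabilities summing to 1."""
    assert abs(pI + pX + pY + pZ - 1) < 1e-12
    return [np.sqrt(pI) * I2, np.sqrt(pX) * X,
            np.sqrt(pY) * Y, np.sqrt(pZ) * Z]

def amp_damp_kraus(gamma):
    """Amplitude damping: |1> decays to |0> with probability gamma."""
    return [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
            np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]

def apply_channel(rho, kraus_ops):
    """E(rho) = sum_k A_k rho A_k^dagger."""
    return sum(A @ rho @ A.conj().T for A in kraus_ops)
```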

47 of 100

Outline

  • Models for quantum classification
  • Data encoding & learnability
  • Noise and quantum channels
  • Robustness results
  • Conclusions

48 of 100

Two regimes

  1. Ideal data encoding

     This characterizes (mostly) properties of the model.

  2. Noisy data encoding

     This characterizes (mostly) properties of the data encoding.

49 of 100

Robustness to Pauli channels

Result 1: The classifier is robust to Pauli noise

E(ρ) = p_I ρ + p_X X ρ X + p_Y Y ρ Y + p_Z Z ρ Z

if

p_I + p_Z ≥ p_X + p_Y.

Proof: Measured in the computational basis, I and Z errors preserve the outcome while X and Y errors flip it, so

p̃(0) = (p_I + p_Z) p(0) + (p_X + p_Y) (1 − p(0)).

If ŷ(x) = 0, then p(0) ≥ 1/2, and the condition gives p̃(0) ≥ 1/2.

If ŷ(x) = 1, then p(0) < 1/2, and the condition gives p̃(0) < 1/2.

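The key identity behind the proof, p̃(0) − 1/2 = (p_I + p_Z − p_X − p_Y)(p(0) − 1/2), can be sanity-checked numerically. A sketch (helper names are mine):

```python
def noisy_p0(p0, pI, pX, pY, pZ):
    """Outcome-0 probability after a Pauli channel before measurement:
    I and Z preserve the outcome, X and Y flip it."""
    return (pI + pZ) * p0 + (pX + pY) * (1 - p0)

def label(p0):
    """Predicted label from the outcome-0 probability."""
    return 0 if p0 >= 0.5 else 1
```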
54 of 100

Robustness to Pauli channels

For the wavefunction encoding

55 of 100

Robustness to Pauli channels

For the dense angle encoding

56 of 100

Robustness to Pauli channels

Corollary 1: Suppose the classification scheme is modified to measure in the X basis. Then, the classifier is robust if

p_I + p_X ≥ p_Y + p_Z.

Corollary 2: Suppose the classification scheme is modified to measure in the Y basis. Then, the classifier is robust if

p_I + p_Y ≥ p_X + p_Z.

58 of 100

Unconditional robustness to dephasing

Result 2: The classifier is unconditionally robust to dephasing noise.

Proof:

  1. Corollary of Pauli channel robustness: dephasing has p_X = p_Y = 0, so the condition p_I + p_Z ≥ p_X + p_Y always holds.
  2. Direct proof: Z ρ Z preserves the diagonal of ρ, so dephasing leaves the computational-basis measurement statistics unchanged.

60 of 100

Unconditional robustness to depolarizing noise

Result 3: The classifier is unconditionally robust to global depolarizing noise (at any point in the circuit).

General statement:

E(ρ) = (1 − p) ρ + p I / 2ⁿ

Intuition for the single qubit case: the noisy outcome probability

p̃(0) = (1 − p) p(0) + p/2

is a convex combination of p(0) and 1/2, so it lies on the same side of 1/2 as p(0):

If p(0) ≥ 1/2, then p̃(0) ≥ 1/2.    If p(0) < 1/2, then p̃(0) < 1/2.

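The convexity argument is easy to check directly (a sketch):

```python
def depolarize_p0(p0, p):
    """Global depolarizing mixes with the maximally mixed state:
    p0 -> (1 - p) p0 + p/2, which stays on the same side of 1/2."""
    return (1 - p) * p0 + p / 2
```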
62 of 100

Amplitude damping channel

Result 4: A data encoding

|x⟩ = f(x)|0⟩ + g(x)|1⟩

is robust to amplitude damping noise iff a condition on f and g holds for all feature vectors.

Can this be achieved? That is, do there exist such functions f and g?

Yes.

65 of 100

Amplitude damping channel

Proof of Result 4: For the noisy classifier, amplitude damping maps the outcome probabilities to

p̃(0) = p(0) + γ p(1),   p̃(1) = (1 − γ) p(1).

Suppose ŷ(x) = 0, i.e., p(0) ≥ 1/2.

Because γ ≥ 0 and p(1) ≥ 0, we certainly have

p̃(0) = p(0) + γ p(1) ≥ p(0) ≥ 1/2.

That is, noisy classification of features labelled 0 is robust.

69 of 100

Amplitude damping channel

Proof of Result 4 (continued): For the noisy classifier,

p̃(1) = (1 − γ) p(1).

Suppose ŷ(x) = 1, i.e., p(1) ≥ 1/2.

We require

p̃(1) = (1 − γ) p(1) ≥ 1/2.

Using the resolution of the identity, p(0) + p(1) = 1, we arrive at

p(1) ≥ 1 / (2(1 − γ)).

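The asymmetry in the proof (label 0 always robust, label 1 only conditionally) can be seen numerically. A sketch, under the same assumption as the reconstruction above, namely damping acting just before the measurement:

```python
def damped_probs(p0, gamma):
    """Amplitude damping on measurement statistics:
    (p0, p1) -> (p0 + gamma * p1, (1 - gamma) * p1)."""
    p1 = 1 - p0
    return p0 + gamma * p1, (1 - gamma) * p1
```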
74 of 100

Amplitude damping channel

[Figure: wavefunction encoding with amplitude damping noise.]

80 of 100

Amplitude damping channel

[Figure: dense angle encoding with amplitude damping noise.]

86 of 100

Amplitude damping channel

What is a robust encoding?

87 of 100

Can we always achieve robustness?

Yes.

Theorem: For any noisy quantum classifier whose noise is a trace-preserving quantum operation, there exists an encoding such that the noisy classifications are robust.

Proof: By Schauder’s theorem, trace-preserving quantum channels have at least one fixed point. Encoding data into fixed points of the channel yields a robust encoding.

Schauder’s theorem (informal): Any continuous map on a convex, compact subspace of a Hilbert space has a fixed point.
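The fixed-point construction can be illustrated by iterating a channel. This is a sketch: plain iteration is not guaranteed to converge for every channel, but it works for amplitude damping, whose unique fixed point is |0⟩⟨0|:

```python
import numpy as np

def amp_damp(rho, gamma=0.3):
    """Amplitude damping channel in operator-sum form."""
    A0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]])
    A1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    return A0 @ rho @ A0.conj().T + A1 @ rho @ A1.conj().T

def iterate_to_fixed_point(channel, iters=300):
    """Schauder guarantees a fixed point exists; here we find one by
    repeatedly applying the channel to the maximally mixed state."""
    rho = np.eye(2) / 2
    for _ in range(iters):
        rho = channel(rho)
    return rho
```

Encoding every data point at or near such a fixed point makes the noise act trivially, which is exactly why robustness obtained this way can cost learnability.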

89 of 100

Can we always achieve robustness?

Yes.

But at the cost of learnability.

90 of 100

Can we always achieve robustness?

Yes.

Examples:

Bit-flip channel: fixed points are states diagonal in the X basis, e.g. |+⟩⟨+| and |−⟩⟨−|.

Phase flip channel: fixed points are states diagonal in the computational basis, e.g. |0⟩⟨0| and |1⟩⟨1|.

91 of 100

Outline

  • Models for quantum classification
  • Data encoding & learnability
  • Noise and quantum channels
  • Robustness results
  • Conclusions

92 of 100

Conclusions

  1. Encoding classical data in a quantum state is an important, under-studied problem.
  2. Many speedups in QML algorithms rely on efficient data encoding, which may not be plausible.
  3. For “practical QML,” data encoding directly determines learnable decision boundaries.
  4. The classifier studied here exhibits robustness to a variety of common errors.
  5. For certain error models, data can be encoded in such a way to ensure robustness while (perhaps) maintaining learnability.
  6. A robust data encoding always exists, though (likely) at the expense of learnability.

98 of 100

Conclusions

Continued work:

  • Generalizing some robustness results to an arbitrary number of qubits,
  • numerical results for noisy data encoding,
  • general robustness results for classes of quantum channels (unital channels, etc.)

Future directions:

  • Different models of classification (distance-based classifiers) and robustness results,
  • universal learning theorems,
  • making the tradeoff between learnability and robustness quantitative

99 of 100

Thank you for your attention.
