1 of 32

Defining Standard Strategies for Quantum Benchmarks

Mirko Amico, Helena Zhang, Petar Jurcevic, Lev S. Bishop, Paul Nation, Andrew Wack, and David C. McKay

IBM Quantum, Yorktown Heights, NY 10598 USA

https://arxiv.org/abs/2303.02108

Journal club presented by Nathan Shammah, nathan@unitary.fund

Quantum Wednesday Journal Club, discord.unitary.fund

April 12, 2023

2 of 32

Outline

  1. The role of benchmarks, ideal benchmarks, fair benchmarks: Benchmarks vs. Diagnostics
  2. Review of benchmarks
    1. Quantum volume
    2. CLOPS
    3. Mirror circuits
    4. Application suites
  3. The impact and use of quantum error mitigation and optimization


4 of 32

Previous & related work: defining benchmarks

Phys. Rev. A 100, 032328 (2019)

https://doi.org/10.1103/PhysRevA.100.032328

5 of 32

Unitary Fund related work & projects

Metriq: Community-driven Quantum Benchmarks

Submissions show performance of methods on platforms against tasks

6 of 32

Quantum Hardware Performance

Quantum error mitigation: generally trades speed (sampling overhead) for quality.

Exception: dynamical decoupling, which has limited overhead.

  • Operations per second (e.g., CLOPS)
  • Overall Runtime
  • Qubit #
  • Physical qubit overhead (e.g., code distance in QEC)
  • Average reliability
  • Dependence on errors

7 of 32

Benchmarks vs. Diagnostics

Benchmarks:

  • “Holistic”
  • Device-agnostic
  • Randomized-protocol-based

Diagnostics:

  • Sensitive to specific noise
  • Hardware-specific
  • Specific structure

In between: algorithm-based toolkits for benchmarking (e.g., application-oriented algorithms), randomized/aggregated.

8 of 32

Proposed Criteria for Benchmarks

  • Random: “Tests should have a randomized component (e.g. the circuits, input and/or outputs), and the final metric aggregated over this randomization used to measure an average result.”
  • Well-defined: “Benchmarks should have a clear set of rules for defining and running the protocol, so that there is no ambiguity, and others can reproduce the procedure.”
  • Holistic: “The benchmarking results should be indicative of performance over a large set of the device attributes in as few metrics as possible, i.e., scale should be implicit in the benchmark.”
  • Platform independent: “The protocol should not be tailored to a particular gate-set and should be independent of the types of connectivity and native gates of the platform, so long as it conforms to a universal gate set and the circuit model of quantum computing.”

9 of 32

Metriq project motivation and criteria

  • Why do we want to benchmark?

Answer the question:

“How does QC Platform X running Software Stack Y perform on Workload Z and how has that changed over time?”

  • What makes a good benchmark?
    • Reproducible [1]
    • Scalable [2]
    • Application-centric [2]
    • Hardware-agnostic [2]

[1] Dasgupta, Samudra, and Travis S. Humble. "Characterizing the stability of NISQ devices." 2020 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2020.
[2] Martiel, Simon, Thomas Ayral, and Cyril Allouche. "Benchmarking quantum co-processors in an application-centric, hardware-agnostic and scalable way." arXiv preprint arXiv:2102.12973 (2021).

10 of 32

Proposed Criteria for Benchmarks (comparison with Metriq)

  • Random
  • Well-defined
  • Holistic
  • Platform independent

vs. Metriq criteria:

  • Reproducible [1]
  • Scalable [2]
  • Application-centric [2]
  • Hardware-agnostic [2]

[1] Dasgupta, Samudra, and Travis S. Humble. "Characterizing the stability of NISQ devices." 2020 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2020.
[2] Martiel, Simon, Thomas Ayral, and Cyril Allouche. "Benchmarking quantum co-processors in an application-centric, hardware-agnostic and scalable way." arXiv preprint arXiv:2102.12973 (2021).

11 of 32

More details on diagnostics

  • Diagnostic: a protocol (circuits and output success measure) that is highly sensitive to certain types of errors, e.g., the Hellinger fidelity of GHZ states.

“While benchmarks are designed to compare across technologies and device iterations, diagnostic methods can be used to have a clear characterization of performance in a particular setting. The result of diagnostic methods should be highly predictive for similarly structured problems.

This is the case for most application-inspired methods, which can give a precise indication of expected performance on specific tasks (or similarly structured tasks). However, because of their specificity, diagnostic methods are not good standards. Even when collecting together a suite of diagnostic methods, it is hard to determine whether these will cover all aspects of a device’s performance and so benchmark suites must be carefully constructed.

Also, in the context of getting the maximum performance out of quantum hardware in a specific application, it may be desirable to use compilation and mitigation techniques, thus making diagnostic methods good candidates for including such techniques in their execution.”
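The GHZ Hellinger-fidelity diagnostic mentioned above fits in a few lines. A minimal sketch, assuming Qiskit with the Aer simulator; the GHZ size, noise level, and shot count are arbitrary illustrative choices:

```python
# Minimal sketch of a GHZ Hellinger-fidelity diagnostic (illustrative values only).
from qiskit import QuantumCircuit, transpile
from qiskit.quantum_info import hellinger_fidelity
from qiskit_aer import AerSimulator
from qiskit_aer.noise import NoiseModel, depolarizing_error

n = 4  # number of qubits in the GHZ state

# Prepare (|0...0> + |1...1>)/sqrt(2) and measure all qubits.
ghz = QuantumCircuit(n)
ghz.h(0)
for q in range(n - 1):
    ghz.cx(q, q + 1)
ghz.measure_all()

# Toy noise model: 1% two-qubit depolarizing error on every CX gate.
noise = NoiseModel()
noise.add_all_qubit_quantum_error(depolarizing_error(0.01, 2), ["cx"])

backend = AerSimulator(noise_model=noise)
counts = backend.run(transpile(ghz, backend), shots=4000).result().get_counts()

# Ideal GHZ distribution: half the shots on all-zeros, half on all-ones.
ideal = {"0" * n: 0.5, "1" * n: 0.5}
print(f"Hellinger fidelity: {hellinger_fidelity(counts, ideal):.3f}")
```

As the quote notes, a number like this is highly predictive for similarly structured state-preparation tasks, but its specificity makes it a diagnostic rather than a cross-platform standard.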

12 of 32

Outline

  • The role of benchmarks, ideal benchmarks, fair benchmarks: Benchmarks vs. Diagnostics
  • Review of benchmarks
    • Quantum volume
    • CLOPS
    • Mirror circuits
    • Application suites
  • The impact of quantum error mitigation


14 of 32

Quantum volume

15 of 32

Quantum volume

Ref. [2]: Quantum 6, 707 (2022) doi.org/10.22331/q-2022-05-09-707
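For context on the figures summarized here, below is a minimal sketch of the heavy-output bookkeeping behind the QV test, assuming Qiskit's QuantumVolume circuit library and an ideal statevector. On hardware one would instead measure the circuit and count the fraction of shots landing in the heavy set, which must exceed 2/3 with the prescribed confidence.

```python
# Minimal sketch of the heavy-output computation behind the quantum volume test.
import numpy as np
from qiskit.circuit.library import QuantumVolume
from qiskit.quantum_info import Statevector

n = 4  # QV test of width = depth = n, i.e. a candidate QV of 2**n
qc = QuantumVolume(n, depth=n, seed=42)

# Ideal output distribution; "heavy" outputs are those above the median probability.
probs = Statevector(qc).probabilities_dict()
median = np.median(list(probs.values()))
heavy = {bitstring for bitstring, p in probs.items() if p > median}

# Ideal heavy-output probability (HOP) of this circuit (~0.85 for large n).
ideal_hop = sum(p for bitstring, p in probs.items() if bitstring in heavy)
print(f"heavy outputs: {len(heavy)} / {2**n}, ideal HOP = {ideal_hop:.3f}")
```

Note that determining the heavy set requires classical simulation of each QV circuit, which is the sampling overhead that the mirror QV proposal later bypasses.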

16 of 32

Quantum volume

17 of 32

Dos and Don’ts for quantum volume

18 of 32

CLOPS

Circuit Layer Operations per Second (CLOPS)

Informally: roughly analogous to the clock speed of a classical computer.

  • A. Wack, H. Paik, A. Javadi-Abhari, P. Jurcevic, I. Faro, J. M. Gambetta, and B. R. Johnson, arXiv preprint arXiv:2110.14108 (2021).
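A minimal sketch of the CLOPS figure of merit as defined in Wack et al. (arXiv:2110.14108), which uses M parameterized templates, K parameter updates, S shots, and D = log2(QV) layers; the parameter defaults below are those quoted in that paper, and the example wall-clock time is made up:

```python
# Sketch of CLOPS = M * K * S * D / total_time (Wack et al., arXiv:2110.14108).
from math import log2

def clops(total_time_seconds: float, quantum_volume: int,
          num_templates: int = 100,   # M: parameterized circuit templates
          num_updates: int = 10,      # K: parameter updates per template
          num_shots: int = 100) -> float:
    """Circuit Layer Operations Per Second for a measured wall-clock time."""
    depth = log2(quantum_volume)  # D: number of QV layers per circuit
    return num_templates * num_updates * num_shots * depth / total_time_seconds

# Example: a device with QV = 64 that takes 300 s to run the full CLOPS workload.
print(f"CLOPS ≈ {clops(total_time_seconds=300, quantum_volume=64):,.0f}")
```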

19 of 32

Mirror circuits

Mirror circuits are scalable.
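A minimal sketch of generating a mirror circuit with Mitiq's existing helper, assuming Mitiq and networkx are installed; the layer count and connectivity are arbitrary illustrative choices:

```python
# Minimal sketch: generate a mirror circuit whose ideal output bitstring is known.
import networkx as nx
from mitiq.benchmarks import generate_mirror_circuit

# Mirror circuit on 4 qubits with all-to-all connectivity (illustrative choice).
circuit, correct_bitstring = generate_mirror_circuit(
    nlayers=5,
    two_qubit_gate_prob=1.0,
    connectivity_graph=nx.complete_graph(4),
    seed=0,
)

# On hardware, the benchmark score is the probability of measuring `correct_bitstring`,
# which is known by construction and so requires no classical simulation.
print(circuit)
print("ideal output:", correct_bitstring)
```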

20 of 32

Novel proposal: mirror QV circuits (bypassing the classical sampling overhead of QV)

Mirror QV circuit: a QV circuit whose permutation and SU(4) layers are inverted about the (dashed) line of symmetry, so the ideal outcome is known by construction (see the sketch below).

Mirror QV success probabilities vs. standard QV heavy-output probability (HOP):

  • Linear 4-qubit strings
  • 65-qubit IBM Quantum Ithaca processor
  • 500 circuits of 1000 shots each

Mirror QV circuits appear to be a good proxy for standard QV circuits.

Figure: building a mirror QV circuit from a QV circuit.
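A minimal sketch of the construction described above, assuming Qiskit's QuantumVolume library circuit. For simplicity this appends the inverse of the whole circuit rather than reflecting layer by layer, which yields the same ideal behavior: every shot should return the all-zeros bitstring.

```python
# Minimal sketch: build a mirror QV circuit by reflecting a QV circuit about its midpoint.
from qiskit.circuit.library import QuantumVolume

n = 4
qv = QuantumVolume(n, depth=n, seed=7).decompose()  # permutation + SU(4) layers

mirror_qv = qv.compose(qv.inverse())  # second half undoes the first half
mirror_qv.measure_all()

# Ideally every shot returns "0" * n; the measured success probability is the score,
# with no classical simulation of heavy outputs required.
```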

21 of 32

Application suites

22 of 32

Outline

  • The role of benchmarks, ideal benchmarks, fair benchmarks: Benchmarks vs. Diagnostics
  • Review of benchmarks
    • Quantum volume
    • CLOPS
    • Mirror circuits
    • Application suites
  • The impact of quantum error mitigation


24 of 32

Proposed rules for optimization (and quantum error mitigation)

Rule 1: Constant-time optimizations are allowed and encouraged. Examples (one is sketched below):

  • Dynamical decoupling
  • Optimal compiling of gates
  • Efficient measurement mitigation
  • Optimal mapping to the device
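As a concrete example of one such constant-time optimization, the sketch below inserts a simple X-X dynamical decoupling sequence into idle periods using Qiskit's scheduling passes; the gate durations are placeholder values, not those of any particular backend.

```python
# Minimal sketch: pad idle qubit time with an X-X dynamical decoupling sequence.
from qiskit import QuantumCircuit
from qiskit.circuit.library import XGate
from qiskit.transpiler import PassManager, InstructionDurations
from qiskit.transpiler.passes import ALAPScheduleAnalysis, PadDynamicalDecoupling

circ = QuantumCircuit(3)
circ.h(0)
circ.cx(0, 1)
circ.cx(1, 2)
circ.measure_all()

# Placeholder gate durations (in dt units) so the circuit can be scheduled.
durations = InstructionDurations(
    [("h", None, 50), ("cx", None, 300), ("x", None, 50), ("measure", None, 1000)]
)

dd_sequence = [XGate(), XGate()]
pm = PassManager([
    ALAPScheduleAnalysis(durations),                 # schedule as-late-as-possible
    PadDynamicalDecoupling(durations, dd_sequence),  # fill idle windows with X-X
])
circ_dd = pm.run(circ)
```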

Rule 2: Mitigation must be reported along with the incurred overhead.

“At one extreme, we could imagine even replacing the run on the device by a fully classical simulator.”

“The results of benchmarking methods that include error mitigation may not be directly comparable to those that do not. Therefore, when comparing results from different devices or different versions of the same device, it’s important to take into account the different error mitigation techniques that have been applied.”

Rule 3: Optimizations based on the output of a circuit are forbidden.

“No optimization of the result or the output based on the knowledge of what the output of the circuit is expected to be. For example, replacing the circuit with a much simpler version that still obtains the right output and/or post-selecting only the correct outputs.”

25 of 32

Variability of benchmark results over quantum error mitigation

“Example of how the polarization fidelity is affected by dropping low-frequency bit-strings.”

“By sacrificing a moderate amount of shots, we can improve the polarization fidelity up to a perfect value, giving the false impression of a well-performing hardware.”
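A toy illustration of this failure mode (not the paper's exact polarization-fidelity computation; the counts, target bitstring, and threshold below are made up for illustration):

```python
# Toy example: dropping low-frequency bitstrings artificially inflates a success metric.
def drop_low_frequency(counts: dict, threshold: int) -> dict:
    """Post-select by discarding bitstrings observed fewer than `threshold` times."""
    return {b: c for b, c in counts.items() if c >= threshold}

def success_probability(counts: dict, target: str = "000") -> float:
    shots = sum(counts.values())
    return counts.get(target, 0) / shots

# Made-up counts: the ideal output "000" plus scattered error outcomes.
counts = {"000": 700, "001": 45, "010": 44, "011": 43,
          "100": 42, "101": 42, "110": 42, "111": 42}

print(success_probability(counts))                          # 0.70: honest estimate
print(success_probability(drop_low_frequency(counts, 50)))  # 1.00: "perfect" hardware
```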

26 of 32

Variability of QED-C application-oriented benchmark results under circuit optimization and quantum error mitigation

“Layout selection”: See mapomatic and:

P. D. Nation and M. Treinish, “Suppressing quantum circuit errors due to system variability”, arXiv:2209.15512 (2022).

Configurations compared:

  • Executed with default parameters
  • + layout selection
  • + layout selection and dynamical decoupling
  • + layout selection, dynamical decoupling, and measurement error mitigation
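A sketch of the layout-selection step following mapomatic's documented workflow; `backend` is assumed to be an IBM Quantum backend object obtained elsewhere (e.g. from qiskit_ibm_runtime's QiskitRuntimeService), and the circuit is an arbitrary example:

```python
# Sketch of layout selection with mapomatic (Nation & Treinish, arXiv:2209.15512).
# `backend` is assumed to be an IBM Quantum backend object obtained elsewhere.
from qiskit import QuantumCircuit, transpile
import mapomatic as mm

qc = QuantumCircuit(3)
qc.h(0)
qc.cx(0, 1)
qc.cx(1, 2)
qc.measure_all()

trans_qc = transpile(qc, backend, optimization_level=3)   # initial transpilation
small_qc = mm.deflate_circuit(trans_qc)                   # strip idle qubits
layouts = mm.matching_layouts(small_qc, backend)          # layouts matching the coupling map
scores = mm.evaluate_layouts(small_qc, layouts, backend)  # rank layouts by estimated error
best_layout, best_error = scores[0]                       # lowest-error layout first
final_qc = transpile(qc, backend, initial_layout=best_layout)
```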

27 of 32

Discussion and conclusions

  • Application-oriented vs. randomized benchmarks
  • New benchmark proposed: mirror QV circuit
  • Relation to work performed at UF
    1. Metriq: automatic pipeline for QED-C benchmarking
    2. Reducing overheads in quantum error mitigation
    3. Fair comparison: improvement factor and testing on hardware

28 of 32


Community-driven quantum computing benchmarks, metriq.info

[Diagram: Metriq submission flow. Benchmark results are submitted through the web UI or automatically through the API; see github.com/unitaryfund/metriq-client and github.com/unitaryfund/metriq-api.]

29 of 32

Reducing the overheads to apply error mitigation

Probabilistic error cancellation (PEC) carries two main overheads: noise characterization and sampling cost.

  • Noise-characterization overhead: replace full gate set tomography (GST) with sparse Pauli-noise tomography (PNT).
    E. van den Berg et al., arXiv:2201.09866; B. McDonough et al., Proc. IEEE QCE 2022, arXiv:2210.08611.
  • Sampling-cost overhead (which grows with circuit depth): probabilistic error reduction (PER), an NEPEC technique that tunes a noise scaling factor to trade residual error for sampling cost.
    A. Mari, N. Shammah, and W. J. Zeng, Phys. Rev. A 104, 052607 (2021), arXiv:2108.02237.

30 of 32

Benchmarking quantum error mitigation on hardware: Task


Vincent Russo, Andrea Mari, Nathan Shammah, Ryan LaRose, William J. Zeng

arXiv:2210.07194

  • 2 QEM techniques: ZNE and PEC
  • 2 benchmarks: RB and mirror circuits
  • 3 backends: IBM, Rigetti, IonQ

(Device shown: IBMQ Kolkata)
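A minimal sketch of one cell of this task matrix (ZNE applied to a randomized-benchmarking circuit), using Mitiq with a depolarizing-noise Cirq simulator standing in for real hardware; the noise level and circuit size are arbitrary illustrative choices:

```python
# Minimal sketch: apply zero-noise extrapolation (ZNE) to a 1-qubit RB circuit.
import cirq
from mitiq import zne
from mitiq.benchmarks import generate_rb_circuits

def executor(circuit: cirq.Circuit) -> float:
    """Survival probability of |0> under a toy 1% depolarizing noise model."""
    noisy = circuit.with_noise(cirq.depolarize(p=0.01))
    rho = cirq.DensityMatrixSimulator().simulate(noisy).final_density_matrix
    return rho[0, 0].real

(circuit,) = generate_rb_circuits(n_qubits=1, num_cliffords=20, trials=1)

unmitigated = executor(circuit)                    # noisy estimate
mitigated = zne.execute_with_zne(circuit, executor)  # default Richardson extrapolation
print(f"unmitigated: {unmitigated:.3f}, ZNE-mitigated: {mitigated:.3f}")
```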

31 of 32

Benchmarking quantum error mitigation on hardware: Results


Vincent Russo, Andrea Mari, Nathan Shammah, Ryan LaRose, William J. Zeng

arXiv:2210.07194

Improvement factor: the ratio of root-mean-square errors without vs. with mitigation, used as a fair figure of merit.

  • Generalized over different circuits C and observables A.
  • Adjusted for the overhead in the number of shots N.
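A hedged sketch of this quantity in the RMS form suggested by the bullets above; the exact normalization and shot adjustment are those of arXiv:2210.07194, and here $E_{C,A}$ denotes the estimated expectation value of observable $A$ for circuit $C$:

\[
\mathrm{IF} \;\approx\;
\sqrt{\frac{\sum_{C,A}\bigl(E^{\mathrm{noisy}}_{C,A}-E^{\mathrm{ideal}}_{C,A}\bigr)^{2}}
           {\sum_{C,A}\bigl(E^{\mathrm{mitigated}}_{C,A}-E^{\mathrm{ideal}}_{C,A}\bigr)^{2}}}
\]

IF > 1 means mitigation reduced the error; for a fair comparison, the unmitigated estimator is given the same total shot budget N as the mitigated one.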

32 of 32

Conclusions

  • This work favors randomized protocols over application-oriented algorithms as benchmarks
  • Introduces the concept of diagnostics vs. benchmarks
  • A new benchmark is proposed to lower the overhead of quantum volume (QV): mirror QV circuits, which seem to be a good proxy
  • They took the QED-C application-oriented benchmarks and showed high variability of the results under optimization and quantum error mitigation
  • This work has strong relations to work performed at UF
    • Metriq
    • Reducing overheads in quantum error mitigation
    • Fair comparison: improvement factor and testing on hardware
  • For Mitiq: Add mirror_quantum_volume_circuits to mitiq.benchmarks? Already existing:
    • mitiq.benchmarks.mirror_circuits.generate_mirror_circuit
    • mitiq.benchmarks.quantum_volume_circuits.generate_quantum_volume_circuit